Humans naturally rely on floor plans to navigate unfamiliar environments: they are readily available, reliable, and provide rich geometric guidance. However, existing visual navigation settings overlook this valuable prior knowledge, limiting efficiency and accuracy. To bridge this gap, we introduce a novel navigation task, Floor Plan Visual Navigation (FloNa), the first attempt to incorporate floor plans into embodied visual navigation. While floor plans offer significant advantages, two key challenges emerge: (1) handling the spatial inconsistency between the floor plan and the actual scene layout to achieve collision-free navigation, and (2) aligning observed images with the floor plan sketch despite their distinct modalities. To address these challenges, we propose FloDiff, a novel diffusion policy framework with a localization module that aligns the current observation with the floor plan. We further collect 20k navigation episodes across 117 scenes in the iGibson simulator to support training and evaluation. Extensive experiments demonstrate the effectiveness and efficiency of our framework in unfamiliar scenes when leveraging floor plan knowledge.
We collect a large-scale dataset comprising 20,214 navigation episodes across 117 static indoor scenes from Gibson, yielding 3,312,480 images captured with a 45° field of view. We split the dataset into training and testing sets of 67 and 50 scenes, respectively. For each scene, we provide a floor plan, a navigable map, and a set of collision-free navigation episodes. Based on scene size, we collect 150 episodes in small scenes, 180 in medium scenes, and 200 in large scenes; episode trajectory lengths range from 4.53 to 42.03 meters. The dataset can be downloaded here.
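The figures above can be sanity-checked with a minimal sketch. Note that the constant and function names below are illustrative only and are not part of any released dataset API; the numbers come directly from the description above.

```python
# Per-scene episode budget, as stated in the dataset description.
EPISODES_PER_SCENE = {"small": 150, "medium": 180, "large": 200}

def episode_budget(size: str) -> int:
    """Return the number of collision-free episodes collected for a scene of the given size."""
    try:
        return EPISODES_PER_SCENE[size]
    except KeyError:
        raise ValueError(f"unknown scene size: {size!r}")

# Dataset-level figures from the description.
TRAIN_SCENES, TEST_SCENES = 67, 50
TOTAL_EPISODES = 20_214
TOTAL_IMAGES = 3_312_480

# The train/test split covers all 117 scenes.
assert TRAIN_SCENES + TEST_SCENES == 117

# Roughly 164 images per episode on average.
avg_images_per_episode = TOTAL_IMAGES / TOTAL_EPISODES
```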
(a) Loc-FloDiff relies on an estimated pose. (b) Naive-FloDiff estimates the pose itself.
Loc-FloDiff achieves the best performance, demonstrating its planning and collision-avoidance capabilities.
Two videos of the real-world experiments
@inproceedings{li2025flona,
title={FloNa: Floor Plan Guided Embodied Visual Navigation},
author={Li, Jiaxin and Huang, Weiqi and Wang, Zan and Liang, Wei and Di, Huijun and Liu, Feng},
booktitle={AAAI Conference on Artificial Intelligence (AAAI)},
year={2025}
}