Abstract: Accurate perception of semantic and spatial information is crucial for robots performing language-driven navigation tasks. Existing approaches utilize visual-language models to extract ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results