How to Perform 3D Object Localization in RGB-D Scans Using Natural Language Descriptions

Nov 24, 2020 | Data Science

In this article, we’ll explore the fascinating task of 3D object localization in RGB-D scans using natural language descriptions. Our main tool is ScanRefer, a powerful system that enables computers to identify objects in a scanned 3D scene based on free-form language descriptions. Whether you’re a seasoned researcher or a curious newcomer, this guide will walk you through the setup, usage, and troubleshooting of the ScanRefer system. Let’s dive in!

What is ScanRefer?

ScanRefer takes as input a point cloud of a scanned 3D scene along with a free-form description of a target object. It uses a fused descriptor that combines geometric features of the scan with encoded linguistic information to identify and localize the described object. To do this, ScanRefer leverages a dataset of 51,583 object descriptions covering 11,046 objects across 800 ScanNet scenes.
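
To make the fusion idea concrete, here is a minimal, hypothetical PyTorch sketch of such a fused descriptor. The module, dimensions, and names below are illustrative assumptions, not ScanRefer's actual architecture:

import torch
import torch.nn as nn

# Illustrative sketch only -- not ScanRefer's actual architecture.
class FusedDescriptorSketch(nn.Module):
    def __init__(self, geo_dim=128, lang_dim=256, fused_dim=128):
        super().__init__()
        # Project the sentence embedding into the geometric feature space
        self.lang_proj = nn.Linear(lang_dim, geo_dim)
        # Score each object proposal against the description
        self.fuse = nn.Sequential(
            nn.Linear(geo_dim * 2, fused_dim),
            nn.ReLU(),
            nn.Linear(fused_dim, 1),
        )

    def forward(self, geo_feats, lang_feat):
        # geo_feats: (num_proposals, geo_dim) features from a 3D detection backbone
        # lang_feat: (lang_dim,) embedding of the description, e.g. a GRU over GloVe vectors
        lang = self.lang_proj(lang_feat).expand(geo_feats.size(0), -1)
        fused = torch.cat([geo_feats, lang], dim=-1)
        return self.fuse(fused).squeeze(-1)  # one matching score per proposal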

Setting Up ScanRefer

Step 1: Pre-requisites

  • Operating System: Tested on Ubuntu 16.04 LTS and 18.04 LTS.
  • Python Environment: PyTorch 1.6.0 and required packages listed in requirements.txt.

Step 2: Installing Dependencies

To install the necessary packages and libraries, follow these commands:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
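
Once installation finishes, it is worth confirming that the expected PyTorch version and CUDA support are in place before moving on:

python -c "import torch; print(torch.__version__)"         # should print 1.6.0
python -c "import torch; print(torch.cuda.is_available())" # should print True on a CUDA-capable machine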

Step 3: Downloading Datasets

Download the ScanRefer dataset and unzip it under data/. You will also need:

  • Preprocessed GloVe embeddings (the download link is provided in the ScanRefer repository)
  • ScanNetV2 dataset, with scans located under data/scannet/scans
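
Assuming the default paths used by the ScanRefer repository (file names may differ slightly; check config.py), the data directory should end up looking roughly like this:

data/
├── ScanRefer_filtered_train.json
├── ScanRefer_filtered_val.json
├── glove.p                      (preprocessed GloVe embeddings)
└── scannet/
    └── scans/
        ├── scene0000_00/
        ├── scene0000_01/
        └── ...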

Step 4: Data Preparation

Run the following command to pre-process ScanNet data:

cd data/scannet/
python batch_load_scannet_data.py

After preprocessing, ensure the data is valid by running:

python visualize.py --scene_id scene0000_00
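
If you prefer a programmatic check, the snippet below loads the pre-processed arrays for one scene. The file names assume the VoteNet-style output of batch_load_scannet_data.py and are an assumption on my part; adjust them to whatever appears in your output folder:

import numpy as np

# Assumed VoteNet-style output names -- verify against your preprocessed folder.
scene_id = "scene0000_00"
verts = np.load("scannet_data/%s_vert.npy" % scene_id)   # per-point XYZ (+ RGB) data
bboxes = np.load("scannet_data/%s_bbox.npy" % scene_id)  # one row per annotated object

print("points:", verts.shape, "boxes:", bboxes.shape)
assert verts.shape[0] > 0 and bboxes.shape[0] > 0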

Training The Model

To train the ScanRefer model using RGB values, execute the following command:

python scripts/train.py --use_color
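
The training script also supports other input features; for example, the following variants are documented in the ScanRefer README (the multiview option requires the optional multiview feature preprocessing step):

python scripts/train.py --use_normal     # geometry plus surface normals
python scripts/train.py --use_multiview  # pre-computed multiview image features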

Evaluating The Model

To evaluate the trained ScanRefer model, use this command:

python scripts/eval.py --folder folder_name --reference --use_color --no_nms --force --repeat 5
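
Here folder_name is the output folder created under outputs/ during training, and the feature flags (e.g. --use_color) must match the ones used at training time; --repeat 5 averages the evaluation over five runs. To evaluate the underlying 3D object detection rather than description grounding, the README also provides a detection mode:

python scripts/eval.py --folder folder_name --detection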

Understanding the Code – An Analogy

Think of the ScanRefer system as a chef preparing a gourmet dish. The recipe represents the natural language description, while the ingredients are akin to the 3D point cloud and its geometric features. The chef must understand how to combine the recipe with the ingredients to create the final dish, which in this case is the accurately localized object.

In this analogy, the description serves as the guidance for identifying the right objects (ingredients) in the 3D scene (pantry) and fusing them together to create the perfect meal (3D bounding box around the object). The chef’s training (the model’s training) ensures that this process adheres to the best culinary standards (object localization accuracy).

Troubleshooting

  • Ensure you have the correct version of PyTorch installed; issues may arise with versions newer than 1.6.0.
  • If you encounter errors during data preparation, double-check the dataset paths specified in config.py.
  • For further assistance, or for more insights, updates, and collaboration on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Conclusion

By following the steps outlined above, you should be well on your way to utilizing ScanRefer for 3D object localization using natural language descriptions. Whether you’re aiming to enhance your research or simply experiment with cutting-edge AI techniques, ScanRefer offers a robust framework for intricate tasks that bridge language and spatial understanding.
