Your Guide to Unsupervised Learning of Depth and Ego-Motion from Video

Feb 13, 2021 | Data Science

Welcome, fellow AI enthusiasts! Today, we’re diving into an exciting project that explores how to estimate depth maps and ego-motion from video using unsupervised learning. This codebase, known as SfMLearner, implements the system described in the CVPR 2017 paper “Unsupervised Learning of Depth and Ego-Motion from Video” by Tinghui Zhou and colleagues. Let’s unravel the intricacies and get you started!

Prerequisites

Before we dive into the code, make sure you have the following setup:

  • TensorFlow 1.0
  • CUDA 8.0
  • Ubuntu 16.04

Running the Single-View Depth Demo

To run the demo of the single-view depth prediction model, follow these steps:

  1. Download the pre-trained model from this Google Drive and place the model files under the models directory.
  2. Use the provided Jupyter notebook demo.ipynb to run the demo; a condensed sketch of its steps appears below.
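
For reference, here is a condensed sketch of what demo.ipynb does, assuming the SfMLearner class shipped with the repository. The checkpoint and sample-image filenames below are assumptions, so substitute the files you actually downloaded.

```python
# A condensed sketch of demo.ipynb (TensorFlow 1.x). Filenames for the
# checkpoint and sample image are assumptions -- use your downloaded files.
import numpy as np
import PIL.Image as pil
import tensorflow as tf
from SfMLearner import SfMLearner

img_height, img_width = 128, 416
ckpt_file = 'models/model-190532'  # assumed checkpoint name

# Load an image and resize it to the resolution the network expects.
img = pil.open('misc/sample.png')  # assumed sample image path
img = np.array(img.resize((img_width, img_height), pil.ANTIALIAS))

# Build the inference graph in single-view depth mode and restore weights.
sfm = SfMLearner()
sfm.setup_inference(img_height, img_width, mode='depth')
saver = tf.train.Saver([var for var in tf.model_variables()])

with tf.Session() as sess:
    saver.restore(sess, ckpt_file)
    pred = sfm.inference(img[None, :, :, :], sess, mode='depth')

depth_map = np.squeeze(pred['depth'])  # HxW predicted depth for the input
```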

Preparing Training Data

To prepare your training data correctly, you need to format it before feeding it into the model. Here’s how:

KITTI Dataset

Start by downloading the KITTI dataset using this script. Once downloaded, run the following command:

```bash
python data/prepare_train_data.py --dataset_dir=path_to_raw_kitti_dataset --dataset_name=kitti_raw_eigen --dump_root=path_to_resulting_formatted_data --seq_length=3 --img_width=416 --img_height=128 --num_threads=4
```

For pose experiments, download the KITTI odometry split from here and change the --dataset_name option to kitti_odom while preparing the data.
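
After formatting, each training sample is a single image in which the seq_length frames are concatenated side by side, along with a small text file of camera intrinsics. If you want to sanity-check the output, a sketch like the following splits a sample back into frames; the subfolder and filenames here are assumptions, so inspect your dump_root for the real layout.

```python
# Hypothetical sanity check: split a formatted sample (seq_length frames
# concatenated along the width) back into individual 416x128 frames.
import numpy as np
import PIL.Image as pil

seq_length, img_width = 3, 416
sample_path = 'path_to_resulting_formatted_data/some_sequence/0000000001.jpg'  # assumed
sample = np.array(pil.open(sample_path))  # shape: (128, 3*416, 3)
frames = [sample[:, i * img_width:(i + 1) * img_width] for i in range(seq_length)]
print(len(frames), frames[0].shape)  # 3 frames, each (128, 416, 3)
```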

Cityscapes Dataset

For Cityscapes, download the following packages:

  1. leftImg8bit_sequence_trainvaltest.zip
  2. camera_trainvaltest.zip

After downloading, run this command:

```bash
python data/prepare_train_data.py --dataset_dir=path_to_cityscapes_dataset --dataset_name=cityscapes --dump_root=path_to_resulting_formatted_data --seq_length=3 --img_width=416 --img_height=171 --num_threads=4
```

Keep in mind that the height for Cityscapes is set to 171 because the bottom part of the image, which contains the car logo, is cropped out; after cropping, the training images end up with a height of 128.
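
To make the arithmetic concrete, here is a minimal illustration of that crop (my own sketch, not code from the repository): dropping the bottom 43 rows of a 416x171 sample leaves the 416x128 image the network actually trains on.

```python
import numpy as np

img = np.zeros((171, 416, 3), dtype=np.uint8)  # stand-in for a formatted Cityscapes sample
cropped = img[:128]                            # drop the bottom 43 rows (car logo region)
print(cropped.shape)                           # (128, 416, 3)
```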

Training the Model

Once your data is formatted, you can commence training the model. Execute the following command:

```bash
python train.py --dataset_dir=path_to_the_formatted_data --checkpoint_dir=where_to_store_checkpoints --img_width=416 --img_height=128 --batch_size=4
```

To visualize your training progress, start a TensorBoard session:

```bash
tensorboard --logdir=path_to_tensorflow_log_files --port=8888
```

Open your browser and navigate to localhost:8888. After approximately 100K iterations, you should start seeing promising depth predictions when training on the KITTI dataset.

Evaluation

To evaluate your trained model, follow the steps below for depth and pose estimation, respectively.

Depth Evaluation on KITTI

Download the authors’ predictions from this Google Drive and place them into the kitti_eval directory. Then run the command:

```bash
python kitti_eval/eval_depth.py --kitti_dir=path_to_raw_kitti_dataset --pred_file=kitti_eval/kitti_eigen_depth_predictions.npy
```

If everything is set up correctly, you should obtain the evaluation metrics reported in Table 1 of the paper.
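
For context, Table 1 reports the standard single-view depth metrics: abs rel, sq rel, RMSE, RMSE log, and the δ < 1.25 accuracy thresholds. The NumPy sketch below follows their textbook definitions rather than the exact code in eval_depth.py; because the method recovers depth only up to scale, predictions are typically median-scaled to the ground truth before the metrics are computed.

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard depth metrics over flattened arrays of valid ground-truth pixels."""
    pred = pred * np.median(gt) / np.median(pred)  # resolve the scale ambiguity
    thresh = np.maximum(gt / pred, pred / gt)
    a1, a2, a3 = [(thresh < 1.25 ** k).mean() for k in (1, 2, 3)]
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```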

Pose Estimation Evaluation

Download the predictions and ground-truth pose data from this Google Drive. Run the following command:

```bash
python kitti_eval/eval_pose.py --gtruth_dir=directory_of_ground_truth_trajectory_files --pred_dir=directory_of_predicted_trajectory_files
```

For instance, to evaluate results for Sequence 10, you’d run:

```bash
python kitti_eval/eval_pose.py --gtruth_dir=kitti_eval/pose_data/ground_truth/10 --pred_dir=kitti_eval/pose_data/ours_results/10
```
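
The pose benchmark reports the Absolute Trajectory Error (ATE) on 5-frame snippets. As a rough illustration of the idea, and not the exact code in eval_pose.py, the sketch below aligns a predicted snippet to the ground truth (matching the first frame and solving for a least-squares scale, since the predictions are scale-ambiguous) before measuring the positional error.

```python
import numpy as np

def snippet_ate(gt_xyz, pred_xyz):
    """Rough ATE for one snippet: gt_xyz and pred_xyz are (N, 3) position arrays."""
    pred = pred_xyz + (gt_xyz[0] - pred_xyz[0])        # align the starting frames
    scale = np.sum(gt_xyz * pred) / np.sum(pred ** 2)  # least-squares scale factor
    errors = gt_xyz - scale * pred                     # per-frame position error
    return np.sqrt((errors ** 2).sum(axis=1)).mean()   # mean positional error
```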

Testing on KITTI

Once the model is trained, obtain depth predictions for the KITTI Eigen test split by executing:

```bash
python test_kitti_depth.py --dataset_dir path_to_raw_kitti_dataset --output_dir path_to_output_directory --ckpt_file path_to_pre-trained_model_file
```
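
The script writes the predicted depth maps into the output directory as a NumPy file (the same kind of .npy file that eval_depth.py consumes via --pred_file). The filename below is an assumption, so check what actually lands in --output_dir; loading it is a quick way to verify the run.

```python
# Quick check of the test output. The filename is an assumption -- use
# whatever .npy file test_kitti_depth.py wrote into --output_dir.
import numpy as np

preds = np.load('path_to_output_directory/depth_predictions.npy')
print(preds.shape)  # one predicted depth map per Eigen test image
```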

For pose predictions, run:

```bash
python test_kitti_pose.py --test_seq [sequence_id] --dataset_dir path_to_KITTI_odometry_set --output_dir path_to_output_directory --ckpt_file path_to_pre-trained_model_file
```

Troubleshooting

While working with the SfMLearner codebase, you might encounter a few common challenges:

  • Data Formatting Errors: Ensure that the paths specified for data directories are accurate and accessible.
  • Dependency Issues: Double-check that all the prerequisites, such as TensorFlow and CUDA, are properly installed; a version mismatch can cause runtime errors.
  • TensorBoard Not Displaying: Verify that TensorBoard is pointed at the correct log directory and that you’re using the right port.
  • Pre-trained Model Issues: Make certain that you’ve placed the pre-trained model files in the correct models folder.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
