This tutorial walks you through object pose estimation with a UR3 robotic arm in Unity. You will learn how to integrate ROS with Unity, import URDF models, collect labeled training data, and train and deploy a deep learning model. By the end of this article, you will be able to perform a pick-and-place task with a robotic arm in Unity, using computer vision to perceive the object being manipulated.
Table of Contents
- Part 1: Create Unity Scene with Imported URDF
- Part 2: Setting up the Scene for Data Collection
- Part 3: Data Collection and Model Training
- Part 4: Pick-and-Place
Part 1: Create Unity Scene with Imported URDF
Welcome to the first step! Here you will download and install the Unity Editor. Once you have Unity up and running, you will set up a basic scene and import the UR3 robot arm using the URDF Importer package. Begin by downloading the UR3 robot arm model from the Universal Robots website.
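Once the arm is in the scene, the importer gives each link an ArticulationBody component, and the joint drives need stiffness, damping, and force-limit values before the arm will hold a pose. Below is a minimal C# sketch of applying uniform drive settings across all joints; the component name and the numeric values are illustrative, not the tutorial's exact settings:

```csharp
using UnityEngine;

// Attach to the root of the imported UR3 robot to apply uniform drive
// settings to every joint. Values are illustrative defaults.
public class JointDriveSetup : MonoBehaviour
{
    public float stiffness = 10000f;
    public float damping = 100f;
    public float forceLimit = 1000f;

    void Start()
    {
        // The URDF Importer creates one ArticulationBody per link.
        foreach (var body in GetComponentsInChildren<ArticulationBody>())
        {
            var drive = body.xDrive; // revolute joints drive around X
            drive.stiffness = stiffness;
            drive.damping = damping;
            drive.forceLimit = forceLimit;
            body.xDrive = drive;     // ArticulationDrive is a struct: write it back
        }
    }
}
```

Because ArticulationDrive is a struct, the modified copy must be written back to xDrive for the change to take effect.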
Part 2: Setting up the Scene for Data Collection
In this stage, we will configure the scene for data collection using the Unity Computer Vision Perception Package. You will learn how to implement Randomizers to introduce variability into your training data; if you wish to dive deeper into randomization techniques, the tutorial's additional randomization exercises are worth exploring.
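As a concrete example, here is a minimal sketch of a custom Randomizer that re-orients tagged objects at the start of each iteration. The class names are our own, and the namespaces reflect recent Perception package versions, so check them against the version you install:

```csharp
using System;
using UnityEngine;
using UnityEngine.Perception.Randomization.Parameters;
using UnityEngine.Perception.Randomization.Randomizers;
using UnityEngine.Perception.Randomization.Samplers;

// Tag component: attach to any object whose yaw should be randomized.
public class YawRandomizerTag : RandomizerTag { }

// Randomizer: added to the Scenario; runs once per data-collection iteration.
[Serializable]
[AddRandomizerMenu("Perception/Yaw Randomizer")]
public class YawRandomizer : Randomizer
{
    // Uniform angle in degrees, editable in the Scenario inspector.
    public FloatParameter yaw = new FloatParameter { value = new UniformSampler(0f, 360f) };

    protected override void OnIterationStart()
    {
        // Re-orient every tagged object before the frame is captured.
        foreach (var tag in tagManager.Query<YawRandomizerTag>())
            tag.transform.rotation = Quaternion.Euler(0f, yaw.Sample(), 0f);
    }
}
```

Attach the tag to the cube (in its own file named after the class, as Unity requires for MonoBehaviours), then add the Randomizer to your Scenario; the Scenario invokes OnIterationStart before each captured frame.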
Part 3: Data Collection and Model Training
In this segment, you will run data collection with the Perception Package and use the resulting dataset to train a deep learning model. Training can be time-consuming, so if you prefer, you can skip that step and use our pre-trained model instead. Here are the results we achieved from running 100 trials with the pre-trained model (a sketch of sizing the collection run from code follows the table):
|                   | Successes | Failures | Percent Success |
|-------------------|-----------|----------|-----------------|
| Without occlusion | 82        | 5        | 94              |
| With occlusion    | 7         | 6        | 54              |
| All               | 89        | 11       | 89              |
Note: Data for the above experiment was collected in Unity 2020.2.1f1.
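If you do run collection yourself, the size of the run is configured on the Scenario. Here is a minimal sketch, assuming a FixedLengthScenario on the same GameObject; the iteration count is illustrative (it is not the number behind the results above), and the constants field names can vary between Perception versions:

```csharp
using UnityEngine;
using UnityEngine.Perception.Randomization.Scenarios;

// Sketch: size the data-collection run from code instead of the inspector.
public class CollectionSize : MonoBehaviour
{
    void Start()
    {
        var scenario = GetComponent<FixedLengthScenario>();
        // One labeled capture per iteration; count here is illustrative.
        scenario.constants.totalIterations = 30000;
    }
}
```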
Part 4: Pick-and-Place
Finally, let’s put it all together! In this section, you will prepare and configure everything necessary to conduct a pick-and-place task using MoveIt. The trained deep learning model will predict the cube pose, and you’ll combine a few essential techniques (sketched in code after this list):
- Create and invoke a motion planning service in ROS
- Send captured RGB images to the ROS Pose Estimation node for inference
- Execute a Python script to perform inference using the trained deep learning model
- Adjust Unity Articulation Bodies based on the computed trajectory
- Use the gripper to grasp and release the target object
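The sketch below shows the Unity side of that loop using the ROS-TCP-Connector: registering the motion-planning service, requesting a plan, and stepping the arm's joint drives through the returned trajectory. The service name and the MoverServiceRequest/Response classes are assumptions standing in for the message types your own ROS package generates; field names follow the snake_case produced by the message generator:

```csharp
using System.Collections;
using Unity.Robotics.ROSTCPConnector;
using UnityEngine;
// using RosMessageTypes.YourPackage; // generated message classes (assumed)

// Sketch of the Unity side of the pick-and-place loop.
public class TrajectoryPlannerSketch : MonoBehaviour
{
    const string ServiceName = "ur3_moveit"; // hypothetical service name
    public ArticulationBody[] jointArticulationBodies; // ordered base -> wrist

    ROSConnection ros;

    void Start()
    {
        ros = ROSConnection.GetOrCreateInstance();
        ros.RegisterRosService<MoverServiceRequest, MoverServiceResponse>(ServiceName);
    }

    public void PublishJoints(MoverServiceRequest request)
    {
        // Ask MoveIt (via the ROS service) for a motion plan.
        ros.SendServiceMessage<MoverServiceResponse>(ServiceName, request, OnTrajectoryResponse);
    }

    void OnTrajectoryResponse(MoverServiceResponse response)
    {
        if (response.trajectories.Length > 0)
            StartCoroutine(ExecuteTrajectories(response));
    }

    IEnumerator ExecuteTrajectories(MoverServiceResponse response)
    {
        foreach (var trajectory in response.trajectories)
        {
            foreach (var point in trajectory.joint_trajectory.points)
            {
                // Drive each joint toward its planned position (radians -> degrees).
                for (var j = 0; j < jointArticulationBodies.Length; j++)
                {
                    var drive = jointArticulationBodies[j].xDrive;
                    drive.target = (float)point.positions[j] * Mathf.Rad2Deg;
                    jointArticulationBodies[j].xDrive = drive;
                }
                yield return new WaitForSeconds(0.1f); // let the joints settle
            }
        }
    }
}
```

Driving each joint's xDrive.target point-by-point, rather than teleporting link transforms, keeps the motion physically plausible so the gripper can interact with the cube through the physics engine.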
Troubleshooting Ideas
If you encounter issues during this tutorial, consider the following troubleshooting tips:
- Ensure that you have installed all necessary packages and dependencies, including Python 3 and ROS Noetic.
- Double-check that your Unity project settings are correct, particularly for physics and rendering.
- If you’re having problems with the URDF Importer, confirm that all model files are correctly formatted and located as expected.
- For any questions or discussions about Unity Robotics package installations or setups, please create a new thread on the Unity Robotics forum.
- For feature requests or bugs, file a GitHub issue.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

