How to Use TensorRT C++ API for High Performance GPU Inference

Mar 14, 2023 | Data Science

This guide shows how to use the TensorRT C++ API for efficient GPU machine-learning inference. It walks through the whole process, from installing the prerequisites and building the library to running inference with your own model.

Getting Started

To get started with TensorRT on Ubuntu 20.04 or 22.04, first make sure all the prerequisites are in place. The steps below cover the setup in order.

Prerequisites

Tested on Ubuntu 20.04 and 22.04; Windows is not supported.
Install CUDA 11 or 12 by following NVIDIA's official CUDA installation instructions.
Install cuDNN by following NVIDIA's official cuDNN installation instructions.

Run the following commands to set up basic necessities:

sudo apt install build-essential
sudo snap install cmake --classic
sudo apt install libspdlog-dev libfmt-dev

Install OpenCV with CUDA support. You can compile OpenCV from source using the script provided in the ./scripts directory.
Download TensorRT 10 from NVIDIA's developer website.
Edit CMakeLists.txt to point to your TensorRT installation.

Building the Library

mkdir build
cd build
cmake ..
make -j$(nproc)

Running the Executable

Once your library is built, you can run inference with it by following these instructions:

Navigate to the build directory.

Run the executable with the path to your ONNX model:

./run_inference_benchmark --onnx_model ../models/yolov8n.onnx

The first time you run this, a TensorRT engine is built from your ONNX model. This can take a while, depending on the model's complexity.
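The build-on-first-run behaviour suggests a simple cache: build the engine once, save it to disk, and read it back on later runs. The sketch below illustrates that idea only. The names loadOrBuildEngine and buildFromOnnx are hypothetical and not part of the project's API, and the engine is treated as an opaque byte blob rather than a real TensorRT object.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not the project's API): returns the serialized engine
// bytes, calling the slow builder only when no cached file exists yet.
std::vector<char> loadOrBuildEngine(
    const fs::path& enginePath,
    const std::function<std::vector<char>()>& buildFromOnnx) {
    assert(!enginePath.empty());

    if (fs::exists(enginePath)) {
        // Cache hit: read the previously saved engine bytes from disk.
        std::ifstream in(enginePath, std::ios::binary);
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    }

    // Cache miss: build once, then save the result for later runs.
    // Write errors are ignored here to keep the sketch short.
    std::vector<char> blob = buildFromOnnx();
    std::ofstream out(enginePath, std::ios::binary);
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    return blob;
}
```

In a real application the callback would wrap the ONNX parsing and engine build. Here any callable that returns bytes will do, which keeps the caching logic separate from the build step.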

You can also supply your TensorRT engine file directly using:

./run_inference_benchmark --trt_model ../models/yolov8n.engine.NVIDIA_GeForce_RTX_3080_Laptop_GPU.fp16.1.1
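The engine file name in the command above appears to encode the model name, the GPU, the precision, and two numbers. A name like that could be composed as in the sketch below. This is an illustration under assumptions, not the project's code: engineFileName is a hypothetical helper, and reading the trailing "1.1" as the maximum and optimal batch sizes is a guess based on the file name alone.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: composes a cache file name from the model name, the
// GPU name (spaces replaced by underscores), the precision, and two batch
// sizes. Treating the last two fields as max and optimal batch size is an
// assumption, not something the article confirms.
std::string engineFileName(const std::filesystem::path& onnxPath,
                           std::string deviceName,
                           const std::string& precision,
                           int maxBatchSize,
                           int optBatchSize) {
    assert(maxBatchSize > 0 && optBatchSize > 0);
    std::replace(deviceName.begin(), deviceName.end(), ' ', '_');
    return onnxPath.stem().string() + ".engine." + deviceName + "." +
           precision + "." + std::to_string(maxBatchSize) + "." +
           std::to_string(optBatchSize);
}
```

Putting the GPU and precision in the name is a sensible design choice, because a TensorRT engine is optimized for the hardware and settings it was built with. An engine built on one GPU should not be silently reused on another.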

Understanding the Code

The core implementation resides in the include/engine directory, where each file has been commented extensively to facilitate understanding. Here’s a brief analogy to help you comprehend how the code functions:

Imagine your project as a garden. main.cpp is the gardener: it oversees everything, deciding when to plant (call functions) and when to harvest (retrieve outputs). EngineBuildLoadNetwork.inl is the architect: it works from the plans (the ONNX model) to build the TensorRT engine, or to load one that already exists. EngineRunInference.inl is the fertile plot where the plants (your data) grow: it is where inference actually runs.

Troubleshooting

Should you encounter challenges, follow these troubleshooting tips:

To get more or less diagnostic output, set the LOG_LEVEL environment variable. Valid values are trace, debug, info, warn, error, critical, and off.
If creating the TensorRT engine fails, set LOG_LEVEL to trace for more verbose output about what went wrong.
If you get an “out of memory” error, try reducing Options.calibrationBatchSize.
For further assistance, connect with the community for ongoing updates or collaboration opportunities at fxis.ai.
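To make the LOG_LEVEL tip concrete, here is a minimal sketch of how a program could read that variable and map it to one of the seven levels listed above. The helper names parseLogLevel and logLevelFromEnv are hypothetical, and falling back to info for unset or unrecognised values is an assumption rather than documented project behaviour.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// The seven level names from the article, ordered from most to least verbose.
constexpr std::array<const char*, 7> kLevels = {
    "trace", "debug", "info", "warn", "error", "critical", "off"};

// Hypothetical helper: returns the index of `name` in kLevels, ignoring case.
// Unrecognised input falls back to "info" (index 2), which is an assumption.
int parseLogLevel(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    for (std::size_t i = 0; i < kLevels.size(); ++i) {
        if (name == kLevels[i]) return static_cast<int>(i);
    }
    return 2;  // "info"
}

// Reads LOG_LEVEL from the environment; an unset variable also yields "info".
int logLevelFromEnv() {
    const char* value = std::getenv("LOG_LEVEL");
    return parseLogLevel(value ? value : "");
}
```

With a helper like this, running the benchmark as LOG_LEVEL=trace ./run_inference_benchmark ... would select the most verbose level, which is the setting recommended above when engine creation fails.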

Conclusion

With this guide, you should now have a working setup and a solid understanding of how to use the TensorRT C++ API for GPU inference. If the project was helpful, consider giving it a star to encourage continued updates and improvements!
