How to Accelerate Inference with HunyuanDiT TensorRT

Aug 1, 2024 | Educational

If you’re looking to squeeze more performance out of your AI models, you’ve come to the right place. This guide walks you through using HunyuanDiT with TensorRT for faster inference. If you’re unfamiliar with it, TensorRT is NVIDIA’s high-performance deep learning inference SDK, and it can significantly reduce the time your models take to generate outputs. Buckle up as we dive into the world of acceleration!

Prerequisites

  • NVIDIA GPU with Compute Capability 8.0 or higher (e.g., RTX 4090, RTX 3090)
  • TensorRT version 10.1.0.27
  • CUDA version 11.7 or 11.8
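
Before you start, it’s worth confirming your environment actually matches these requirements. Here is a minimal Python check, assuming PyTorch and the TensorRT Python bindings are already installed:

```python
# Quick environment check. Assumes torch and tensorrt are importable;
# the expected version numbers come from the prerequisites above.
import torch
import tensorrt as trt

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")       # want 8.0 or higher
print(f"TensorRT version:   {trt.__version__}")     # want 10.1.0.27
print(f"CUDA (torch build): {torch.version.cuda}")  # want 11.7 or 11.8
```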

Step-by-Step Instructions

1. Download dependencies from Hugging Face

First, download the TensorRT dependencies for HunyuanDiT from Hugging Face:

```shell
cd HunyuanDiT
# Download the TensorRT dependencies
huggingface-cli download Tencent-Hunyuan/TensorRT-libs --local-dir ./ckpts/t2i/model_trt
```
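
If you prefer to script the download rather than use the CLI, the `huggingface_hub` Python API offers an equivalent. A short sketch, using the same repo and target directory as above:

```python
# Programmatic alternative to the huggingface-cli command above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Tencent-Hunyuan/TensorRT-libs",
    local_dir="./ckpts/t2i/model_trt",
)
```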

2. Install the TensorRT dependencies

Next, let’s extract and set up the required dependencies.

```shell
# Extract and install the TensorRT dependencies
sh trt/install.sh
# Set the TensorRT build environment variables
source trt/activate.sh
```

3. Build the TensorRT Engine

Now it’s time to either build your own engine or use a prebuilt one.

Method 1: Build Your Own Engine (Recommended)

If your GPU differs from those covered by the prebuilt engines, build your own with the following command:

```shell
# Build the TensorRT engine. By default, it reads the ckpts folder in the current directory.
sh trt/build_engine.sh
```

Using Previous Versions

To build engines for earlier versions:

```shell
# v1.1
sh trt/build_engine.sh 1.1
# v1.0
sh trt/build_engine.sh 1.0
```

Look for output like `PASSED TensorRT.trtexec [TensorRT v10100]`, which indicates a successful build.
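
Beyond the trtexec output, you can also sanity-check that the resulting plan deserializes on your GPU (TensorRT engines are specific to the hardware they were built on). A minimal sketch, assuming the default plan location used in the later steps:

```python
# Verify the built engine deserializes. The plan path below is assumed
# to be the default output location from the build step.
import tensorrt as trt

PLAN_PATH = "./ckpts/t2i/model_trt/engine/model_onnx.plan"

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(PLAN_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
assert engine is not None, "Deserialization failed; rebuild the engine for this GPU."
print("Engine deserialized successfully.")
```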

Method 2: Use the Prebuilt Engine (Only for v1.x)

If you prefer to use a preexisting engine, set `REMOTE_PATH` to the remote path of the engine matching your GPU, then download and link it:

```shell
export REMOTE_PATH=<Remote Path>
huggingface-cli download Tencent-Hunyuan/TensorRT-engine ${REMOTE_PATH} --local-dir ./ckpts/t2i/model_trt
ln -s ${REMOTE_PATH} ./ckpts/t2i/model_trt/engine/model_onnx.plan
```

4. Run the Inference Using the TensorRT Model

Before you start inference, ensure that you’ve activated the environment:

```shell
source trt/activate.sh
```

Then run the model. Here’s how:

```shell
# Run inference with prompt enhancement + the HunyuanDiT TensorRT model
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt

# Disable prompt enhancement to save GPU memory
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt --no-enhance
```

5. Important Notes

The TensorRT engine is optimized for specific input shapes. Currently, it supports:

```python
STANDARD_SHAPE = [
    [(1024, 1024), (1280, 1280)],  # 1:1
    [(1280, 960)],                 # 4:3
    [(960, 1280)],                 # 3:4
    [(1280, 768)],                 # 16:9
    [(768, 1280)],                 # 9:16
]
```
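
Requests outside this list won’t match the engine’s optimization profiles. If you want to map an arbitrary resolution onto a supported one, a hypothetical helper (not part of the HunyuanDiT codebase) might look like this:

```python
# Hypothetical helper: pick the supported shape closest in aspect ratio,
# breaking ties by total pixel count.
STANDARD_SHAPE = [
    [(1024, 1024), (1280, 1280)],  # 1:1
    [(1280, 960)],                 # 4:3
    [(960, 1280)],                 # 3:4
    [(1280, 768)],                 # 16:9
    [(768, 1280)],                 # 9:16
]

def nearest_standard_shape(dim0: int, dim1: int) -> tuple[int, int]:
    candidates = [shape for group in STANDARD_SHAPE for shape in group]
    target_ratio = dim0 / dim1
    return min(
        candidates,
        key=lambda s: (abs(s[0] / s[1] - target_ratio),
                       abs(s[0] * s[1] - dim0 * dim1)),
    )

print(nearest_standard_shape(1920, 1080))  # -> (1280, 768)
```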

Troubleshooting

If the TensorRT engine file fails to generate, or inference performance is unsatisfactory, check the following, then try the quick file check sketched after this list:

  • Verify that your NVIDIA GPU supports Compute Capability 8.0.
  • Ensure that you have installed the correct versions of TensorRT and CUDA.
  • Check that you followed each step correctly and that dependencies are properly installed.
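
As a quick first check, confirm that the engine plan actually exists and isn’t truncated. A minimal sketch, assuming the default layout from the earlier steps:

```python
# Hypothetical diagnostic: confirm the engine plan exists and is non-empty.
from pathlib import Path

plan = Path("./ckpts/t2i/model_trt/engine/model_onnx.plan")
if not plan.exists():
    print("Engine plan missing; rerun: sh trt/build_engine.sh")
elif plan.stat().st_size == 0:
    print("Engine plan is empty; the build likely failed partway.")
else:
    print(f"Engine plan found ({plan.stat().st_size / 1e6:.1f} MB).")
```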

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
