In the world of AI and machine learning, converting models to formats that are optimized for performance is crucial. In this guide, we will explore the steps needed to convert the intfloat/multilingual-e5-large model to ONNX FP16 and INT8 formats, making it easy for you to integrate it with Vespa's embedder support.
Overview of the Process
The process produces two variants of the model: FP16 and INT8. FP16 keeps accuracy essentially on par with the original FP32 weights at roughly half the size, while INT8 quantization yields a much smaller file that downloads and deploys faster and typically speeds up CPU inference, at the cost of a small drop in embedding quality.
Model Conversion Steps
1. Download the Model
The source model is intfloat/multilingual-e5-large, hosted on the Hugging Face Hub. The export script used in the next steps fetches it directly by that model id, so a separate manual download is optional.
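If you do want a local copy of the original weights first, a minimal sketch using the huggingface_hub library (purely optional, not part of the conversion commands below) looks like this:
# Optional: pull a local snapshot of the source model from the Hugging Face Hub.
from huggingface_hub import snapshot_download

path = snapshot_download(repo_id="intfloat/multilingual-e5-large")
print(path)  # local cache directory containing config, weights and tokenizer files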
2. Converting to INT8
To convert the model to INT8 format, follow the steps below:
- Export the model to ONNX with the export_hf_model_from_hf.py script, then quantize the result with the Hugging Face Optimum toolkit (optimum-cli), available on GitHub.
- Run the following commands in your terminal:
export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
optimum-cli onnxruntime quantize --onnx_model ./me5-large -o me5-large_quantized --avx512_vnni
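If you prefer to drive the quantization from Python rather than the CLI, a minimal sketch using Optimum's ORTQuantizer looks roughly like this (directory names follow the commands above and are otherwise assumptions):
# Dynamic INT8 quantization of the exported ONNX model via Hugging Face Optimum.
# Assumes the FP32 export from the previous command lives in ./me5-large.
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

quantizer = ORTQuantizer.from_pretrained("me5-large")          # directory containing the exported model.onnx
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False)  # dynamic quantization with AVX-512 VNNI kernels
quantizer.quantize(save_dir="me5-large_quantized", quantization_config=qconfig)
Either route produces a dynamically quantized INT8 model; the CLI is essentially a thin wrapper over this API.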
3. Converting to FP16
Converting the model to FP16 takes two steps:
- First, export the FP32 ONNX model with the same export_hf_model_from_hf.py script (skip this if you already exported it for the INT8 step):
export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
- Then, convert the exported weights to FP16 with onnxruntime's float16 conversion utility. The conversion script referenced by the model's author is available here:
https://gist.github.com/hotchpotch/64fa52d32886fe61cc1d110066afef38
The underlying float16 tooling lives in the onnxruntime repository:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py
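As a rough sketch of that second step (file names are assumptions based on the export above; see the linked gist for the version actually used), the conversion looks something like this:
# Convert the exported FP32 ONNX model to FP16 with onnxruntime's float16 utilities.
# Paths are illustrative; adjust them to match your export output.
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

model = onnx.load("me5-large/model.onnx")
model_fp16 = convert_float_to_float16(model, keep_io_types=True)  # keep float32 graph inputs/outputs
onnx.save(model_fp16, "intfloat-multilingual-e5-large_fp16.onnx")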
Understanding the Vespa Services Configuration
To utilize the converted models with Vespa, you’ll need to adjust the services.xml configuration file. Think of this as setting up a stage for a performance — you must ensure everything is in place before the actors (models) take the spotlight.
Example Configuration for FP16
<component id="e5" type="hugging-face-embedder">
    <transformer-model url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_fp16.onnx"/>
    <tokenizer-model url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/tokenizer.json"/>
    <normalize>true</normalize>
    <pooling-strategy>mean</pooling-strategy>
</component>
For INT8, use the same component configuration but point the transformer-model URL at the INT8 quantized model file instead of the FP16 one.
Troubleshooting and Deployment Tips
When deploying the FP16 model, which is considerably larger than the INT8 version, expect longer deployment times while the model file is downloaded. If issues arise:
- Ensure you are using Vespa version 8.325.46 or above.
- If the model does not respond as expected, check the URL paths for accuracy.
- Verify the model files are correctly situated in the deployment folder; a quick local check like the one below can confirm the ONNX files themselves are sound.
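A minimal sketch of such a check with onnxruntime and the tokenizers library (file names are assumptions carried over from the conversion steps; the exact input names depend on how the model was exported):
# Local sanity check: run one sentence through the exported ONNX model.
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("me5-large/tokenizer.json")
session = ort.InferenceSession("intfloat-multilingual-e5-large_fp16.onnx")

enc = tokenizer.encode("query: how do I deploy an embedder in Vespa?")
feed = {
    "input_ids": np.array([enc.ids], dtype=np.int64),
    "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
}
# Some exports also expect token_type_ids; supply zeros if the graph declares them.
if "token_type_ids" in {i.name for i in session.get_inputs()}:
    feed["token_type_ids"] = np.zeros_like(feed["input_ids"])

outputs = session.run(None, feed)
print(outputs[0].shape)  # expect (1, sequence_length, 1024) token embeddings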
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
By following the steps outlined in this guide, you can successfully convert and deploy the intfloat/multilingual-e5-large model in both FP16 and INT8 formats with Vespa. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
