How to Use the BGE-M3 Model with ONNX Runtime

Apr 1, 2024 | Educational

In the rapidly evolving world of artificial intelligence, optimizing machine learning models for performance and compatibility is crucial. The BGE-M3 model, converted to ONNX weights, offers exactly that: seamless integration and enhanced efficiency. In this guide, we will walk through how to compute embeddings with the BGE-M3 model using ONNX Runtime. Let's embark on this journey!

Understanding the BGE-M3 Model

The BGE-M3 model outputs both dense embeddings (a single vector per input text) and ColBERT embeddings (one vector per input token, used for late-interaction scoring). Think of it as a skilled translator who understands not only the words but also their contextual meanings. Using this "interpreter," we can extract features from text and use them in applications such as search and information retrieval.
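
To make the two output types concrete, here is a minimal sketch of how they are typically scored in retrieval: dense vectors with cosine similarity, ColBERT token vectors with a late-interaction (MaxSim) sum. The arrays below are random stand-ins for real model outputs, purely for illustration:

import numpy as np

def dense_score(q: np.ndarray, d: np.ndarray) -> float:
    """Cosine similarity between two single-vector (dense) embeddings."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def colbert_score(q_tokens: np.ndarray, d_tokens: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: each query token vector takes its
    best match among the document token vectors; the matches are summed."""
    sim = q_tokens @ d_tokens.T          # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # best document match per query token

# Toy stand-ins: BGE-M3 actually produces 1024-dimensional vectors.
query_dense, doc_dense = np.random.rand(1024), np.random.rand(1024)
query_tokens, doc_tokens = np.random.rand(8, 1024), np.random.rand(24, 1024)

print(dense_score(query_dense, doc_dense))
print(colbert_score(query_tokens, doc_tokens))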

Installation of Required Packages

First and foremost, you need to install the necessary Python modules to get started. Open your terminal and run the following command:

pip install huggingface-hub onnxruntime transformers
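
As an optional sanity check, you can confirm the packages import cleanly before proceeding (the exact version number will vary):

# Confirm the key packages import without errors.
import huggingface_hub
import onnxruntime
import transformers

print(onnxruntime.__version__)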

Using the Model to Compute Embeddings

Once the installation is complete, you can use the BGE-M3 model for your computations. Here's a step-by-step breakdown:

  1. Import the required libraries:

     from huggingface_hub import hf_hub_download
     import onnxruntime as ort
     from transformers import AutoTokenizer

  2. Download the model files (the ONNX graph and its external weights are stored as two separate files):

     hf_hub_download(
         repo_id='ddmitov/bge_m3_dense_colbert_onnx',
         filename='model.onnx',
         local_dir='tmp',
         repo_type='model')

     hf_hub_download(
         repo_id='ddmitov/bge_m3_dense_colbert_onnx',
         filename='model.onnx_data',
         local_dir='tmp',
         repo_type='model')

  3. Load the tokenizer and create an inference session:

     tokenizer = AutoTokenizer.from_pretrained('ddmitov/bge_m3_dense_colbert_onnx')
     ort_session = ort.InferenceSession('tmp/model.onnx')

  4. Prepare your input and get outputs:

     inputs = tokenizer(
         "BGE M3 is an embedding model supporting dense retrieval and lexical matching.",
         padding='longest',
         return_tensors='np')

     # Wrap the NumPy arrays as OrtValue objects and run the session.
     inputs_onnx = {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()}
     outputs = ort_session.run(None, inputs_onnx)

  5. Check the outputs (a small similarity example follows this list):

     # outputs[0] holds the dense vectors, one per input text;
     # outputs[1] holds the ColBERT vectors, one per input token.
     print(f"Number of Dense Vectors: {len(outputs[0])}")
     print(f"Dense Vector Length: {len(outputs[0][0])}")
     print()
     print(f"Number of ColBERT Vectors: {len(outputs[1][0])}")
     print(f"ColBERT Vector Length: {len(outputs[1][0][0])}")
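
As a follow-up to step 5, here is a small sketch that reuses the tokenizer and ort_session objects created above to compare two sentences through their dense embeddings. The embed() helper is our own illustration, not part of the model repository:

import numpy as np

def embed(text: str) -> np.ndarray:
    """Return the dense vector for a single text, reusing the session above."""
    inputs = tokenizer(text, padding='longest', return_tensors='np')
    inputs_onnx = {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()}
    outputs = ort_session.run(None, inputs_onnx)
    return outputs[0][0]  # first output: dense vectors, one per input text

a = embed("BGE M3 supports dense retrieval.")
b = embed("BGE M3 can produce dense embeddings.")

# Cosine similarity between the two dense vectors.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"Cosine similarity: {cosine:.4f}")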

Expected Output

When executed successfully, you should see output similar to the following. The number of ColBERT vectors equals the number of tokens in the tokenized input, since the model produces one ColBERT vector per token:

# Number of Dense Vectors: 1
# Dense Vector Length: 1024
#
# Number of ColBERT Vectors: 24
# ColBERT Vector Length: 1024

Troubleshooting Common Issues

While working with the BGE-M3 model in ONNX Runtime, you might encounter some common issues. Here are a few troubleshooting ideas:

  • Module not found: Ensure all modules are installed using the pip command mentioned earlier. If import errors persist, recheck the installation step.
  • ONNX Model Errors: Double-check the paths when loading your model and make sure both model.onnx and model.onnx_data exist in the tmp directory (see the diagnostic sketch after this list).
  • Unexpected Outputs: Confirm that the input text is tokenized exactly as shown above (padding='longest', return_tensors='np'), since the input format heavily influences the results.
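
For the model-loading and input errors in particular, a quick diagnostic is to list the input and output names the ONNX graph actually expects. This is a minimal sketch assuming the tmp/model.onnx path used earlier:

import onnxruntime as ort

ort_session = ort.InferenceSession('tmp/model.onnx')

# The graph's declared inputs must match the keys produced by the tokenizer.
for inp in ort_session.get_inputs():
    print(f"input:  {inp.name} shape={inp.shape} type={inp.type}")

for out in ort_session.get_outputs():
    print(f"output: {out.name} shape={out.shape} type={out.type}")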

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Utilizing the BGE-M3 model with ONNX Runtime opens up a world of possibilities for text embeddings in machine learning tasks. Whether you are working on search optimization or enriching data representation, this model can be a valuable asset. Remember to keep an eye out for any issues and refer to the troubleshooting section when in doubt.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
