How to Use the BGE-M3 ONNX Model with ONNX Runtime

Mar 31, 2024 | Educational

In this tutorial, we’ll explore how to use the BGE-M3 model, converted to ONNX weights, to compute both dense and ColBERT embeddings efficiently. BGE-M3 is a multilingual embedding model that supports dense, sparse (lexical), and multi-vector (ColBERT) retrieval, making it a good fit for a wide range of retrieval and search tasks; the ONNX conversion used here exposes the dense and ColBERT outputs.

Understanding the BGE-M3 ONNX Model

The BGE-M3 ONNX model can be thought of as a blender that prepares two smoothies at once: a dense embedding and a set of ColBERT embeddings. For each input text, the first output is a single dense vector representing the whole text, while the second output contains one ColBERT vector per input token. Each output is returned separately, so you can serve whichever representation your application needs.
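
Concretely, for a single input text, the two outputs can be pictured like this; the shapes use the dimensions reported later in this tutorial, and the variable names are ours:

import numpy as np

# Output 0: one 1024-dimensional dense vector for the whole text
dense_output = np.zeros((1, 1024))        # (number of texts, embedding size)

# Output 1: one 1024-dimensional ColBERT vector per token of the input
colbert_output = np.zeros((1, 24, 1024))  # (number of texts, tokens, embedding size)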

Installation Process

Before diving into model usage, we need to set up our environment by installing the necessary packages. Here’s how you can do that:

pip install huggingface-hub onnxruntime transformers
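
If you want to confirm the environment before moving on, a quick import check like the one below will surface any installation problems early; it is just a sanity check, not a required step:

import huggingface_hub
import onnxruntime
import transformers

# Print the installed versions; an ImportError here means the install step needs a re-run
print("huggingface_hub:", huggingface_hub.__version__)
print("onnxruntime:", onnxruntime.__version__)
print("transformers:", transformers.__version__)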

Using the BGE-M3 ONNX Model

Once the required modules are installed, you can compute embeddings using the following Python code:

from huggingface_hub import hf_hub_download
import onnxruntime as ort
from transformers import AutoTokenizer

# Download the ONNX model and its external weights file (both must end up in the same directory)
hf_hub_download(repo_id="ddmitov/bge_m3_dense_colbert_onnx", filename="model.onnx", local_dir="tmp", repo_type="model")
hf_hub_download(repo_id="ddmitov/bge_m3_dense_colbert_onnx", filename="model.onnx_data", local_dir="tmp", repo_type="model")

# Load tokenizer and initialize ONNX session
tokenizer = AutoTokenizer.from_pretrained("ddmitov/bge_m3_dense_colbert_onnx")
ort_session = ort.InferenceSession("tmp/model.onnx")

# Prepare the input text for model inference
inputs = tokenizer("BGE M3 is an embedding model supporting dense retrieval and lexical matching.", padding="longest", return_tensors="np")

# Wrap the NumPy arrays as OrtValue objects for ONNX Runtime
inputs_onnx = {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()}

# Run inference; output 0 holds the dense vectors, output 1 the ColBERT vectors
outputs = ort_session.run(None, inputs_onnx)

# Displaying the output
print(f"Number of Dense Vectors: {len(outputs[0])}")
print(f"Dense Vector Length: {len(outputs[0][0])}")
print()
print(f"Number of ColBERT Vectors: {len(outputs[1][0])}")
print(f"ColBERT vector length: {len(outputs[1][0][0])}")

Understanding the Output

When you run the main example above, the first output gives you one dense vector per input text and the second gives you one ColBERT vector per token, so the dense count matches the number of texts while the ColBERT count depends on the tokenized length of the input. For the sample sentence, the output will look something like this:

# Expected output:
# Number of Dense Vectors: 1
# Dense Vector Length: 1024
# Number of ColBERT Vectors: 24
# ColBERT Vector Length: 1024
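
A quick way to see how the ColBERT count relates to the input is to compare it with the tokenized length, reusing the inputs and outputs objects from the example above; whether special tokens are included in the ColBERT output depends on how the model was exported:

# Token count of the tokenized input, including special tokens
print("Token count:", inputs["input_ids"].shape[1])

# Number of ColBERT vectors produced for the same input
print("ColBERT vectors:", len(outputs[1][0]))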

Troubleshooting Tips

If you encounter any issues while working with the BGE-M3 model, consider the following troubleshooting tips:

  • Missing Packages: Ensure all required packages are installed correctly. You can re-run the installation command if necessary.
  • Incorrect Model Download: Verify that the model is downloaded to the specified directory. Check for typos in the repo ID.
  • Input Shape Mismatch: Double-check that the tokenized inputs match the input names and shapes the exported model expects (for example, input_ids and attention_mask); the snippet below shows how to inspect them.
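
ONNX Runtime can report the input and output signatures of a loaded session, which is handy for diagnosing name or shape mismatches. Here is a short sketch using the ort_session created earlier:

# Print the names, shapes, and element types the exported graph expects and produces
for model_input in ort_session.get_inputs():
    print("input :", model_input.name, model_input.shape, model_input.type)

for model_output in ort_session.get_outputs():
    print("output:", model_output.name, model_output.shape, model_output.type)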

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

By following these steps, you can successfully harness the power of the BGE-M3 ONNX model for your AI applications. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
