In this tutorial, we’ll use an ONNX export of the BGE-M3 model to compute both dense and ColBERT embeddings. BGE-M3 is an embedding model that supports dense retrieval, lexical matching, and multi-vector (ColBERT-style) retrieval; the export used here produces the dense and ColBERT outputs.
Understanding the BGE-M3 ONNX Model
The BGE-M3 ONNX model can be thought of as a blender that makes two smoothies at once: dense and ColBERT embeddings. It takes tokenized text as input and returns both representations from a single inference call. Each output smoothie is stored in its own container: the dense output holds one vector per input text, while the ColBERT output holds a sequence of vectors for each text.
Installation Process
Before diving into model usage, we need to set up our environment by installing the necessary packages. Here’s how you can do that:
pip install huggingface-hub onnxruntime transformers
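To confirm that the installation worked, a short check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name installed_versions is made up for this tutorial, and the package names are the ones from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for this tutorial; not part of any library.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    required = ["huggingface-hub", "onnxruntime", "transformers"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be fixed by re-running the pip command in the same Python environment.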
Using the BGE-M3 ONNX Model
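Once the required packages are installed, you can compute embeddings with the Python code below. It downloads model.onnx and its external weights file model.onnx_data into the tmp directory (both must sit side by side for the model to load), then tokenizes a sentence and runs inference: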
from huggingface_hub import hf_hub_download
import onnxruntime as ort
from transformers import AutoTokenizer
# Download the model and its data
hf_hub_download(repo_id="ddmitov/bge_m3_dense_colbert_onnx", filename="model.onnx", local_dir="tmp", repo_type="model")
hf_hub_download(repo_id="ddmitov/bge_m3_dense_colbert_onnx", filename="model.onnx_data", local_dir="tmp", repo_type="model")
# Load tokenizer and initialize ONNX session
tokenizer = AutoTokenizer.from_pretrained("ddmitov/bge_m3_dense_colbert_onnx")
ort_session = ort.InferenceSession("tmp/model.onnx")
# Prepare input text for model inference
inputs = tokenizer("BGE M3 is an embedding model supporting dense retrieval and lexical matching.", padding="longest", return_tensors="np")
inputs_onnx = {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()}
# Run inference
outputs = ort_session.run(None, inputs_onnx)
# Displaying the output
print(f"Number of Dense Vectors: {len(outputs[0])}")
print(f"Dense Vector Length: {len(outputs[0][0])}")
print()
print(f"Number of ColBERT Vectors: {len(outputs[1][0])}")
print(f"ColBERT vector length: {len(outputs[1][0][0])}")
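The same steps extend to several texts at once. The sketch below wraps them in a helper; the name embed_texts is hypothetical, and it assumes that the tokenizer and session behave like the tokenizer and ort_session objects created above and that the first two outputs are the dense and ColBERT arrays, as in the script. With padding="longest", the token dimension of the ColBERT array follows the longest text in the batch; how this export treats the padded positions is not stated here, so inspect them before relying on them.

```python
import numpy as np


def embed_texts(texts, tokenizer, session):
    """Return (dense, colbert) arrays for a list of texts.

    Hypothetical helper: `tokenizer` and `session` are expected to behave
    like the `tokenizer` and `ort_session` objects from the script above.
    """
    inputs = tokenizer(texts, padding="longest", return_tensors="np")
    # InferenceSession.run accepts a dict of input name -> NumPy array.
    feed = {name: np.asarray(value) for name, value in inputs.items()}
    outputs = session.run(None, feed)
    dense = np.asarray(outputs[0])    # assumed shape: (len(texts), dim)
    colbert = np.asarray(outputs[1])  # assumed shape: (len(texts), tokens, dim)
    return dense, colbert


# Example call, using the objects created earlier:
# dense, colbert = embed_texts(["first text", "second text"], tokenizer, ort_session)
```

Passing the tokenizer and session as arguments keeps the helper free of global state, so the same function works with any session loaded from the downloaded files.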
Understanding the Output
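When you run the script from the previous section, it reports how many dense and ColBERT vectors were produced, akin to knowing how many servings of smoothies you’ve prepared. A single input text yields one dense vector and a set of per-token ColBERT vectors, so the ColBERT count varies with the length of the input. The output will look something like this: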
# Expected output:
# Number of Dense Vectors: 1
# Dense Vector Length: 1024
#
# Number of ColBERT Vectors: 24
# ColBERT vector length: 1024
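To see how these two shapes are typically used, here is a small NumPy sketch of scoring a query against a document. It is an illustration under stated assumptions, not the model's official scoring code: the function names are hypothetical, the vectors are random stand-ins with the shapes shown above, and the ColBERT score follows the common late-interaction recipe (for each query token, take the best-matching document token, then average). Vectors are normalized explicitly because this tutorial does not say whether the model output is already normalized.

```python
import numpy as np


def _unit(x):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def dense_score(a, b):
    """Cosine similarity between two dense vectors of shape (dim,)."""
    return float(np.dot(_unit(a), _unit(b)))


def colbert_score(query_vecs, doc_vecs):
    """Late-interaction score for arrays of shape (tokens, dim).

    For every query token, take its highest cosine similarity over all
    document tokens, then average those maxima over the query tokens.
    """
    sim = _unit(query_vecs) @ _unit(doc_vecs).T  # (query_tokens, doc_tokens)
    return float(sim.max(axis=1).mean())


# Random stand-ins; with the real model these would be outputs[0][0]
# (dense) and outputs[1][0] (ColBERT) for a query and for a document.
rng = np.random.default_rng(0)
q_dense, d_dense = rng.normal(size=1024), rng.normal(size=1024)
q_col, d_col = rng.normal(size=(6, 1024)), rng.normal(size=(24, 1024))

print(f"Dense score: {dense_score(q_dense, d_dense):.4f}")
print(f"ColBERT score: {colbert_score(q_col, d_col):.4f}")
```

The dense score compares whole texts with a single dot product, while the ColBERT score compares them token by token, which is why the ColBERT output needs one vector per token.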
Troubleshooting Tips
If you encounter any issues while working with the BGE-M3 model, consider the following troubleshooting tips:
- Missing Packages: Ensure all required packages are installed correctly. You can re-run the installation command if necessary.
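- Incorrect Model Download: Verify that both model.onnx and model.onnx_data were downloaded to the specified directory (tmp in this example). Check for typos in the repo ID.
- Input Shape Mismatch: Make sure the tokenizer output is passed to the model unchanged. The keys of the input dictionary should match the input names reported by ort_session.get_inputs(), and each value should be the 2-D array that the tokenizer returns with return_tensors="np".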
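The download and input checks above can be sketched as two small helpers. Both function names are hypothetical, and the input check assumes that the model expects 2-D integer arrays of shape (batch, tokens), which is what the tokenizer returns with return_tensors="np"; adjust it if the session reports something different.

```python
from pathlib import Path

import numpy as np


def missing_model_files(model_dir, filenames=("model.onnx", "model.onnx_data")):
    """Return the expected model files that are not present in model_dir."""
    return [name for name in filenames if not (Path(model_dir) / name).is_file()]


def input_problems(expected_names, inputs):
    """Return human-readable problems found in a model input dictionary.

    Assumption: every input is a 2-D integer array of shape (batch, tokens).
    """
    problems = []
    for name in expected_names:
        if name not in inputs:
            problems.append(f"missing input: {name}")
    for name, value in inputs.items():
        arr = np.asarray(value)
        if name not in expected_names:
            problems.append(f"unexpected input: {name}")
        if arr.ndim != 2:
            problems.append(f"{name}: expected 2-D (batch, tokens), got shape {arr.shape}")
        if not np.issubdtype(arr.dtype, np.integer):
            problems.append(f"{name}: expected integer dtype, got {arr.dtype}")
    return problems


# Example calls, using the objects from the main script:
# print(missing_model_files("tmp"))
# expected = [node.name for node in ort_session.get_inputs()]
# print(input_problems(expected, dict(inputs)))
```

An empty list from either helper means that particular check found nothing wrong.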
Conclusion
By following these steps, you can compute dense and ColBERT embeddings with the BGE-M3 ONNX model in your own AI applications. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

