In today’s digital landscape, managing and retrieving vast amounts of data is crucial. The bge-m3-onnx-o4 model stands out because it supports retrieval across more than 100 languages and input lengths ranging from short sentences to long documents. This blog post serves as a detailed guide to help you get started with this powerful model.
Why is this Model Cool?
The bge-m3-onnx-o4 offers several impressive features that make it an enticing choice for data retrieval:
- Multi-Functionality: Performs dense, sparse, and multi-vector retrieval within a single model.
- Multi-Linguality: Supports over 100 languages, making it versatile for global applications.
- Multi-Granularity: Handles a wide range of inputs, from short sentences to long documents of up to 8192 tokens (see the short sketch after this list).
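As a minimal sketch of that last point, assuming the tokenizer files in the hooman650/bge-m3-onnx-o4 repository (the same tokenizer loaded later in this guide) and an installed transformers package, you can tokenize a short query and a long document in one batch and let truncation enforce the 8192-token limit:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hooman650/bge-m3-onnx-o4")

# A short sentence and a (deliberately oversized) document share one batch;
# anything longer than 8192 tokens is truncated to the model's limit.
inputs = tokenizer(
    ["short query", "a very long document " * 3000],
    padding=True,
    truncation=True,
    max_length=8192,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)  # (2, 8192) for this oversized example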
Getting Started with bge-m3-onnx-o4
Before diving into the model’s functionalities, you need to set it up properly. Follow these instructions to download the model weights:
Step 1: Download Model Weights
Loading the model straight from the Hugging Face Hub currently raises exceptions, so download the weights to a local folder first:
- Install the huggingface-hub package, then download a local snapshot of the weights:
pip install huggingface-hub
from huggingface_hub import snapshot_download

# Download the full repository into a local folder named "bge-m3-onnx"
snapshot_download(repo_id="hooman650/bge-m3-onnx-o4", local_dir="bge-m3-onnx")
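As a quick sanity check (not part of the original snippet), assuming the download landed in the bge-m3-onnx folder created above, you can list the files before trying to load the model:

from pathlib import Path

# The ONNX weights, config, and tokenizer files should all appear here
for f in sorted(Path("bge-m3-onnx").iterdir()):
    print(f.name)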
Using the Model for Dense Retrieval
Once you have downloaded the model weights, you are ready to use the model. Below are the steps for implementing dense retrieval:
- Ensure you have the required libraries installed (for the CUDAExecutionProvider used below, install the GPU extra optimum[onnxruntime-gpu] instead):
pip install --upgrade-strategy eager optimum[onnxruntime]
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch
# Load the ONNX weights from the local folder downloaded above; the provider is passed as a string
model = ORTModelForFeatureExtraction.from_pretrained("bge-m3-onnx", provider="CUDAExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained("hooman650/bge-m3-onnx-o4")
sentences = [
"The quick brown fox jumps over the lazy dog.",
"El rápido zorro marrón salta sobre el perro perezoso.",
"Le renard brun rapide saute par-dessus le chien paresseux.",
"Der schnelle braune Fuchs springt über den faulen Hund.",
"La volpe marrone veloce salta sopra il cane pigro.",
"Быстрая коричневая лиса прыгает через ленивую собаку.",
"الثعلب البني السريع يقفز فوق الكلب الكسول.",
"तेज़ भूरी लोमड़ी आलसी कुत्ते के ऊपर कूद जाती है।"
]
# Tokenize the batch and move it to the GPU
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to("cuda")
out = model(**encoded_input, return_dict=True).last_hidden_state
# Take the [CLS] token embedding and L2-normalize it to obtain the dense vectors
dense_vecs = torch.nn.functional.normalize(out[:, 0], dim=-1)
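To see how these dense vectors support retrieval, here is a small follow-up sketch (not part of the original snippet; the query text is purely illustrative). Because dense_vecs are already L2-normalized, a plain matrix product gives cosine-similarity scores directly:

# Embed a query the same way as the sentences above
query = tokenizer(["Which sentence mentions a fox?"], padding=True,
                  truncation=True, return_tensors="pt").to("cuda")
q_out = model(**query, return_dict=True).last_hidden_state
q_vec = torch.nn.functional.normalize(q_out[:, 0], dim=-1)

# For normalized vectors, the dot product equals cosine similarity
scores = q_vec @ dense_vecs.T          # shape: (1, len(sentences))
best = scores.argmax(dim=-1).item()
print(sentences[best], scores[0, best].item())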
Understanding the Code: An Analogy
Imagine you own a large library filled with books of various sizes and in multiple languages. The bge-m3-onnx-o4 model is like a librarian who not only knows where every book is located but can also summarize and retrieve information from each book based on your request.
When you submit a query (the sentences in our case), the librarian (the model) scans every book in the library (the embeddings) and returns a compact representation that preserves the context and meaning of the request, which is why the embeddings are normalized before they are compared.
Troubleshooting Common Issues
If you encounter any issues while implementing the model, here are some troubleshooting steps to help you out:
- Model Not Loading: Ensure that the model weights are correctly downloaded and specified in the code.
- Import Errors: Double-check that you have installed all necessary libraries as indicated in the setup steps above.
- Runtime Errors: Make sure you are using a compatible version of Python and that the execution provider you request (CUDA or CPU) matches your hardware and installed onnxruntime build; a CPU-only fallback is sketched after this list.
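If CUDA is the problem, one possible fallback, assuming the CPU build of onnxruntime is installed (note that this O4-optimized export targets GPUs, so CPU inference may be slow), is to load the model with the CPU execution provider and keep the tensors on the CPU:

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch

# Same local weights as before, but with the CPU execution provider
model = ORTModelForFeatureExtraction.from_pretrained("bge-m3-onnx", provider="CPUExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained("hooman650/bge-m3-onnx-o4")

encoded = tokenizer(["The quick brown fox jumps over the lazy dog."],
                    padding=True, truncation=True, return_tensors="pt")  # stays on CPU
out = model(**encoded, return_dict=True).last_hidden_state
dense_vecs = torch.nn.functional.normalize(out[:, 0], dim=-1)
print(dense_vecs.shape)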
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

