In the world of AI, converting models to the ONNX format can significantly enhance their portability and performance. Today, we will delve into the process of converting the BGE-M3 model to ONNX weights, making it compatible with ONNX Runtime. This guide is crafted to be user-friendly, offering step-by-step instructions and troubleshooting tips.
Understanding BGE-M3 and ONNX
The BGE-M3 model is a powerful embedding model that supports dense retrieval, lexical matching, and multi-vector interaction. Converting it to ONNX format lets you run it with ONNX Runtime while still producing dense, sparse, and ColBERT embedding representations in a single forward pass.
To visualize this, think of the BGE-M3 model as a chef in a kitchen. The chef can prepare various dishes (embeddings) at once—dense stew, sparse salad, and a hearty ColBERT casserole. With ONNX, it’s as if we provide the chef with a faster oven that cooks multiple dishes in record time.
Getting Started
To successfully convert BGE-M3 to ONNX weights, follow these steps:
- Install the required libraries: ONNX Runtime and Transformers, pinned to the versions this guide was tested with:
pip install onnxruntime==1.17.0
pip install transformers==4.37.2
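To confirm the installation, you can print the library versions and the execution providers that ONNX Runtime detected (the provider list in the comment is only an example; it depends on your machine):

import onnxruntime as ort
import transformers

print(ort.__version__)                 # expect 1.17.0
print(transformers.__version__)        # expect 4.37.2
print(ort.get_available_providers())   # e.g. ['CPUExecutionProvider']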
Using the Model with ONNX Runtime
Now that you’ve set up your environment, you can begin using the BGE-M3 model to compute embeddings:
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the BGE-M3 tokenizer and the exported ONNX model.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
ort_session = ort.InferenceSession("model.onnx")

# Tokenize the input text into numpy arrays.
inputs = tokenizer("BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
                   padding='longest', return_tensors='np')

# Wrap the arrays as OrtValues and run inference.
inputs_onnx = {k: ort.OrtValue.ortvalue_from_numpy(v) for k, v in inputs.items()}
outputs = ort_session.run(None, inputs_onnx)
Please note that the tokenizer translates the input sentence into a format that the model can process, akin to how a translator interprets a recipe before the chef begins cooking.
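The model's outputs come back as a list of numpy arrays. As a minimal sketch, assuming the export places the pooled dense embedding first (the sparse token weights used in the next section come from outputs[1]), you can retrieve and L2-normalize it like this:

import numpy as np

# Assumption: outputs[0] is the pooled dense embedding.
dense_embedding = outputs[0]
dense_embedding = dense_embedding / np.linalg.norm(dense_embedding, axis=-1, keepdims=True)
print(dense_embedding.shape)  # e.g. (1, 1024) for BGE-M3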
Processing Sparse Token Weights
To obtain the sparse representation, you can use the following code snippet:
from collections import defaultdict
import numpy as np

def process_token_weights(token_weights: np.ndarray, input_ids: list):
    # For each token id, keep the highest weight seen in the sequence,
    # skipping special tokens and non-positive weights.
    result = defaultdict(int)
    unused_tokens = set(
        [tokenizer.cls_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id, tokenizer.unk_token_id]
    )
    for w, idx in zip(token_weights, input_ids):
        if idx not in unused_tokens and w > 0:
            idx = str(idx)
            if w > result[idx]:
                result[idx] = w
    return result
# outputs[1] holds the per-token sparse weights; squeeze the trailing
# dimension and map each sequence to its {token_id: weight} dictionary.
token_weights = outputs[1].squeeze(-1)
lexical_weights = list(map(process_token_weights, token_weights, inputs['input_ids'].tolist()))
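As a usage sketch, two of these sparse representations can be compared by summing the products of the weights they assign to shared token ids, which mirrors how BGE-M3's lexical matching score is defined (the helper name below is ours):

def lexical_matching_score(weights_1: dict, weights_2: dict) -> float:
    # Sum w1 * w2 over the token ids present in both representations.
    return sum(float(w) * float(weights_2[token_id])
               for token_id, w in weights_1.items()
               if token_id in weights_2)

# A representation compared with itself yields its maximum score.
score = lexical_matching_score(lexical_weights[0], lexical_weights[0])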
Exporting ONNX Weights
To export the ONNX weights, follow these steps:
- First, install the required Python packages:
pip install -r requirements.txt
- Then run the export script:
python export_onnx.py --output . --opset 17 --device cpu --optimize O2
If you want a version without optimizations, adjust or omit the --optimize argument. You can learn more about optimization levels in the ONNX Runtime graph optimizations documentation.
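After exporting, a quick sanity check is to load the model and list its inputs and outputs (the names in the comments are assumptions; the actual names depend on the export script):

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
print([i.name for i in session.get_inputs()])   # e.g. ['input_ids', 'attention_mask']
print([o.name for o in session.get_outputs()])  # dense, sparse, and ColBERT outputs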
Troubleshooting
If you encounter issues while following the above steps, consider the following troubleshooting tips:
- Ensure you have installed the correct versions of the required libraries.
- Check that the path to the ONNX model file is correct (a quick check is sketched after this list).
- Verify compatibility of your hardware with ONNX Runtime.
- If modifications were made in bgem3_model.py, review the changes for errors.
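A minimal path check, assuming the model was exported to the working directory as model.onnx:

import os

model_path = "model.onnx"  # adjust to wherever you exported the model
assert os.path.exists(model_path), f"ONNX model not found at {model_path}"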
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
With the above steps, you’ve successfully converted the BGE-M3 model to ONNX format, enabling enhanced performance and versatility in AI applications. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

