In the world of AI, converting models to the ONNX format can significantly enhance their portability and performance. Today, we will delve into the process of converting the BGE-M3 model to ONNX weights, making it compatible with ONNX Runtime. This guide is crafted to be user-friendly, offering step-by-step instructions and troubleshooting tips.
Understanding BGE-M3 and ONNX
The BGE-M3 model is a powerful embedding model that supports dense retrieval, lexical matching, and multi-vector interaction. Converting it to ONNX format lets you run it with ONNX Runtime while still producing dense, sparse, and ColBERT embedding representations in a single forward pass.
To visualize this, think of the BGE-M3 model as a chef in a kitchen. The chef can prepare various dishes (embeddings) at once—dense stew, sparse salad, and a hearty ColBERT casserole. With ONNX, it’s as if we provide the chef with a faster oven that cooks multiple dishes in record time.
Getting Started
To successfully convert BGE-M3 to ONNX weights, follow these steps:
- Install the required libraries: ONNX Runtime and Transformers, pinned to the versions this guide was tested with:
pip install onnxruntime==1.17.0
pip install transformers==4.37.2
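To confirm the installation, you can print the library versions and the execution providers that ONNX Runtime detected (the provider list in the comment is only an example; it depends on your machine):

import onnxruntime as ort
import transformers

print(ort.__version__)                 # expect 1.17.0
print(transformers.__version__)        # expect 4.37.2
print(ort.get_available_providers())   # e.g. ['CPUExecutionProvider']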
Using the Model with ONNX Runtime
Now that you’ve set up your environment, you can begin using the BGE-M3 model to compute embeddings:
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the BGE-M3 tokenizer and the exported ONNX model.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
ort_session = ort.InferenceSession("model.onnx")

# Tokenize the input text into numpy arrays.
inputs = tokenizer("BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
                   padding='longest', return_tensors='np')

# Wrap the arrays as OrtValues and run inference.
inputs_onnx = {k: ort.OrtValue.ortvalue_from_numpy(v) for k, v in inputs.items()}
outputs = ort_session.run(None, inputs_onnx)
Please note that the tokenizer translates the input sentence into a format that the model can process, akin to how a translator interprets a recipe before the chef begins cooking.
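The model's outputs come back as a list of numpy arrays. As a minimal sketch, assuming the export places the pooled dense embedding first (the sparse token weights used in the next section come from outputs[1]), you can retrieve and L2-normalize it like this:

import numpy as np

# Assumption: outputs[0] is the pooled dense embedding.
dense_embedding = outputs[0]
dense_embedding = dense_embedding / np.linalg.norm(dense_embedding, axis=-1, keepdims=True)
print(dense_embedding.shape)  # e.g. (1, 1024) for BGE-M3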
Processing Sparse Token Weights
To obtain the sparse representation, you can use the following code snippet:
from collections import defaultdict
import numpy as np

def process_token_weights(token_weights: np.ndarray, input_ids: list):
    # For each token id, keep the highest weight seen in the sequence,
    # skipping special tokens and non-positive weights.
    result = defaultdict(int)
    unused_tokens = set(
        [tokenizer.cls_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id, tokenizer.unk_token_id]
    )
    for w, idx in zip(token_weights, input_ids):
        if idx not in unused_tokens and w > 0:
            idx = str(idx)
            if w > result[idx]:
                result[idx] = w
    return result
# outputs[1] holds the per-token sparse weights; squeeze the trailing
# dimension and map each sequence to its {token_id: weight} dictionary.
token_weights = outputs[1].squeeze(-1)
lexical_weights = list(map(process_token_weights, token_weights, inputs['input_ids'].tolist()))
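As a usage sketch, two of these sparse representations can be compared by summing the products of the weights they assign to shared token ids, which mirrors how BGE-M3's lexical matching score is defined (the helper name below is ours):

def lexical_matching_score(weights_1: dict, weights_2: dict) -> float:
    # Sum w1 * w2 over the token ids present in both representations.
    return sum(float(w) * float(weights_2[token_id])
               for token_id, w in weights_1.items()
               if token_id in weights_2)

# A representation compared with itself yields its maximum score.
score = lexical_matching_score(lexical_weights[0], lexical_weights[0])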
Exporting ONNX Weights
To export the ONNX weights, follow these steps:
- First, install the required Python packages:
pip install -r requirements.txt
- Then run the export script:
python export_onnx.py --output . --opset 17 --device cpu --optimize O2
If you want a version without optimizations, adjust or omit the --optimize argument. You can learn more about optimization levels in the ONNX Runtime graph optimizations documentation.
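After exporting, a quick sanity check is to load the model and list its inputs and outputs (the names in the comments are assumptions; the actual names depend on the export script):

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
print([i.name for i in session.get_inputs()])   # e.g. ['input_ids', 'attention_mask']
print([o.name for o in session.get_outputs()])  # dense, sparse, and ColBERT outputs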
Troubleshooting
If you encounter issues while following the above steps, consider the following troubleshooting tips:
- Ensure you have installed the correct versions of the required libraries.
- Check that the path to the ONNX model file is correct (a quick check is sketched after this list).
- Verify compatibility of your hardware with ONNX Runtime.
- If modifications were made in bgem3_model.py, review the changes for errors.
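A minimal path check, assuming the model was exported to the working directory as model.onnx:

import os

model_path = "model.onnx"  # adjust to wherever you exported the model
assert os.path.exists(model_path), f"ONNX model not found at {model_path}"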
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
With the above steps, you’ve successfully converted the BGE-M3 model to ONNX format, enabling enhanced performance and versatility in AI applications. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

