How to Use bge-m3-onnx-o4 for Advanced Filter Retrieval

Feb 6, 2024 | Educational

In today’s digital landscape, managing and retrieving vast amounts of data is crucial. The bge-m3-onnx-o4 model, an ONNX export of BGE-M3, stands out because it supports retrieval across more than 100 languages and across input lengths from short sentences to long documents. This blog post serves as a detailed guide to help you get started with this powerful model.

Why is this Model Cool?

The bge-m3-onnx-o4 offers several impressive features that make it an enticing choice for data retrieval:

  • Multi-Functionality: Performs dense, multi-vector, and sparse retrieval simultaneously.
  • Multi-Linguality: Supports over 100 languages, making it versatile for global applications.
  • Multi-Granularity: Capable of handling a wide range of inputs, from short sentences to long documents (up to 8192 tokens).

Getting Started with bge-m3-onnx-o4

Before diving into the model’s functionalities, you need to set it up properly. Follow these instructions to download the model weights:

Step 1: Download Model Weights

Loading the model straight from the Hugging Face Hub can raise exceptions, so download the weights to a local directory first:

  • Install the huggingface-hub:
  • pip install huggingface-hub
  • Import the necessary functions and download the model weights:
  • from huggingface_hub import snapshot_download
    snapshot_download(repo_id="hooman650/bge-m3-onnx-o4", local_dir="bge-m3-onnx")
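
A quick sanity check before moving on can save debugging time later. The snippet below is a minimal sketch that simply lists the downloaded files; the exact file names depend on the repository contents, so treat them as illustrative rather than guaranteed:

    import os

    # The local directory populated by snapshot_download above.
    # You should see the ONNX graph plus the tokenizer/config files.
    for name in sorted(os.listdir("bge-m3-onnx")):
        print(name)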

Using the Model for Dense Retrieval

Once you have downloaded the model weights, you are ready to use the model. Below are the steps for implementing dense retrieval:

  • Ensure you have the required libraries installed (if you plan to run on GPU with the CUDAExecutionProvider, install the optimum[onnxruntime-gpu] extra instead):
  • pip install --upgrade-strategy eager optimum[onnxruntime]
  • Import the necessary modules:
  • from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer
    import torch
  • Load the model weights and the tokenizer:
  • model = ORTModelForFeatureExtraction.from_pretrained("bge-m3-onnx", provider="CUDAExecutionProvider")
    tokenizer = AutoTokenizer.from_pretrained("hooman650/bge-m3-onnx-o4")
  • Prepare your sentences:
  • sentences = [
        "The quick brown fox jumps over the lazy dog.",
        "El rápido zorro marrón salta sobre el perro perezoso.",
        "Le renard brun rapide saute par-dessus le chien paresseux.",
        "Der schnelle braune Fuchs springt über den faulen Hund.",
        "La volpe marrone veloce salta sopra il cane pigro.",
        "Быстрая коричневая лиса прыгает через ленивую собаку.",
        "الثعلب البني السريع يقفز فوق الكلب الكسول.",
        "तेज़ भूरी लोमड़ी आलसी कुत्ते के ऊपर कूद जाती है।"
    ]
  • Tokenize the sentences and move them to the correct device:
  • encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to("cuda")
  • Get the embeddings (the L2-normalized [CLS] token representation serves as the dense vector for each sentence):
  • out = model(**encoded_input, return_dict=True).last_hidden_state
    dense_vecs = torch.nn.functional.normalize(out[:, 0], dim=-1)  # [CLS] token, L2-normalized
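
Because the dense vectors are L2-normalized, their dot product is the cosine similarity, which is all you need for a simple dense-retrieval ranking. The snippet below is not part of the model card; it is a small illustration that scores every sentence against the first (English) one:

    # Rows of dense_vecs are unit-length, so a matrix-vector product with the
    # first row yields cosine similarities against the English sentence.
    scores = dense_vecs @ dense_vecs[0]

    for sentence, score in zip(sentences, scores.tolist()):
        print(f"{score:.3f}  {sentence}")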

Understanding the Code: An Analogy

Imagine you own a large library filled with books of various sizes and in multiple languages. The bge-m3-onnx-o4 model is like a librarian who not only knows where every book is located but can also summarize and retrieve information from each book based on your request.

When you input a query (the sentences in our case), the librarian (the model) works through the books in the library and hands back a concise summary of each one (the embeddings), while keeping those summaries comparable so the context and meaning of the requested information are preserved (normalizing the embeddings).

Troubleshooting Common Issues

If you encounter any issues while implementing the model, here are some troubleshooting steps to help you out:

  • Model Not Loading: Ensure that the model weights are correctly downloaded and specified in the code.
  • Import Errors: Double-check that you have installed all necessary libraries as indicated in the setup steps above.
  • Runtime Errors: Make sure you are using a compatible version of Python and have correctly allocated resources (CUDA or CPU); a CPU fallback is sketched after this list.
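
For the last point, the pipeline also runs without a GPU. The sketch below is a CPU fallback, assuming the weights were downloaded to bge-m3-onnx as in Step 1; note that the o4 export is optimized for GPU (fp16), so CPU inference may be slow or may require a non-optimized export of the model:

    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer
    import torch

    # Use the default CPU execution provider and keep tensors on the CPU
    # (no .to("cuda") calls).
    model = ORTModelForFeatureExtraction.from_pretrained(
        "bge-m3-onnx", provider="CPUExecutionProvider"
    )
    tokenizer = AutoTokenizer.from_pretrained("hooman650/bge-m3-onnx-o4")

    encoded_input = tokenizer(
        ["The quick brown fox jumps over the lazy dog."],
        padding=True, truncation=True, return_tensors="pt"
    )
    out = model(**encoded_input, return_dict=True).last_hidden_state
    dense_vecs = torch.nn.functional.normalize(out[:, 0], dim=-1)
    print(dense_vecs.shape)  # (1, hidden_size)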

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
