Welcome! In this blog, we’ll explore how to effectively use the Bloomz-3b-retriever-v2 model, which is designed for Open Domain Question Answering (ODQA) in both French and English. Let’s dive into the setup and execution so you can leverage its capabilities without a hitch!
Understanding the Bloomz-3b-retriever-v2
The Bloomz-3b-retriever-v2 model can be likened to a librarian who has an extraordinary ability to quickly find relevant articles in both French and English based on your queries. When you ask, “Where can I find information about AI?” this librarian first understands your question and then sifts through tens of thousands of articles in the library to provide you with the most relevant ones, all while keeping both languages in mind.
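Under the hood, this librarian works with embeddings: every article and every question is turned into a vector, and the articles whose vectors lie closest to the question’s vector (by cosine distance) are returned first. Here is a toy sketch of that ranking step, using tiny made-up vectors rather than real model outputs, purely to illustrate the idea:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Made-up example: 3 contexts and 1 query, each embedded as a toy 3-dimensional vector
contexts = ["Article on AI", "Recette de cuisine", "Paper on machine learning"]
emb_contexts = np.array([[0.9, 0.1, 0.0],
                         [0.0, 0.2, 0.9],
                         [0.8, 0.3, 0.1]])
emb_query = np.array([[1.0, 0.2, 0.0]])  # "Where can I find information about AI?"

# Rank contexts by cosine distance to the query (smaller distance = more relevant)
dist = cdist(emb_query, emb_contexts, 'cosine')
ranking = dist.argsort(axis=-1)[0]
print([contexts[i] for i in ranking])  # the AI-related contexts come first
```

The real model does exactly this, except that the vectors are produced by Bloomz-3b-retriever-v2 itself, as shown in the next sections.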
Getting Started with Bloomz-3b-retriever-v2
The Bloomz-3b-retriever-v2 can be used easily through either the Transformers API or the Pipeline API. Below, we’ll go through the steps for both methods:
With Transformers API
To use the Transformers API, follow these steps:
```python
from typing import Union, List
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from scipy.spatial.distance import cdist

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('cmarkea/bloomz-3b-retriever-v2')
model = AutoModel.from_pretrained('cmarkea/bloomz-3b-retriever-v2')

# Infer embeddings from a text or a list of texts
def infer(txt: Union[str, List[str]]):
    tok = tokenizer(txt, padding=True, return_tensors='pt')
    with torch.no_grad():
        embedding = model(**tok)
    # Important: take only the last token's hidden state!
    return embedding.get('last_hidden_state')[:, -1, :].numpy()

# List of contexts and their embeddings
list_of_contexts = [...]
emb_contexts = infer(list_of_contexts)

# List of queries and their embeddings
list_of_queries = [...]
emb_queries = infer(list_of_queries)

# Important: use the cosine distance!
dist = cdist(emb_queries, emb_contexts, 'cosine')

# Get the top k nearest contexts for each query
top_k = lambda x: [
    [list_of_contexts[qq] for qq in ii]
    for ii in dist.argsort(axis=-1)[:, :x]
]

# Get top 5 nearest contexts for each query
top_contexts = top_k(5)
```
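To make this concrete, here is a hypothetical usage sketch that plugs example data into the code above. The contexts and queries below are invented for illustration only; replace them with your own documents and questions.

```python
# Hypothetical example data (replace with your own documents and questions)
list_of_contexts = [
    "L'intelligence artificielle transforme le secteur bancaire.",
    "The Eiffel Tower is located in Paris.",
    "Transformer models can embed text in many languages.",
]
list_of_queries = [
    "Where can I find information about AI?",
    "Où se trouve la tour Eiffel ?",
]

# Embed both sides with the infer() helper defined above
emb_contexts = infer(list_of_contexts)
emb_queries = infer(list_of_queries)
dist = cdist(emb_queries, emb_contexts, 'cosine')

# Print the single closest context for each query
for query, row in zip(list_of_queries, dist.argsort(axis=-1)[:, :1]):
    print(query, '->', list_of_contexts[row[0]])
```

Note that queries and contexts can freely mix French and English, which is exactly the bilingual scenario the model targets.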
With Pipeline API
Alternatively, you can utilize the Pipeline API for a more streamlined approach:
```python
import numpy as np
from transformers import pipeline
from scipy.spatial.distance import cdist

# Load the retriever pipeline
retriever = pipeline('feature-extraction', model='cmarkea/bloomz-3b-retriever-v2')

# Important: take only the last token's embedding!
infer = lambda x: [ii[0][-1] for ii in retriever(x)]

# List of contexts and their embeddings (stacked into a 2D array)
list_of_contexts = [...]
emb_contexts = np.vstack(infer(list_of_contexts))

# List of queries and their embeddings
list_of_queries = [...]
emb_queries = np.vstack(infer(list_of_queries))

# Important: use the cosine distance!
dist = cdist(emb_queries, emb_contexts, 'cosine')

# Get the top k nearest contexts for each query
top_k = lambda x: [
    [list_of_contexts[qq] for qq in ii]
    for ii in dist.argsort(axis=-1)[:, :x]
]

# Get top 5 nearest contexts for each query
top_contexts = top_k(5)
```
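In a real application the contexts rarely change between runs, so you probably don’t want to re-embed them every time. Below is a minimal caching sketch that reuses the infer helper and imports defined above; the file name context_embeddings.npy is an arbitrary choice for this example.

```python
import os

EMB_FILE = 'context_embeddings.npy'  # hypothetical cache file name

if os.path.exists(EMB_FILE):
    # Reuse previously computed context embeddings
    emb_contexts = np.load(EMB_FILE)
else:
    # Compute the context embeddings once and save them for later runs
    emb_contexts = np.vstack(infer(list_of_contexts))
    np.save(EMB_FILE, emb_contexts)

# Queries are usually cheap enough to embed on the fly
emb_queries = np.vstack(infer(list_of_queries))
dist = cdist(emb_queries, emb_contexts, 'cosine')
```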
Troubleshooting
If you encounter issues while using the Bloomz-3b-retriever-v2, here are some troubleshooting tips:
- Ensure you have the required libraries installed: transformers, torch, and scipy (for example, via pip install transformers torch scipy).
- Check your internet connection; a stable connection is needed to download the model weights from the Hugging Face Hub.
- If you see errors related to model loading, make sure the model name (cmarkea/bloomz-3b-retriever-v2) is spelled correctly and accessible; the sanity-check sketch after this list can help narrow things down.
- For performance issues, verify the format and correctness of your input data.
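As a quick sanity check, the sketch below (assuming you have network access to the Hugging Face Hub) prints your library versions and tries to download only the tokenizer, which is small and fails fast if the model name or the connection is the problem:

```python
import transformers
import torch
import scipy
from transformers import AutoTokenizer

# Print library versions to rule out obvious incompatibilities
print('transformers:', transformers.__version__)
print('torch:', torch.__version__)
print('scipy:', scipy.__version__)

# Try loading only the tokenizer first: it is lightweight and fails fast
try:
    AutoTokenizer.from_pretrained('cmarkea/bloomz-3b-retriever-v2')
    print('Model name resolves and the tokenizer downloads correctly.')
except Exception as err:
    print('Loading failed; check the model name and your connection:', err)
```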
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
By harnessing the Bloomz-3b-retriever-v2 model, you can efficiently extract relevant information across languages and contexts. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

