Welcome to the world of semantic search! Today, we're diving into DPR-XM, a multilingual dense single-vector bi-encoder model that maps questions and paragraphs into 768-dimensional dense vectors. Because it can perform zero-shot retrieval across multiple languages, it is a natural fit for multilingual search, and this guide will simplify the process of getting it running. So, let's buckle up and embark on this journey!
1. Getting Started with DPR-XM
To use DPR-XM, you will need to have the necessary libraries and dependencies installed. Let’s go through the process step by step.
Step 1: Install Required Libraries
Depending on which library you plan to use, install the matching package:
- Sentence-Transformers: pip install -U sentence-transformers
- FlagEmbedding: pip install -U FlagEmbedding
- Hugging Face Transformers: pip install -U transformers
2. Example Code Using DPR-XM
Now, let's look at how to use this model with three different libraries.
2.1 Using Sentence-Transformers
Here’s where the magic begins! Imagine you’re a chef, and your queries and passages are ingredients waiting to be mixed into a delectable dish.
from sentence_transformers import SentenceTransformer

queries = ["Ceci est un exemple de requête.", "Voici un second exemple."]
passages = ["Ceci est un exemple de passage.", "Et voilà un deuxième exemple."]
language_code = 'fr_FR'  # French

# Load the model and activate the language-specific adapter
model = SentenceTransformer('antoinelouis/dpr-xm')
model[0].auto_model.set_default_language(language_code)

# Encode queries and passages into L2-normalized 768-dimensional vectors
q_embeddings = model.encode(queries, normalize_embeddings=True)
p_embeddings = model.encode(passages, normalize_embeddings=True)

# Dot product of normalized vectors equals cosine similarity
similarity = q_embeddings @ p_embeddings.T
print(similarity)
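Because the embeddings are L2-normalized, the dot products above are cosine similarities. Here is a minimal sketch, continuing from the snippet above and reusing its variables, of how you might turn the similarity matrix into a per-query ranking:

import numpy as np

# Rank the passages for each query, from most to least similar
ranking = np.argsort(-similarity, axis=1)
for qi, query in enumerate(queries):
    best = ranking[qi][0]
    print(f"{query!r} -> {passages[best]!r} (score: {similarity[qi, best]:.4f})")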
2.2 Using FlagEmbedding
Continuing with our chef analogy, FlagEmbedding adds some special spices to enhance the flavor of your dish.
from FlagEmbedding import FlagModel

queries = ["Ceci est un exemple de requête.", "Voici un second exemple."]
passages = ["Ceci est un exemple de passage.", "Et voilà un deuxième exemple."]
language_code = 'fr_FR'  # French

# Normalization and pooling are set on the constructor, not on encode();
# pooling_method='mean' matches the mean pooling shown in section 2.3
model = FlagModel('antoinelouis/dpr-xm', pooling_method='mean', normalize_embeddings=True)
model.model.set_default_language(language_code)  # Activate the language-specific adapter

q_embeddings = model.encode(queries)
p_embeddings = model.encode(passages)

similarity = q_embeddings @ p_embeddings.T
print(similarity)
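FlagEmbedding also ships the asymmetric helpers encode_queries() and encode_corpus(). Since no query instruction is configured for DPR-XM, they should produce the same vectors as encode(), but they make the query/passage roles explicit. A short sketch, continuing from the snippet above:

# Optional: asymmetric encoding helpers from FlagEmbedding.
# With no query instruction configured, these match encode() for DPR-XM.
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode_corpus(passages)
print(q_embeddings @ p_embeddings.T)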
2.3 Using Transformers
Lastly, using Transformers is akin to carefully plating your dish, ensuring that each part shines on its own.
import torch
from torch.nn.functional import normalize
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    """Average the token embeddings, ignoring padding tokens."""
    token_embeddings = model_output[0]  # First element contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

queries = ["Ceci est un exemple de requête.", "Voici un second exemple."]
passages = ["Ceci est un exemple de passage.", "Et voilà un deuxième exemple."]
language_code = 'fr_FR'  # French

tokenizer = AutoTokenizer.from_pretrained('antoinelouis/dpr-xm')
model = AutoModel.from_pretrained('antoinelouis/dpr-xm')
model.set_default_language(language_code)  # Activate the language-specific adapter

q_input = tokenizer(queries, padding=True, truncation=True, return_tensors='pt')
p_input = tokenizer(passages, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    q_output = model(**q_input)
    p_output = model(**p_input)

# Mean-pool the token embeddings, then L2-normalize
q_embeddings = normalize(mean_pooling(q_output, q_input['attention_mask']), p=2, dim=1)
p_embeddings = normalize(mean_pooling(p_output, p_input['attention_mask']), p=2, dim=1)

similarity = q_embeddings @ p_embeddings.T
print(similarity)
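Computing a full query-passage similarity matrix works for toy examples, but it stops scaling once you index thousands of passages. A common pattern, shown here as an illustration rather than as part of DPR-XM itself, is to store the normalized passage embeddings in a FAISS inner-product index (this assumes faiss-cpu is installed via pip install -U faiss-cpu):

import faiss
import numpy as np

# Continuing from any example above; FAISS expects float32 numpy arrays
p_vecs = np.asarray(p_embeddings, dtype='float32')
q_vecs = np.asarray(q_embeddings, dtype='float32')

index = faiss.IndexFlatIP(p_vecs.shape[1])  # Inner product == cosine on normalized vectors
index.add(p_vecs)

scores, ids = index.search(q_vecs, 2)  # Retrieve the top-2 passages per query
print(scores, ids)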
3. Troubleshooting
Even the best chefs encounter difficulties sometimes. If you face issues while implementing DPR-XM, here are some troubleshooting tips:
- Ensure all libraries are updated to their latest versions.
- Double-check the model name ('antoinelouis/dpr-xm') and any local paths to make sure they exist.
- If you run into out-of-memory errors, encode in smaller batches or move to hardware with more memory; see the sketch after this list.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
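For the memory tip above, here is a minimal sketch using Sentence-Transformers; the batch_size value is illustrative, so tune it to your hardware.

from sentence_transformers import SentenceTransformer

passages = ["Ceci est un exemple de passage.", "Et voilà un deuxième exemple."]

model = SentenceTransformer('antoinelouis/dpr-xm')
model[0].auto_model.set_default_language('fr_FR')  # French adapter, as in section 2.1

# A smaller batch_size lowers peak memory usage at the cost of throughput
p_embeddings = model.encode(passages, batch_size=8, normalize_embeddings=True)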
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

