In this blog post, we’ll explore how to use the stella_en_400M_v5 model, a strong performer on the MTEB benchmark, to retrieve relevant documents based on user queries. Think of it as a smart librarian who, instead of wandering through endless shelves, uses its training and technology to instantly find the information you need.
What You’ll Need
- Python installed on your machine
- Transformers library from Hugging Face
- A CUDA-capable GPU (recommended for performance; the code below calls .cuda())
Step-by-Step Instructions
1. Set up Your Environment
To get started, make sure you have the necessary libraries installed. You can install the Transformers library and any requirements using pip:
pip install transformers torch scikit-learn
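Once the installation finishes, it’s worth confirming that PyTorch can actually see your GPU, since the snippets below move tensors onto it. A quick check:
import torch

# The code in this guide calls .cuda(), so this should print True
print(torch.cuda.is_available())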
2. Import Required Libraries
Once the libraries are installed, you’ll need to import them in your Python script:
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.preprocessing import normalize
3. Define Your Queries
Next, define the queries and the documents you want the model to search. Queries are prefixed with the instruction prompt the model expects:
query_prompt = "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: "
queries = [query_prompt + query for query in [
    "What are some ways to reduce stress?",
    "What are the benefits of drinking green tea?"
]]
docs = [
    "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity.",
    "Green tea has been consumed for centuries and is known for its potential health benefits."
]
4. Load the Model
Load the pre-trained model and tokenizer using the following lines:
model_dir = "Marqo/dunzhang-stella_en_400M_v5"
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
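If you don’t have a GPU, a minimal CPU variant is sketched below; it is slower but functionally the same, and device selection is the only change. Note that the model’s remote code may need extra configuration to run on CPU, so check the model card if loading fails:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Later snippets would then send inputs to `device` instead of calling .cuda()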
5. Convert Queries and Documents into Vectors
Next, tokenize the queries, run them through the model, and mean-pool the token embeddings into one vector per query:
with torch.no_grad():
    input_data = tokenizer(queries, padding='longest', truncation=True, max_length=512, return_tensors='pt')
    input_data = {k: v.cuda() for k, v in input_data.items()}
    last_hidden_state = model(**input_data)[0]
    attention_mask = input_data["attention_mask"]
    # Zero out the padding positions, then mean-pool over the sequence dimension
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    query_vectors = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
# L2-normalize so a dot product later equals cosine similarity
query_vectors = normalize(query_vectors.cpu().numpy())
# Repeat for documents
with torch.no_grad():
    input_data = tokenizer(docs, padding='longest', truncation=True, max_length=512, return_tensors='pt')
    input_data = {k: v.cuda() for k, v in input_data.items()}
    last_hidden_state = model(**input_data)[0]
    attention_mask = input_data["attention_mask"]
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    docs_vectors = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
docs_vectors = normalize(docs_vectors.cpu().numpy())
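Since the two blocks are identical apart from their input, you could fold the logic into a small helper; a minimal sketch (the encode name is our own, not part of any library):
def encode(texts):
    # Tokenize, embed, mean-pool over non-padding tokens, and L2-normalize
    with torch.no_grad():
        batch = tokenizer(texts, padding='longest', truncation=True, max_length=512, return_tensors='pt')
        batch = {k: v.cuda() for k, v in batch.items()}
        hidden = model(**batch)[0]
        mask = batch["attention_mask"]
        hidden = hidden.masked_fill(~mask[..., None].bool(), 0.0)
        pooled = hidden.sum(dim=1) / mask.sum(dim=1)[..., None]
    return normalize(pooled.cpu().numpy())

query_vectors = encode(queries)
docs_vectors = encode(docs)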
6. Calculate Similarities
Now that we have our query and document vectors, it’s time to measure their similarities:
# Both sets of vectors are L2-normalized, so this matrix product yields cosine similarities
similarities = query_vectors @ docs_vectors.T
print(similarities)  # One row per query, one column per document
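To turn these scores into an actual retrieval result, pick the best-scoring document for each query; a short sketch (the variable names are our own):
import numpy as np

best = np.argmax(similarities, axis=1)  # index of the top document per query
for query, idx in zip(queries, best):
    print(f"{query!r} -> doc {idx}: {docs[idx][:60]}...")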
An Analogy to Understand Vectors
Imagine navigating a city with a map and GPS. Each location on the map can be represented as a vector of coordinates. A query (e.g., “Where is the nearest coffee shop?”) acts like a GPS command: it is compared against the surrounding coordinates (the documents) to find the closest match. Vectorizing queries and documents works the same way, with the model placing similar meanings at nearby coordinates so that the best answer is simply the nearest point.
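To make that concrete, here is a toy calculation with made-up two-dimensional vectors; real embeddings behave the same way, just in many more dimensions:
import numpy as np

# Approximately unit-length "document" vectors and one "query" vector
coffee_shop = np.array([0.98, 0.20])
library = np.array([0.20, 0.98])
query = np.array([0.95, 0.31])

# A higher dot product means a smaller angle, i.e. greater similarity
print(query @ coffee_shop)  # ~0.99, strong match
print(query @ library)      # ~0.49, weak match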
Troubleshooting Tips
- Error Loading Model: Make sure you have the model’s path correctly set.
- CUDA Error: Ensure your GPU drivers are up to date and you have a compatible version of PyTorch installed.
- Memory Overflow: If your GPU runs out of memory, try reducing the batch size or the maximum sequence length; a batched-encoding sketch follows this list.
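One way to keep memory usage bounded is to encode texts a few at a time; a minimal sketch built on the encode helper above (the batch size of 8 is an arbitrary example):
import numpy as np

def encode_in_batches(texts, batch_size=8):
    # Encode in small chunks so peak GPU memory stays low
    parts = [encode(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]
    return np.vstack(parts)

docs_vectors = encode_in_batches(docs, batch_size=8)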
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
Now you have a solid understanding of how to use the stella_en_400M_v5 model to embed queries and retrieve relevant documents. With the right approach, you can harness AI to make information discovery more efficient.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.