LLM2Vec is a simple recipe for turning decoder-only large language models into powerful text encoders. This guide walks you through installing it, using it, and troubleshooting common issues along the way.
What You Need to Get Started
- Python: a working Python 3 installation (see the quick check after this list).
- pip: the Python package manager, used to install the library.
- GPU access (optional but recommended): an 8B-parameter model runs slowly on CPU, so a GPU speeds up both loading and encoding considerably.
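Before moving on, you can sanity-check your interpreter with a few lines like these. This is only a sketch: the 3.8 minimum used here is an assumption, so verify the exact requirement on the LLM2Vec repository.

```python
import sys

# Assumed minimum version -- verify against the LLM2Vec project page
MIN_PYTHON = (3, 8)

if sys.version_info < MIN_PYTHON:
    raise RuntimeError(
        f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ recommended, "
        f"found {sys.version.split()[0]}"
    )
print(f"Python {sys.version.split()[0]} looks good.")
```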
Installation Steps
LLM2Vec is distributed on PyPI, so installation is a single pip command:
```bash
pip install llm2vec
```
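Once the install finishes, a quick import check confirms that the package and its core dependency resolve correctly (a minimal sanity check, nothing LLM2Vec-specific beyond the import itself):

```python
# Confirm that llm2vec and torch import cleanly after installation
import torch
import llm2vec

print("llm2vec imported successfully; torch", torch.__version__)
```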
Usage Walkthrough
Now that everything is set up, let’s dive into using LLM2Vec. Think of the workflow as making a sandwich: you layer the components in the right order to get the final product.
1. Import the Necessary Libraries
First, you need to bring in the required libraries:
```python
from llm2vec import LLM2Vec
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel
```
2. Load the Model
Next, imagine you’re assembling your sandwich layers:
- Start with the tokenizer as the base.
- Layer on the configuration for the model.
- Add the model itself.
- Finally, introduce additional components like LoRA weights.
```python
# Load the base Llama-3 model, along with the custom code that enables
# bidirectional attention in decoder-only LLMs
tokenizer = AutoTokenizer.from_pretrained("McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp")
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)

# Load the MNTP LoRA weights and merge them into the base model
model = PeftModel.from_pretrained(model, "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp")
model = model.merge_and_unload()
```
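If you want embeddings tuned for retrieval rather than the raw MNTP model, the LLM2Vec authors also publish supervised LoRA weights that load the same way. A sketch, assuming the companion `mntp-supervised` checkpoint from the same model family:

```python
# Optional: layer supervised (contrastive-trained) LoRA weights on top of the
# merged MNTP model. Checkpoint name assumed from the LLM2Vec model family.
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised"
)
```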
3. Encode Queries and Documents
With the model loaded, you can start encoding text. Use the LLM2Vec wrapper, which handles tokenization and pooling for both queries and documents.
```python
# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encoding queries: each query is paired with a task instruction
instruction = "Given a web search query, retrieve relevant passages that answer the query:"
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encoding documents; instructions are not needed on the document side
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon.",
    "Definition of summit for English Language Learners: 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments."
]
d_reps = l2v.encode(documents)
```
4. Compute Cosine Similarity
The final step is to compute the cosine similarity between your encoded queries and documents, which tells you how closely each query matches each document:
```python
# Compute cosine similarity between every query and every document
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))
print(cos_sim)
```
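Each row of `cos_sim` corresponds to a query and each column to a document, so ranking documents for a query is just a matter of sorting that row. A small illustrative follow-up:

```python
# For each query, pick the document with the highest cosine similarity
best = cos_sim.argmax(dim=1)
for i, doc_idx in enumerate(best.tolist()):
    print(f"Query {i} best matches document {doc_idx} "
          f"(score: {cos_sim[i, doc_idx].item():.3f})")
```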
Troubleshooting
Encountering issues? Here are some potential fixes:
- Ensure that all library imports are correct and installed properly.
- Check if your GPU is recognized by PyTorch by running:
```python
print(torch.cuda.is_available())
```
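For a slightly fuller picture of your setup, something like this prints the versions and device PyTorch sees (a minimal sketch using standard PyTorch calls):

```python
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first visible GPU
    print("GPU:", torch.cuda.get_device_name(0))
```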