How to Convert Decoder-Only LLMs into Text Encoders with LLM2Vec

May 3, 2024 | Educational

Transforming decoder-only language models into effective text encoders can significantly elevate your natural language processing (NLP) projects. With the LLM2Vec framework, the transformation boils down to three straightforward steps: enabling bidirectional attention, masked next token prediction (MNTP), and unsupervised contrastive learning (SimCSE). Below, you’ll find an easy-to-follow guide to help you set up and use LLM2Vec.
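
To build intuition for what the first step changes, here is a minimal PyTorch sketch (illustrative only, not part of the LLM2Vec API): a decoder’s causal mask restricts each token to earlier positions, while bidirectional attention lets every token see every other token.

python
import torch

seq_len = 4

# Decoder-only LLMs use a causal mask: position i attends only to
# positions <= i (the lower triangle of ones).
causal_mask = torch.tril(torch.ones(seq_len, seq_len))

# Step 1 of LLM2Vec lifts that restriction so every position can attend
# to every other position, as in a standard encoder.
bidirectional_mask = torch.ones(seq_len, seq_len)

print(causal_mask)
print(bidirectional_mask)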

Installation

Begin by installing the LLM2Vec package using pip. Open your terminal and run:

bash
pip install llm2vec
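
A quick sanity check that the package imports cleanly (any environment-specific install problem will surface here):

python
# Sanity check: the package should import without error
from llm2vec import LLM2Vec

print("llm2vec is installed:", LLM2Vec.__name__)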

Usage

After installing, you can begin using LLM2Vec with the following steps (a condensed end-to-end sketch follows the list):

  1. Import necessary libraries and set up the base model.
  2. Load the model and tokenizer.
  3. Encode your queries and documents, then compute similarities.
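
As a condensed sketch of the whole flow: recent versions of the library also ship an LLM2Vec.from_pretrained convenience constructor that collapses steps 1 and 2 into a single call. Check the README of your installed version before relying on it; the exact keyword arguments may differ.

python
import torch
from llm2vec import LLM2Vec

# One-call alternative to the manual setup below, assuming your llm2vec
# version provides LLM2Vec.from_pretrained: it loads the base model and
# applies the PEFT weights in a single step.
l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)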

Step-by-Step Setup

To illustrate the process, think of preparing a recipe: you line up your ingredients (model, tokenizer, and queries) before any cooking starts. Here’s how the code flows:

python
from llm2vec import LLM2Vec
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Load the base Llama-3 model; trust_remote_code pulls in the custom
# model class that enables bidirectional attention
tokenizer = AutoTokenizer.from_pretrained("McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp")
config = AutoConfig.from_pretrained("McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp", trust_remote_code=True)
model = AutoModel.from_pretrained("McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp", trust_remote_code=True, config=config)

# Merge the MNTP LoRA weights into the base model
model = PeftModel.from_pretrained(model, "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp")
model = model.merge_and_unload()  # This can take several minutes
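
On a GPU, you can cut memory use considerably by loading the 8B base model in bfloat16. Below is a variant of the AutoModel call above, assuming a CUDA device and the accelerate package (needed for device_map):

python
# Optional: bfloat16 on GPU halves memory versus fp32; requires accelerate
# for device_map, and falls back to CPU when no CUDA device is present
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)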

Loading the base model is like mixing the basic ingredients in the right proportions. Once everything is set up, you refine the mixture with the following lines:

python
# Apply the unsupervised SimCSE weights on top of the MNTP model
model = PeftModel.from_pretrained(model, "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse")

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encoding queries
instruction = "Given a web search query, retrieve relevant passages that answer the query:"
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encoding documents
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day.",
    "Definition of summit for English Language Learners: 1 the highest point of a mountain.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))
print(cos_sim)
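
Each row of cos_sim corresponds to a query and each column to a document, so the best match for a query is the argmax of its row. A small continuation of the example:

python
# Pick the highest-scoring document for each query
best = cos_sim.argmax(dim=1)
for i, query in enumerate(queries):
    print(f"{query[1]!r} -> best match: document {best[i].item()}")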

This segment is the final cooking of the dish: the printed tensor holds the similarity scores between each query and each document, and reading them is like tasting and adjusting the seasoning. Higher values indicate a closer semantic match.

Troubleshooting

If you encounter any issues during installation or while running the code, consider the following troubleshooting steps:

  • Ensure that all packages, such as PyTorch, Transformers, and PEFT, are correctly installed and up to date (the diagnostic snippet after this list can help).
  • Double-check your model and tokenizer names for typos or incorrect references.
  • Verify that you’re using a compatible environment (Python version, device settings).
  • For specific error messages, consult the official documentation or seek advice from the community.
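
For the version and environment checks above, a short diagnostic script can save time:

python
import sys

import peft
import torch
import transformers

# Print the versions most relevant to LLM2Vec compatibility
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Transformers:", transformers.__version__)
print("PEFT:", peft.__version__)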

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

By following these steps, you can easily convert a decoder-only language model into an effective text encoder using LLM2Vec. The transformation not only extends the model’s capabilities but also delivers a significant performance boost across a variety of NLP tasks, such as the retrieval example above.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
