How to Get Started with InternLM2-20B-Reward Model

Welcome to the world of AI advancements! In this blog, we’ll navigate through the exciting features and functionalities of the InternLM2-20B-Reward model. This reward model is built upon InternLM2-Chat-20B-SFT and trained using an impressive dataset of over 2.4 million preference samples. Let’s dive into how to utilize this model effectively!

Understanding InternLM2-20B-Reward Through Analogy

Think of the InternLM2-20B-Reward model as a knowledgeable chef. Just as a chef tailors their recipes using diverse ingredients to create delightful dishes, this model has been trained on a plethora of data (2.4 million preference pairs) to serve you the best conversational responses. The chef’s ability to balance flavors resembles the model’s goal to provide responses that are both helpful and harmless.

Key Features

  • Variety of Sizes Available: Open-sourced reward models come in 1.8B, 7B, and 20B sizes, each performing strongly on benchmarks while enabling research on reward-model scaling laws (see the loading sketch after this list).
  • Comprehensive Coverage of Preferences: Trained on preference data spanning dialogue, writing, coding, and more, giving balanced coverage of both helpful and harmless responses.
  • Multilingual Support: Proficient in both English and Chinese, with robust performance in both languages.
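
All three sizes follow the same loading pattern, so you can swap checkpoints by changing only the repository ID. Here is a minimal sketch, assuming the smaller checkpoints live at the sibling repositories internlm/internlm2-1_8b-reward and internlm/internlm2-7b-reward:

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed sibling repository IDs; only the 20B ID is used elsewhere in this post
REWARD_MODELS = {
    "1.8B": "internlm/internlm2-1_8b-reward",
    "7B": "internlm/internlm2-7b-reward",
    "20B": "internlm/internlm2-20b-reward",
}

model_id = REWARD_MODELS["7B"]  # pick a size; the rest of the code stays the same
model = AutoModel.from_pretrained(
    model_id,
    device_map='cuda',
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)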

How to Use the Model

Basic Usage Example

Here’s a simple walkthrough on how to utilize the InternLM2-20B-Reward model effectively:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm2-20b-reward',
    device_map='cuda',
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm2-20b-reward', trust_remote_code=True)

# Sample chats: same user prompt, two different assistant replies
chat_1 = [
    {"role": "user", "content": "Hello! What's your name?"},
    {"role": "assistant", "content": "My name is InternLM2! A helpful AI assistant. What can I do for you?"},
]
chat_2 = [
    {"role": "user", "content": "Hello! What's your name?"},
    {"role": "assistant", "content": "I have no idea."},
]

# Get reward score
score1 = model.get_score(tokenizer, chat_1)
score2 = model.get_score(tokenizer, chat_2)

print(f"score1: {score1}")
print(f"score2: {score2}")

This code initializes the model and scores two conversations that share the same user prompt but differ in the assistant's reply; the chat with the higher score is the one the reward model prefers. You can modify these chats to test different interactions!
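
Beyond scoring one chat at a time, the model card also documents batch scoring, pairwise comparison, and ranking helpers exposed by the model's remote code. Here is a minimal sketch, assuming get_scores, compare, and rank behave as described there:

# Batch inference: score several chats in one call
scores = model.get_scores(tokenizer, [chat_1, chat_2])
print(f"scores: {scores}")

# Pairwise comparison: True if chat_1 is judged better than chat_2
compare_res = model.compare(tokenizer, chat_1, chat_2)
print(f"chat_1 better than chat_2: {compare_res}")

# Ranking: the best chat receives rank index 0
rank_res = model.rank(tokenizer, [chat_1, chat_2])
print(f"ranking: {rank_res}")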

Perform Best of N Sampling

To select the best response among multiple candidates, you can perform Best-of-N sampling: generate N candidates with a chat model, then keep the one the reward model scores highest. Here’s how:

import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

# Prepare the LLM (the model that generates candidates) and its tokenizer
llm = AutoModelForCausalLM.from_pretrained(
    'internlm/internlm2-chat-7b',
    device_map='cuda',
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
llm_tokenizer = AutoTokenizer.from_pretrained('internlm/internlm2-chat-7b', trust_remote_code=True)

# Prepare reward model
reward = AutoModel.from_pretrained(
    'internlm/internlm2-20b-reward',
    device_map='cuda',
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
reward_tokenizer = AutoTokenizer.from_pretrained('internlm/internlm2-20b-reward', trust_remote_code=True)

# Build the prompt with the chat model's chat template
prompt = "Write a short introduction to reward models."  # example prompt
messages = [{"role": "user", "content": prompt}]
input_ids = llm_tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(llm.device)

# Generate best of N candidates by sampling
num_candidates = 10  # N=10
outputs = llm.generate(
    input_ids,
    max_new_tokens=512,
    num_return_sequences=num_candidates,
    pad_token_id=llm_tokenizer.eos_token_id,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.8,
)

This snippet samples multiple candidate responses from the language model. To complete Best-of-N sampling, each candidate is then scored with the reward model, and the highest-scoring candidate becomes your best response.
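
Below is a minimal sketch of that scoring step, assuming the reward model's remote code exposes the same get_scores batch helper shown earlier; the candidate texts are decoded from the outputs tensor produced above:

# Strip the prompt tokens and decode each candidate completion
completions = outputs[:, input_ids.shape[1]:]
candidate_chats = [
    messages + [{"role": "assistant", "content": llm_tokenizer.decode(completions[i], skip_special_tokens=True)}]
    for i in range(num_candidates)
]

# Score every candidate chat with the reward model and keep the best one
scores = reward.get_scores(reward_tokenizer, candidate_chats)  # assumes get_scores from the model's remote code
best_idx = max(range(num_candidates), key=lambda i: scores[i])
print(f"best score: {scores[best_idx]}")
print(candidate_chats[best_idx][-1]["content"])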

Troubleshooting

Sometimes, things don’t operate as expected. Here are some troubleshooting ideas:

  • Model Not Loading: Ensure your environment meets the necessary requirements and that you’re using the correct model path.
  • Out of Memory Errors: If training or inference is consuming too much memory, consider reducing the batch size, using a smaller reward model, or spreading the model across devices (see the sketch after this list).
  • Inconsistent Scores: Make sure the chats follow a similar structure to see reliable comparison results.
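
For the out-of-memory case specifically, a common mitigation is to let Accelerate choose the device placement instead of forcing the whole model onto a single GPU. A minimal sketch, assuming the accelerate package is installed:

import torch
from transformers import AutoModel

# device_map='auto' lets Accelerate spread layers across available GPUs (and CPU, if needed)
model = AutoModel.from_pretrained(
    'internlm/internlm2-20b-reward',
    device_map='auto',
    torch_dtype=torch.float16,
    trust_remote_code=True,
)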

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

With a robust architecture and unique features, the InternLM2-20B-Reward model opens doors to enhanced AI interactions. Utilize it to explore conversations, enhance responses, or even create new samples!

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
