In this article, we'll guide you through setting up and using the Pairwise Reward Model (PairRM) from the LLM-Blender project. Think of PairRM as your culinary guide: just as a taste test helps you pick the best dish, PairRM helps you serve up the finest outputs from your Large Language Models (LLMs) by comparing and ranking them. Let's dive into the savory world of PairRM!
What is PairRM?
PairRM takes an instruction and a **pair** of output candidates as input. It then outputs a score for each candidate to measure their **relative** quality. This enables you to (re-)rank lists of candidate outputs, assess the quality of LLMs efficiently, and enhance decoding through a method called best-of-n sampling.
Installation
First, you need to install the LLM-Blender package. Here’s how:
pip install git+https://github.com/yuchenlin/LLM-Blender.git
After installing, load PairRM with the following code:
import llm_blender
blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM") # load PairRM
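Once the ranker is loaded, a quick smoke test confirms everything is wired up. The toy input below is our own illustration, using the same rank call shown in the next section:

# rank two trivial candidates for one input; rank 1 marks the preferred one
ranks = blender.rank(["say hi"], [["hello!", "go away"]], return_scores=False, batch_size=1)
print(ranks)  # expected: one list of 2 ranks per input, e.g. [[1, 2]]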
Usage
Like a chef tasting different dishes to find the best flavor, you can use PairRM to compare and rank candidate outputs. Here's how:
Use Case 1: Comparing and Ranking
To rank multiple candidate responses based on a given instruction:
inputs = ["hello, how are you!", "I love you!"]
candidates_texts = [["get out!", "hi! I am fine, thanks!", "bye!"],
                    ["I love you too!", "I hate you!", "Thanks! You're a good guy!"]]
# ranks[i][j] is the rank of the j-th candidate for the i-th input (1 = best)
ranks = blender.rank(inputs, candidates_texts, return_scores=False, batch_size=1)
In this example, the ranks output holds each candidate's rank relative to the other candidates for the same input, with rank 1 marking the preferred response, much as a judge places dishes in a cooking competition.
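If you only need a head-to-head verdict between two responses rather than a full ranking, LLM-Blender also exposes a direct pairwise comparison call. The sketch below uses the compare method as described in the project README; verify the exact return format against your installed version:

# compare two candidate responses head-to-head for each input
inputs = ["hello, how are you!"]
candidates_A = ["hi! I am fine, thanks!"]
candidates_B = ["get out!"]
# comparison_results[i] indicates whether candidates_A[i] is judged
# better than candidates_B[i] for inputs[i]
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
print(comparison_results)  # e.g. [True]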
Use Case 2: Best-of-n Sampling
Just like picking the ripest fruit from a selection, best-of-n sampling improves output quality by generating n candidate responses per prompt and keeping only the top-ranked one. Here's the implementation:
from transformers import AutoTokenizer, AutoModelForCausalLM
import llm_blender

# load the base LLM whose outputs we want to improve
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", device_map="auto")

# build chat-formatted prompts for the model
system_message = {"role": "system", "content": "You are a friendly chatbot."}
inputs = ["can you tell me a joke about OpenAI?"]
messages = [[system_message, {"role": "user", "content": _input}] for _input in inputs]
prompts = [tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True) for m in messages]

# load PairRM and return the best of n sampled generations per prompt
blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM") # load ranker checkpoint
outputs = blender.best_of_n_generate(model, tokenizer, prompts, n=10)
The result is a more stable, consistent response, like getting a perfect slice of cake every time because you've already sifted out the bad ones.
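To make the mechanics concrete, here is a minimal sketch of what best-of-n sampling does under the hood: sample n completions from the model, rank them with PairRM, and return the top-ranked one. The helper name manual_best_of_n and the generation parameters are our own illustration, not part of the LLM-Blender API:

import numpy as np
import torch

def manual_best_of_n(model, tokenizer, blender, prompt, n=10, max_new_tokens=128):
    # sample n diverse completions for the same prompt
    enc = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        gen = model.generate(
            **enc,
            do_sample=True,
            num_return_sequences=n,
            max_new_tokens=max_new_tokens,
        )
    # keep only the newly generated tokens, not the prompt
    completions = tokenizer.batch_decode(
        gen[:, enc["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # rank the n candidates with PairRM; rank 1 is the best candidate
    ranks = blender.rank([prompt], [completions], return_scores=False, batch_size=1)
    best_idx = int(np.argmin(ranks[0]))
    return completions[best_idx]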
Troubleshooting
If you encounter issues during installation or runtime, here are some troubleshooting tips:
- Ensure that you have Python 3.6 or later installed.
- Try reinstalling the package if you face errors during installation.
- Check for typos in your code, especially when loading the PairRM or inputs.
- If your outputs seem off, ensure your candidate lists are correctly formatted and don't contain inconsistent data types (see the quick check after this list).
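As a quick sanity check before calling rank, you can validate the shape of your inputs. The helper below is our own illustration, not part of LLM-Blender:

def check_candidates(inputs, candidates_texts):
    # one non-empty list of candidate strings per input
    assert len(inputs) == len(candidates_texts), "one candidate list per input"
    for cands in candidates_texts:
        assert isinstance(cands, list) and cands, "each entry must be a non-empty list"
        assert all(isinstance(c, str) for c in cands), "all candidates must be strings"

check_candidates(inputs, candidates_texts)  # raises AssertionError on malformed data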
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

