In this article, we'll guide you through setting up and using the Pairwise Reward Model (PairRM) from the LLM-Blender project. Think of PairRM as your culinary guide: just as a taste test helps you pick the best dish, PairRM helps you serve up the finest outputs from your Large Language Models (LLMs) by comparing and ranking them. Let's dive into the savory world of PairRM!
What is PairRM?
PairRM takes an instruction and a **pair** of output candidates as input. It then outputs a score for each candidate to measure their **relative** quality. This enables you to (re-)rank lists of candidate outputs, assess the quality of LLMs efficiently, and enhance decoding through a method called best-of-n sampling.
Installation
First, you need to install the LLM-Blender package. Here’s how:
pip install git+https://github.com/yuchenlin/LLM-Blender.git
After installing, load PairRM with the following code:
import llm_blender
blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM") # load PairRM
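Once the ranker is loaded, a quick smoke test confirms everything is wired up. The toy input below is our own illustration, using the same rank call shown in the next section:

# rank two trivial candidates for one input; rank 1 marks the preferred one
ranks = blender.rank(["say hi"], [["hello!", "go away"]], return_scores=False, batch_size=1)
print(ranks)  # expected: one list of 2 ranks per input, e.g. [[1, 2]]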
Usage
Like a chef tasting different dishes to find the best flavor, you can use PairRM to compare and rank candidate outputs. Here's how:
Use Case 1: Comparing and Ranking
To rank multiple candidate responses based on a given instruction:
inputs = ["hello, how are you!", "I love you!"]
candidates_texts = [["get out!", "hi! I am fine, thanks!", "bye!"],
                    ["I love you too!", "I hate you!", "Thanks! You're a good guy!"]]
# ranks[i][j] is the rank of the j-th candidate for the i-th input (1 = best)
ranks = blender.rank(inputs, candidates_texts, return_scores=False, batch_size=1)
In this example, the ranks output holds each candidate's rank relative to the other candidates for the same input, with rank 1 marking the preferred response, much as a judge places dishes in a cooking competition.
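If you only need a head-to-head verdict between two responses rather than a full ranking, LLM-Blender also exposes a direct pairwise comparison call. The sketch below uses the compare method as described in the project README; verify the exact return format against your installed version:

# compare two candidate responses head-to-head for each input
inputs = ["hello, how are you!"]
candidates_A = ["hi! I am fine, thanks!"]
candidates_B = ["get out!"]
# comparison_results[i] indicates whether candidates_A[i] is judged
# better than candidates_B[i] for inputs[i]
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
print(comparison_results)  # e.g. [True]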
Use Case 2: Best-of-n Sampling
Just like picking the ripest fruit from a selection, best-of-n sampling improves output quality by generating n candidate responses per prompt and keeping only the top-ranked one. Here's the implementation:
from transformers import AutoTokenizer, AutoModelForCausalLM
import llm_blender

# load the base LLM whose outputs we want to improve
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", device_map="auto")

# build chat-formatted prompts for the model
system_message = {"role": "system", "content": "You are a friendly chatbot."}
inputs = ["can you tell me a joke about OpenAI?"]
messages = [[system_message, {"role": "user", "content": _input}] for _input in inputs]
prompts = [tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True) for m in messages]

# load PairRM and return the best of n sampled generations per prompt
blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM") # load ranker checkpoint
outputs = blender.best_of_n_generate(model, tokenizer, prompts, n=10)
The result is a more stable, consistent response, like getting a perfect slice of cake every time because you've already sifted out the bad ones.
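To make the mechanics concrete, here is a minimal sketch of what best-of-n sampling does under the hood: sample n completions from the model, rank them with PairRM, and return the top-ranked one. The helper name manual_best_of_n and the generation parameters are our own illustration, not part of the LLM-Blender API:

import numpy as np
import torch

def manual_best_of_n(model, tokenizer, blender, prompt, n=10, max_new_tokens=128):
    # sample n diverse completions for the same prompt
    enc = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        gen = model.generate(
            **enc,
            do_sample=True,
            num_return_sequences=n,
            max_new_tokens=max_new_tokens,
        )
    # keep only the newly generated tokens, not the prompt
    completions = tokenizer.batch_decode(
        gen[:, enc["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # rank the n candidates with PairRM; rank 1 is the best candidate
    ranks = blender.rank([prompt], [completions], return_scores=False, batch_size=1)
    best_idx = int(np.argmin(ranks[0]))
    return completions[best_idx]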
Troubleshooting
If you encounter issues during installation or runtime, here are some troubleshooting tips:
- Ensure that you have Python 3.6 or later installed.
- Try reinstalling the package if you face errors during installation.
- Check for typos in your code, especially when loading the PairRM or inputs.
- If your outputs seem off, ensure your candidate lists are correctly formatted and don't contain inconsistent data types (see the quick check after this list).
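As a quick sanity check before calling rank, you can validate the shape of your inputs. The helper below is our own illustration, not part of LLM-Blender:

def check_candidates(inputs, candidates_texts):
    # one non-empty list of candidate strings per input
    assert len(inputs) == len(candidates_texts), "one candidate list per input"
    for cands in candidates_texts:
        assert isinstance(cands, list) and cands, "each entry must be a non-empty list"
        assert all(isinstance(c, str) for c in cands), "all candidates must be strings"

check_candidates(inputs, candidates_texts)  # raises AssertionError on malformed data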
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

