How to Use the Reranker Models from MixedBread AI

Jul 24, 2024 | Educational


The Reranker models from MixedBread AI offer a powerful way to reorder a set of documents based on their relevance to a specific query. This guide will provide a concise step-by-step process to get started, whether you’re using Python or JavaScript, and will also help you troubleshoot common issues.

Quickstart Guide: Python Implementation

To use the Reranker models in Python, ensure you have the sentence-transformers library installed. If you haven’t installed it yet, you can do so with the following command:

pip install -U sentence-transformers

Once you’re set up, you can rerank documents in just a few lines of code:

from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1")

# Example query and documents
query = "Who wrote To Kill a Mockingbird?"
documents = [
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

# Let's get the scores: rank all documents and return the top 3 with their text
results = model.rank(query, documents, return_documents=True, top_k=3)
print(results)

In this code block, we load a model, pass in a query and a list of documents, and get back the three most relevant documents together with their relevance scores, sorted from highest to lowest.
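To make the shape of the output concrete, here is a minimal sketch of how the ranked results could be printed. It assumes each result is a dictionary with the keys corpus_id, score, and (when return_documents=True) text. The scores in sample_results are made up for illustration, and format_results is a hypothetical helper, not part of sentence-transformers.

```python
# Hypothetical stand-in for the output of model.rank(...).
# The scores are invented for illustration; real values will differ.
sample_results = [
    {"corpus_id": 0, "score": 0.98, "text": "To Kill a Mockingbird is a novel by Harper Lee published in 1960."},
    {"corpus_id": 2, "score": 0.95, "text": "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird."},
    {"corpus_id": 1, "score": 0.04, "text": "The novel Moby-Dick was written by Herman Melville."},
]


def format_results(results, max_chars=60):
    """Turn ranked results into one readable line per hit."""
    lines = []
    for position, hit in enumerate(results, start=1):
        # "text" is only present when return_documents=True was requested
        snippet = hit.get("text", "")[:max_chars]
        lines.append(
            f"{position}. doc {hit['corpus_id']} (score {hit['score']:.2f}): {snippet}"
        )
    return lines


for line in format_results(sample_results):
    print(line)
```

The corpus_id value is the index of the document in the original documents list, so it can be used to look up the full text even when return_documents is left off.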

Quickstart Guide: JavaScript Implementation

For those who prefer JavaScript, you can implement the Reranker models similarly. Ensure you have transformers.js installed:

npm i @xenova/transformers

Next, set up your ranking function:

import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model_id = "mixedbread-ai/mxbai-rerank-xsmall-v1";
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

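// Scores each (query, document) pair and returns the hits sorted by relevance.
// Options: top_k limits the number of hits (all hits if undefined);
// return_documents adds the document text to each hit.
async function rank(query, documents, { top_k = undefined, return_documents = false } = {}) {
    // Pair the query with every document so the model scores them together
    const inputs = tokenizer(new Array(documents.length).fill(query), {
        text_pair: documents,
        padding: true,
        truncation: true,
    });

    // The model output is an object; the raw relevance scores are in `logits`
    const { logits } = await model(inputs);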
    return logits
        .sigmoid()
        .tolist()
        .map(([score], i) => ({
            corpus_id: i,
            score,
            ...(return_documents ? { text: documents[i] } : {}),
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, top_k);
}

// Example usage
const query = "Who wrote To Kill a Mockingbird?";
const documents = [ ... ]; // same as above
const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);

Here, we import the necessary classes, load the model, and implement a function that scores the documents similarly to the Python example.
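The post-processing inside the rank function (sigmoid, sort, slice) is independent of the model itself. As a rough illustration, the same steps can be sketched in plain Python. The name postprocess and the logit values below are assumptions made for this example; in practice the logits come from the model.

```python
import math


def sigmoid(x):
    """Map a raw logit to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def postprocess(logits, documents, top_k=None, return_documents=False):
    """Convert one logit per document into sorted, optionally truncated hits."""
    hits = []
    for i, logit in enumerate(logits):
        hit = {"corpus_id": i, "score": sigmoid(logit)}
        if return_documents:
            hit["text"] = documents[i]
        hits.append(hit)
    # Highest score first, mirroring the sort in the JavaScript function
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    # A top_k of None keeps every hit, like slice(0, undefined) in JavaScript
    return hits[:top_k]


# Made-up logits for three placeholder documents
docs = ["doc a", "doc b", "doc c"]
ranked = postprocess([-1.0, 2.0, 0.0], docs, top_k=2, return_documents=True)
print(ranked)
```

Because the sigmoid is monotonic, it does not change the ordering; it only rescales the scores into the 0 to 1 range so they are easier to compare or threshold.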

Troubleshooting Common Issues

If you run into any issues, here are some quick troubleshooting tips:

Ensure that the model ID is spelled correctly, including the organization prefix (for example, mixedbread-ai/mxbai-rerank-xsmall-v1), and that the model files downloaded successfully.
Check if all required libraries are updated to their latest versions.
Verify that your queries and documents are formatted correctly.
For API access, ensure that your API key is correctly set up.
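For the formatting check in particular, a small pre-flight helper can catch the most common mistakes before the model is called. This is only a sketch under the assumption that the query should be a non-empty string and the documents a non-empty list of non-empty strings; validate_inputs is a hypothetical name, not a library function.

```python
def validate_inputs(query, documents):
    """Return a list of problems found; an empty list means the inputs look fine."""
    problems = []
    if not isinstance(query, str) or not query.strip():
        problems.append("query must be a non-empty string")
    if not isinstance(documents, (list, tuple)) or len(documents) == 0:
        problems.append("documents must be a non-empty list of strings")
        return problems
    for i, doc in enumerate(documents):
        if not isinstance(doc, str):
            problems.append(f"document {i} is {type(doc).__name__}, expected str")
        elif not doc.strip():
            problems.append(f"document {i} is empty")
    return problems


# Example: one valid call and one with several mistakes
print(validate_inputs("Who wrote To Kill a Mockingbird?", ["Harper Lee wrote it."]))
print(validate_inputs("", ["ok", None, "   "]))
```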

If you continue experiencing issues, consider reaching out for assistance. For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Model Evaluation

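Reranker models enhance search capabilities and work well alongside keyword search: the keyword search retrieves candidate documents and the reranker reorders them by relevance. The table below compares the three mxbai-rerank models with lexical search, two BAAI rerankers, and semantic search on NDCG@10 and Accuracy@3: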

Model	NDCG@10	Accuracy@3
Lexical Search (Lucene)	38.0	66.4
BAAI/bge-reranker-base	41.6	66.9
BAAI/bge-reranker-large	45.2	70.6
cohere-embed-v3 (semantic search)	47.5	70.9
mxbai-rerank-xsmall-v1	43.9	70.0
mxbai-rerank-base-v1	46.9	72.3
mxbai-rerank-large-v1	48.8	74.9
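For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for a single query. It is not the evaluation code behind the table, whose exact protocol is not given here. The input is assumed to be a list of relevance labels in ranked order (0 meaning not relevant), NDCG uses a linear gain, and the ideal ordering is taken from the same list. The function names are illustrative, and the results are on a 0 to 1 scale, whereas the table values appear to be on a 0 to 100 scale.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def accuracy_at_k(relevances, k=3):
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(rel > 0 for rel in relevances[:k]) else 0.0


# Relevance labels of the ranked documents for one hypothetical query
ranked_labels = [0, 1, 0, 1]
print(ndcg_at_k(ranked_labels, k=10))
print(accuracy_at_k(ranked_labels, k=3))
```

Averaging these per-query values over all queries in a dataset gives a single number per metric for each model.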

In summary, mxbai-rerank-large-v1 achieves the highest scores in the table on both metrics (48.8 NDCG@10 and 74.9 Accuracy@3), and even the xsmall model scores higher than lexical search (43.9 vs. 38.0 NDCG@10). It’s like having a skilled librarian who not only finds the books you want but also knows which ones are the most relevant for your needs.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
