If you’re diving into the world of long-context retrieval, the Monarch Mixer-BERT (M2-BERT) is an innovative model that may just become one of your go-to tools. With its 80M parameter checkpoint and a maximum sequence length of 32,768, M2-BERT is designed to handle substantial amounts of input data. This guide will walk you through the setup and usage of M2-BERT, ensuring you get the most out of this robust model.
Getting Started with Monarch Mixer-BERT
First things first: to use this model, you’ll need the appropriate libraries. The model is loaded through Hugging Face’s AutoModelForSequenceClassification class with remote code enabled.
Installation Requirements
- A working Python environment.
- The Hugging Face Transformers library, plus PyTorch.
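If these aren’t installed yet, a typical setup looks like this (package names are the standard PyPI ones; pin versions as your project requires):

```shell
pip install transformers torch
```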
Loading the Model
To load the M2-BERT model, you can use the following Python code:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-32k-retrieval",
    trust_remote_code=True
)
When you load the model, you may see a long warning about unused parameters related to FlashFFTConv. The model works without it; if you wish to enable FlashFFTConv, check the comprehensive guide available on our GitHub page.
Generating Embeddings
This model generates embeddings useful for retrieval tasks, with a dimensionality of 768. Here’s how to generate these embeddings:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

max_seq_length = 32768
testing_string = "Every morning, I make a cup of coffee to start my day."

model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-32k-retrieval",
    trust_remote_code=True
)

# M2-BERT reuses the bert-base-uncased tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",
    model_max_length=max_seq_length
)

inputs = tokenizer(
    [testing_string],
    return_tensors='pt',
    padding='max_length',
    return_token_type_ids=False,
    truncation=True,
    max_length=max_seq_length
)

outputs = model(**inputs)
embeddings = outputs['sentence_embedding']  # shape: (batch_size, 768)
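Once you have embeddings, retrieval comes down to ranking documents by similarity to a query vector. The helper below is a minimal, hypothetical sketch (our own illustration, not part of the M2-BERT API) that ranks candidate embeddings by cosine similarity with NumPy; it assumes each embedding is a plain vector like the 768-dimensional ones produced above.

```python
import numpy as np

def rank_by_cosine_similarity(query_emb, doc_embs):
    """Return document indices sorted from most to least similar to the query."""
    query = np.asarray(query_emb, dtype=np.float64)
    docs = np.asarray(doc_embs, dtype=np.float64)
    # Normalize so the dot product equals cosine similarity
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    return np.argsort(-scores)  # indices in descending order of similarity

# Toy example with 3-dimensional vectors standing in for 768-dim embeddings
query = [1.0, 0.0, 0.0]
documents = [
    [0.9, 0.1, 0.0],   # very similar to the query
    [0.0, 1.0, 0.0],   # orthogonal
    [0.5, 0.5, 0.0],   # somewhere in between
]
order = rank_by_cosine_similarity(query, documents)
print(order)  # most similar document first
```

In practice you would stack the `sentence_embedding` outputs for your corpus into `doc_embs` and embed each query the same way.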
Understanding the Code: An Analogy
Imagine you’re a chef preparing a large banquet. The ingredients (your input text) need to be neatly packaged (tokenized) to fit into your large dish (model). Using tools like your trusty knife set (the tokenizer) ensures each ingredient is just right, allowing you to create a delicious meal (embeddings) that all your guests will enjoy.
Using the Together API for Embeddings
If you prefer obtaining embeddings via the Together API, here’s a concise function for it:
import os
import requests

def generate_together_embeddings(text: str, model_api_string: str, api_key: str):
    url = "https://api.together.xyz/api/v1/embeddings"
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    session = requests.Session()
    response = session.post(
        url,
        headers=headers,
        json={
            "input": text,
            "model": model_api_string
        }
    )
    if response.status_code != 200:
        raise ValueError(
            f"Request failed with status code {response.status_code}: {response.text}"
        )
    return response.json()['data'][0]['embedding']

print(generate_together_embeddings(
    "Hello world",
    "togethercomputer/m2-bert-80M-32k-retrieval",
    os.environ['TOGETHER_API_KEY']
)[:10])
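Transient network failures and rate limits are common with hosted APIs, so you may want to wrap calls like generate_together_embeddings in a retry loop. Here is a minimal, generic sketch (the retry_with_backoff helper is our own illustration, not part of any Together SDK) that retries any callable with exponential backoff:

```python
import time

def retry_with_backoff(fn, max_attempts=3, base_delay=1.0):
    """Call fn(); on failure, wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Usage: wrap the embedding call in a zero-argument lambda, e.g.
# embedding = retry_with_backoff(
#     lambda: generate_together_embeddings(
#         "Hello world", "togethercomputer/m2-bert-80M-32k-retrieval", api_key
#     )
# )
```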
Troubleshooting Common Issues
- If you encounter issues related to unrecognized parameter warnings, it’s often due to configuration differences in model loading. Consult the GitHub for detailed configurations.
- For connection problems with the Together API, verify your internet connection and ensure that your API key is valid and set in your environment.
- If the embeddings are not returning the expected results, consider adjusting your input text or verifying that you’re using the right model string.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Acknowledgments
Alycia Lee provided assistance with AutoModel support, a significant contribution to this project.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

