How to Use Snowflake’s Arctic-embed-m-v1.5 in Your Projects

If you’re diving into the world of embedding models for text similarity and retrieval, you might be wondering how to leverage Snowflake’s Arctic-embed-m-v1.5 effectively. This powerful model produces compact embeddings that maintain high retrieval quality even when compressed to as little as 128 bytes per vector. Get ready for a journey through the setup, usage, and some troubleshooting tips!

Overview of Snowflake’s Arctic-embed-m-v1.5

The Snowflake Arctic-embed-m-v1.5 model is an evolution in the arena of text embeddings. It combines Matryoshka Representation Learning (MRL) with quantization-friendly training, so its 768-dimensional embeddings can be truncated and scalar-quantized without significant losses in retrieval performance.
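
To see where the 128-byte figure comes from, here is a back-of-the-envelope sketch in Python (illustrative arithmetic only, not the library’s actual packing code): truncate the 768-dimensional embedding to its MRL-trained 256-dimension prefix, then scalar-quantize each value to 4 bits, giving 256 × 0.5 = 128 bytes per vector.


import numpy as np

# Hypothetical 768-dim float32 embedding (3,072 bytes uncompressed)
embedding = np.random.randn(768).astype(np.float32)

# Step 1: MRL truncation -- keep the first 256 dimensions and re-normalize
truncated = embedding[:256]
truncated = truncated / np.linalg.norm(truncated)

# Step 2: uniform scalar quantization to 4-bit codes in [0, 15];
# real deployments calibrate the range over a corpus, not per vector
lo, hi = truncated.min(), truncated.max()
codes = np.round((truncated - lo) / (hi - lo) * 15).astype(np.uint8)

# Step 3: pack two 4-bit codes per byte: 256 / 2 = 128 bytes total
packed = ((codes[0::2] << 4) | codes[1::2]).astype(np.uint8)
print(packed.nbytes)  # 128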

Imagine This!

Think of the Arctic-embed model as a skilled chef who can prepare a gourmet meal (the embeddings) but also package it beautifully in a tiny lunchbox (the compression). Even though the meal is in a smaller container, it still retains its rich flavors and satisfaction. This model ensures that you can serve up high-quality responses while keeping storage to a minimum.

Getting Started with Sentence Transformers

To utilize the Arctic model effectively, you can use the sentence-transformers package. Here’s a simple implementation to get you started (note that queries are encoded with the model’s query prompt via prompt_name="query"):


from sentence_transformers import SentenceTransformer

# Model constants
MODEL_ID = "Snowflake/snowflake-arctic-embed-m-v1.5"

# Your queries and docs
queries = ['What is Snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City, of course!']

# Load the model
model = SentenceTransformer(MODEL_ID, model_kwargs=dict(add_pooling_layer=False))

# Generate text embeddings (queries need the model's query prompt for retrieval)
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Scores via dot product (embeddings are unit-normalized, so this is cosine similarity)
scores = query_embeddings @ document_embeddings.T

# Pretty-print the results
for query, query_scores in zip(queries, scores):
    doc_score_pairs = sorted(zip(documents, query_scores), key=lambda x: x[1], reverse=True)
    print(f'Query: "{query}"')
    for document, score in doc_score_pairs:
        print(f'Score: {score:.4f} | Document: "{document}"')
    print()
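
If you want compact embeddings directly, recent versions of sentence-transformers can apply the MRL truncation for you through the truncate_dim argument. Here’s a minimal sketch (assuming a library version that supports truncate_dim); re-normalizing after truncation keeps dot products equivalent to cosine similarity:


from sentence_transformers import SentenceTransformer
from torch.nn.functional import normalize

# Keep only the MRL-trained 256-dimension prefix of each embedding
model = SentenceTransformer(
    "Snowflake/snowflake-arctic-embed-m-v1.5",
    truncate_dim=256,
)

embeddings = model.encode(['What is Snowflake?'], prompt_name="query", convert_to_tensor=True)

# Re-normalize after truncation before any dot-product scoring
embeddings = normalize(embeddings)
print(embeddings.shape)  # torch.Size([1, 256])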

Understanding the Code with an Analogy

Imagine you’re a librarian organizing books based on different themes. Here’s how the script works in a similar fashion:

1. Model Loading: Just as you gather all your classification tools, you load the Arctic model to facilitate your retrieval tasks.
2. Queries and Documents: The queries are like the questions patrons ask, while documents represent the various books on your shelves.
3. Embedding Generation: You convert queries and documents into meaningful “codes” or embeddings, akin to tagging each book according to its genre.
4. Scoring: Finally, you measure how closely each query matches the books (documents); the scores rank how well each book answers a given question.

Example Output

The output from running the above code will look something like this:


Query: "What is Snowflake?"
Score: 0.3521 | Document: "The Data Cloud!"
Score: 0.2358 | Document: "Mexico City, of course!"

Query: "Where can I get the best tacos?"
Score: 0.3884 | Document: "Mexico City, of course!"
Score: 0.2389 | Document: "The Data Cloud!"

Using Hugging Face Transformers

Should you prefer the Hugging Face transformers library, you can follow this method:


import torch
from torch.nn.functional import normalize
from transformers import AutoModel, AutoTokenizer

# Model constants
MODEL_ID = "Snowflake/snowflake-arctic-embed-m-v1.5"
QUERY_PREFIX = 'Represent this sentence for searching relevant passages: '

# Your queries and docs
queries = ['What is Snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City, of course!']

# Load the model and tokenizer, and switch to inference mode
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, add_pooling_layer=False)
model.eval()

# Add query prefix and tokenize
queries_with_prefix = [f"{QUERY_PREFIX}{q}" for q in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt')

# Generate embeddings (unpack the tokenized batches and take the CLS token)
with torch.inference_mode():
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt')
    document_embeddings = model(**document_tokens)[0][:, 0]

# Normalize embeddings and compute similarity scores
query_embeddings = normalize(query_embeddings)
document_embeddings = normalize(document_embeddings)
scores = query_embeddings @ document_embeddings.T

# Pretty-print the results as before
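
Because the model is MRL-trained, you can also truncate the raw transformers embeddings yourself: slice out the first 256 dimensions and re-normalize before scoring. A short sketch continuing from the variables above:


# Optional MRL truncation: keep the first 256 dimensions, then
# re-normalize so dot products remain cosine similarities
query_embeddings_256 = normalize(query_embeddings[:, :256])
document_embeddings_256 = normalize(document_embeddings[:, :256])
scores_256 = query_embeddings_256 @ document_embeddings_256.T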

Troubleshooting Tips

1. Model Not Loading: Ensure you’re using the correct model ID and check your internet connection, since the weights are downloaded from the Hugging Face Hub on first use.
2. Dimensionality Errors: When truncating embeddings, stick to dimensions the model was trained for under MRL (such as the first 256 of its 768 dimensions) and re-normalize afterward.
3. Score Output: If scores aren’t as expected, double-check your document embeddings and ensure normalization is correctly applied; a quick check is sketched below.
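
As a quick sanity check for tip 3 (assuming the embeddings come back as NumPy arrays from model.encode), unit-normalized vectors should all have an L2 norm of almost exactly 1.0:


import numpy as np

# Every printed value should be very close to 1.0 if normalization was applied
norms = np.linalg.norm(document_embeddings, axis=1)
print(norms)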

For more troubleshooting questions or issues, contact the fxis.ai data science expert team.

Conclusion

Snowflake’s Arctic-embed-m-v1.5 provides an innovative avenue in text embedding, equipping you with tools to generate compact, high-quality embeddings for retrieval. Whether you use Sentence Transformers or the Hugging Face transformers library, a few lines of code are enough to get started. Happy coding!
