How to Use the Sentence-Transformers Library for Semantic Search

Mar 30, 2024 | Educational

The sentence-transformers library is a powerful tool for converting sentences and paragraphs into dense vector representations. These embeddings enable natural language processing tasks such as clustering and semantic search. In this article, we'll walk through installing and using the distilbert-multilingual-nli-stsb-quora-ranking model, a multilingual checkpoint tuned for duplicate-question ranking, either through the sentence-transformers library directly or via HuggingFace Transformers.

Getting Started

First, ensure you have the required library installed. You can do this easily via pip:

pip install -U sentence-transformers

Using the Sentence-Transformers Library

Once you have the library installed, you can utilize the model as shown below:

from sentence_transformers import SentenceTransformer

# List of sentences to be encoded
sentences = ["This is an example sentence.", "Each sentence is converted."]

# Load the model
model = SentenceTransformer('sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking')

# Encode the sentences
embeddings = model.encode(sentences)

# Output the embeddings
print(embeddings)
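
Since this checkpoint was tuned on duplicate-question data, a natural next step is semantic search. The sketch below uses the library's util.semantic_search helper to rank a corpus against a query by cosine similarity; the corpus and query here are placeholder examples of our own, so substitute your own data:

from sentence_transformers import SentenceTransformer, util

# Load the same model as above
model = SentenceTransformer('sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking')

# Placeholder corpus and query -- replace with your own data
corpus = ["How do I learn Python?", "What is the capital of France?", "Best way to study programming?"]
query = "How can I get started with coding?"

# Encode both as tensors so they can be compared directly
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top 2 most similar corpus entries by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit['corpus_id']], f"(score: {hit['score']:.4f})")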

Using HuggingFace Transformers

If you prefer not to use the sentence-transformers library, the model is also accessible via HuggingFace Transformers. In that case, the pooling step that sentence-transformers normally applies for you must be done manually. Here's how:

from transformers import AutoTokenizer, AutoModel
import torch

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ["This is an example sentence.", "Each sentence is converted."]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking')
model = AutoModel.from_pretrained('sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform mean pooling using the attention mask
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

# Output the sentence embeddings
print("Sentence embeddings:")
print(sentence_embeddings)
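
If you plan to compare these raw pooled embeddings with cosine similarity, it is common practice to L2-normalize them first so that a plain dot product yields the cosine score. A minimal sketch building on the sentence_embeddings tensor from the snippet above:

import torch.nn.functional as F

# L2-normalize so that dot products equal cosine similarities
normalized = F.normalize(sentence_embeddings, p=2, dim=1)

# Pairwise cosine similarity between the two example sentences
similarity = normalized @ normalized.T
print(similarity)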

Understanding the Code – An Analogy

Think of converting sentences into embeddings as a creative writing workshop. Each sentence is a participant seeking to express itself, and the model acts as the mentor. Just as a mentor listens to each participant's presentation and distills it into concise feedback, the model captures each sentence's essence and condenses it into a fixed-length summary (the embedding). Mean pooling plays the role of that distillation: it averages over every token the participant contributed, weighted by the attention mask so that padding "silence" is ignored, producing an output that is both compact and meaningful.

Troubleshooting

If you encounter issues while using the library or the model, here are a few troubleshooting tips:

  • Make sure you have the latest version of the sentence-transformers library installed.
  • Check for any typos in the model name when loading or encoding sentences.
  • If you experience runtime errors, ensure that your environment has sufficient resources, such as CPU/GPU availability; a quick device check is sketched below.
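
To verify the last point, you can ask PyTorch which device is available and pass it explicitly when loading the model (the device argument is part of the SentenceTransformer constructor):

import torch
from sentence_transformers import SentenceTransformer

# Check whether a CUDA-capable GPU is visible to PyTorch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

# Load the model directly onto that device
model = SentenceTransformer('sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking', device=device)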

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Evaluation Results

To assess the effectiveness of the model, consider checking out the Sentence Embeddings Benchmark, where you can find automated evaluation results.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
