How to Use Sentence-Transformers for Sentence Similarity Tasks

Jun 29, 2021 | Educational

If you’ve ever needed to measure the similarity between sentences or paragraphs, you’re in the right place! In this guide, we’ll walk through using the sentence-transformers library to map your text into a dense vector space, enabling powerful operations such as clustering and semantic search.

Understanding Sentence-Transformers

The paraphrase-xlm-r-multilingual-v1 model provided by sentence-transformers converts sentences into 768-dimensional embeddings. Think of each embedding as a kind of fingerprint for its sentence – but unlike a fingerprint, sentences with similar meanings end up close together in this vector space, which is exactly what makes it easy to identify similar sentences.

from sentence_transformers import SentenceTransformer

# List of sentences
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load Model
model = SentenceTransformer('sentence-transformers/paraphrase-xlm-r-multilingual-v1')

# Create embeddings
embeddings = model.encode(sentences)
print(embeddings)

Getting Started with Installation

Before we jump into the code, ensure you have the sentence-transformers library installed. You can easily do this with pip!

pip install -U sentence-transformers

Using the Sentence-Transformers Model

Once you’ve installed the library, you can start using the model to encode your sentences as shown above. Just replace the example sentences with your own, and you’ll get their embeddings.
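Since the goal is sentence similarity, you’ll usually want to compare the resulting embeddings with cosine similarity (the sentence-transformers library also ships a util.cos_sim helper for this). Here is a minimal numpy sketch of the operation, using small toy vectors in place of the 768-dimensional vectors that model.encode would return:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for the 768-dimensional
# vectors produced by model.encode()
emb_a = np.array([0.2, 0.1, 0.9, 0.4])
emb_b = np.array([0.3, 0.0, 0.8, 0.5])
emb_c = np.array([-0.7, 0.6, -0.1, 0.0])

print(cosine_similarity(emb_a, emb_b))  # close to 1: similar directions
print(cosine_similarity(emb_a, emb_c))  # negative here: dissimilar directions
```

A score near 1 means the two sentences point in almost the same direction in the embedding space; scores near 0 (or below) indicate unrelated content.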

Alternative Usage: HuggingFace Transformers

If you prefer not to use the sentence-transformers library directly, you can achieve the same results with HuggingFace’s Transformers library. The overall workflow is similar but involves a few extra steps: you run the transformer model yourself and then apply a pooling operation (here, mean pooling) to the token embeddings.

The Process Explained: An Analogy

Imagine you have a group of friends, each with a unique set of interests. When you describe a new activity (your sentence), they express their interest levels in various ways (embeddings). Some are keen (high similarity), while others are indifferent (low similarity). The embeddings generated capture these layers of interest, allowing you to find out which friends might enjoy the new activity the most.

Code for HuggingFace Transformers

from transformers import AutoTokenizer, AutoModel
import torch

# Mean Pooling: average the token embeddings, ignoring padding tokens
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element holds the token-level embeddings
    # Expand the attention mask to the embedding dimension so that padded
    # positions contribute zero to the sum
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum over tokens and divide by the number of real (non-padding) tokens;
    # clamp avoids division by zero for empty inputs
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Example sentences
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-xlm-r-multilingual-v1')
model = AutoModel.from_pretrained('sentence-transformers/paraphrase-xlm-r-multilingual-v1')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
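To make the pooling step concrete, here is a small numpy re-creation of the same arithmetic on toy tensors. It shows why the attention mask matters: positions marked 0 (padding) are excluded from the average entirely.

```python
import numpy as np

# Toy batch: 2 sentences, max length 3 tokens, embedding dim 2.
# Sentence 1 has 3 real tokens; sentence 2 has 2 real tokens + 1 pad.
token_embeddings = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 4.0], [9.0, 9.0]],  # last row is a pad token
])
attention_mask = np.array([
    [1, 1, 1],
    [1, 1, 0],
])

# Same arithmetic as the torch mean_pooling above: zero out pads,
# sum over the token axis, divide by the count of real tokens.
mask = attention_mask[:, :, None].astype(float)
summed = (token_embeddings * mask).sum(axis=1)
counts = np.clip(mask.sum(axis=1), 1e-9, None)
sentence_embeddings = summed / counts

print(sentence_embeddings)
# Sentence 1: mean of all 3 rows  -> [3. 4.]
# Sentence 2: mean of first 2 rows only -> [3. 3.] (the [9, 9] pad row is ignored)
```

Without the mask, the pad row would pull sentence 2’s embedding toward [5, 5], corrupting the representation.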

Evaluating Your Model

To assess the performance of your model, you can refer to the Sentence Embeddings Benchmark. This provides an automated evaluation of various models including the one discussed here.
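On similarity benchmarks of this kind, models are commonly scored by the Spearman rank correlation between their predicted cosine similarities and human-annotated similarity scores. As an illustrative sketch only – the gold and predicted scores below are made up, not real benchmark data:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (assumes no ties, fine for a sketch)."""
    # Rank each value, then take the Pearson correlation of the ranks
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical human similarity scores (e.g. 0-5, as in STS datasets)
# and the cosine similarities a model predicted for the same pairs
gold = np.array([4.8, 1.2, 3.5, 0.5, 2.9])
pred = np.array([0.91, 0.20, 0.75, 0.05, 0.60])

print(spearman(gold, pred))  # 1.0 here: the model ranks every pair correctly
```

Rank correlation is used rather than raw correlation because only the ordering of pairs matters – the model’s scores live on a different scale than the human annotations.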

Troubleshooting Tips

  • If you receive an error related to missing libraries, ensure you have installed the required packages using pip.
  • Make sure you’re using the correct model name when loading from HuggingFace.
  • If your embeddings aren’t appearing as expected, double-check that your inputs are plain strings passed in a list, as in the examples above.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
