How to Use Sentence Transformers for Sentence Embeddings

Mar 27, 2024 | Educational

In Natural Language Processing (NLP), turning sentences into meaningful numerical representations is a core task. The ‘sentence-transformers’ library converts sentences into vectors, which makes tasks like clustering and semantic search much easier. However, some older models, including the stsb-bert-large model used in this walkthrough, are marked as deprecated because they yield low-quality embeddings. Let’s explore how to use sentence transformers while navigating this and other potential pitfalls.

Understanding Sentence Transformers

The sentence-transformers library can load a pretrained model called stsb-bert-large, which maps sentences and paragraphs to a 1024-dimensional dense vector space. You can picture this as placing each sentence on a map where texts with similar meanings land close together.
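To make "close together" concrete, here is a minimal sketch of cosine similarity, a common way to compare two embeddings. The three short vectors are invented for illustration and are not real model output; actual stsb-bert-large vectors have 1024 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings", made up for this example.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, invoice)

# Vectors that point in similar directions score closer to 1.
print(sim_related > sim_unrelated)
```

A score near 1 means the two vectors point in almost the same direction, while a score near 0 means they are unrelated.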

Installation and Setup

To use this model effectively, you’ll first need to install the sentence-transformers library. Open your command line interface (CLI) and type:

pip install -U sentence-transformers
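If you want to confirm that the install worked, a small check like the one below may help. It uses only the Python standard library; the helper name installed_version is our own and is not part of sentence-transformers.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("sentence-transformers")
if version is None:
    print("sentence-transformers is not installed in this environment")
else:
    print(f"sentence-transformers {version} is installed")
```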

Using the Model with Sentence-Transformers

Once you’ve set up your environment, you can begin using the model. Here’s how you can achieve this:

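from sentence_transformers import SentenceTransformer
# Sentences we want embeddings for
sentences = ["This is an example sentence", "Each sentence is converted"]
# Load the pretrained model (downloaded from the Hugging Face Hub on first use)
model = SentenceTransformer('sentence-transformers/stsb-bert-large')
# encode() returns one 1024-dimensional vector per input sentence
embeddings = model.encode(sentences)
print(embeddings)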

This snippet imports the SentenceTransformer class, loads the model, and passes it a list of sentences to encode. The result is one vector per sentence: think of it as the model translating each sentence into a numerical language that captures its essence.
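Once you have embeddings, a simple semantic search is a matter of ranking. The sketch below assumes the vectors are L2-normalized, in which case a plain dot product equals cosine similarity. The helper names and the 2-dimensional toy vectors are hypothetical stand-ins; in practice the vectors would come from model.encode.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def rank_by_dot_product(query_vec, corpus_vecs):
    """Return (index, score) pairs, most similar first.

    Assumes every vector is already L2-normalized, so the dot
    product is the same as cosine similarity.
    """
    scores = [
        (i, sum(q * c for q, c in zip(query_vec, vec)))
        for i, vec in enumerate(corpus_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for sentence embeddings (not real model output).
corpus_vecs = [l2_normalize(v) for v in [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]]
query_vec = l2_normalize([0.9, 0.1])

ranking = rank_by_dot_product(query_vec, corpus_vecs)
best_index, best_score = ranking[0]
print(best_index)
```

With the library itself, passing normalize_embeddings=True to model.encode should give you unit-length vectors directly, so the manual normalization step can be skipped.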

Using the Model without Sentence-Transformers

If you prefer to use the Hugging Face Transformers library directly, follow these steps:

from transformers import AutoTokenizer, AutoModel
import torch

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9) 

# Sentences we want sentence embeddings for
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/stsb-bert-large')
model = AutoModel.from_pretrained('sentence-transformers/stsb-bert-large')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)

This code shows how to compute sentence embeddings without the ‘sentence-transformers’ package. The model returns one vector per token, and the mean_pooling function averages those token vectors into a single sentence vector, using the attention mask so that padding tokens do not skew the result. Imagine pooling as summarizing a discussion at a café: every real participant's contribution is averaged in, while the empty chairs are ignored.
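To see why the attention mask matters, here is a plain-Python sketch of the same masked average on a tiny made-up example. It mirrors the logic of mean_pooling for a single sentence; the function name masked_mean and the numbers are ours, chosen so the effect of padding is easy to spot.

```python
def masked_mean(token_vectors, attention_mask):
    """Average token vectors, skipping positions where the mask is 0."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for j in range(dim):
                totals[j] += vec[j]
    # Guard against division by zero, like torch.clamp(..., min=1e-9).
    count = max(sum(attention_mask), 1e-9)
    return [t / count for t in totals]

# Two real tokens followed by one padding token with junk values.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

pooled = masked_mean(token_vectors, attention_mask)
print(pooled)  # [2.0, 3.0]
```

Without the mask, the padding vector would drag the average far away from the two real tokens.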

Troubleshooting Common Issues

If you encounter issues while implementing these methods, here are some troubleshooting tips:

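Model Deprecated: The stsb-bert-large model used in this guide is marked as deprecated on its model card because it produces low-quality embeddings. For real projects, choose a current model from the Pretrained Models page on SBERT.net.
Installation Problems: If your installation fails, confirm that your Python version is supported by the library, upgrade pip, and run the install command again.
Code Errors: Check for syntax errors and make sure the required libraries (sentence_transformers, or transformers and torch) are installed and imported correctly.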
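When the code runs but the results look off, a quick sanity check on the embeddings can narrow things down. The helper below is a hypothetical sketch, not part of either library. It only verifies that every vector has the expected number of dimensions (1024 for stsb-bert-large) and contains no NaN or infinite values; the toy rows use 3 dimensions to keep the example short.

```python
import math

def check_embeddings(embeddings, expected_dim):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for i, vec in enumerate(embeddings):
        if len(vec) != expected_dim:
            problems.append(
                f"row {i}: expected {expected_dim} dimensions, got {len(vec)}"
            )
        elif any(math.isnan(x) or math.isinf(x) for x in vec):
            problems.append(f"row {i}: contains NaN or infinite values")
    return problems

# Toy rows for illustration only.
good = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
bad = [[0.1, 0.2], [0.1, float("nan"), 0.3]]

print(check_embeddings(good, expected_dim=3))
print(len(check_embeddings(bad, expected_dim=3)))
```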

Conclusion

Understanding and using sentence transformers can significantly enhance your NLP projects. Once sentences are converted to meaningful vector representations, tasks such as clustering and semantic search become much easier to build. Keep up with the latest models and methods to get the best performance.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
