The Sentence-Transformers library offers a powerful way to transform your sentences and paragraphs into numerical embeddings that can be used across a range of applications, such as clustering or semantic search. However, it’s crucial to note that the model discussed here, sentence-transformers/bert-base-nli-stsb-mean-tokens, is deprecated and should not be used because it produces low-quality embeddings. Instead, for your embedding needs, check out the recommended models at SBERT.net – Pretrained Models.
What is Sentence-Transformers?
Sentence-Transformers is a Python library that builds on Hugging Face transformer models and is specifically designed to produce semantically meaningful sentence embeddings. It supports a variety of linguistic tasks by mapping each sentence into a 768-dimensional vector space (for BERT-base models such as this one).
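In practice, sentences with similar meanings land close together in that vector space, which is what makes the embeddings useful for search and clustering. Here is a minimal sketch of measuring that closeness with cosine similarity, assuming one of the currently recommended models, all-mpnet-base-v2 (which also produces 768-dimensional vectors):
from sentence_transformers import SentenceTransformer, util

# Assumed recommended model; see SBERT.net – Pretrained Models for current options
model = SentenceTransformer('all-mpnet-base-v2')
embeddings = model.encode(["The cat sits on the mat", "A cat is resting on a rug"])

# Semantically similar sentences yield a cosine similarity close to 1
print(util.cos_sim(embeddings[0], embeddings[1]))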
Getting Started with Sentence-Transformers
To start using this library, you’ll first need to install it. Below are the steps to get you started on your sentence embedding journey:
Installation
- Open your command line or terminal.
- Run the command:
pip install -U sentence-transformers
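You can confirm that the installation succeeded by printing the installed version:
python -c "import sentence_transformers; print(sentence_transformers.__version__)"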
Using the Deprecated Model
While we advise against using this deprecated model, here is how it was used:
from sentence_transformers import SentenceTransformer

# Sentences we want embeddings for
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the (deprecated) model and encode the sentences into dense vectors
model = SentenceTransformer('sentence-transformers/bert-base-nli-stsb-mean-tokens')
embeddings = model.encode(sentences)
print(embeddings)
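For these two input sentences, encode returns a NumPy array of shape (2, 768): one 768-dimensional vector per sentence. A quick way to verify this:
print(embeddings.shape)  # (2, 768)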
Understanding the Code Analogy
Think of the code as a kitchen preparing a gourmet banquet (the sentence embeddings). Each sentence is a dish brought to the head chef, who represents the SentenceTransformer model. The head chef takes every dish, regardless of how it arrives, and transforms it into a standardized course in the same gourmet format (a dense vector representation), so that all the dishes can be compared and combined within one banquet. Just as the chefs work toward a common presentation, the Sentence-Transformers library harmonizes sentences of different lengths and structures into uniform, meaningful numerical representations.
Alternative Usage with HuggingFace Transformers
If you prefer not to install the sentence-transformers package, you can reproduce the same pipeline with Hugging Face’s transformers library (note that this still loads the same deprecated checkpoint). Below is the implementation:
from transformers import AutoTokenizer, AutoModel
import torch

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want embeddings for
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/bert-base-nli-stsb-mean-tokens')
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-stsb-mean-tokens')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
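From here the embeddings can be compared directly. As a short continuation of the snippet above, L2-normalizing the vectors lets a simple dot product serve as cosine similarity:
import torch.nn.functional as F

# After normalization, the dot product of two vectors equals their cosine similarity
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
print(normalized @ normalized.T)  # 2x2 matrix of pairwise similarities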
Evaluation and Model Architecture
To assess the performance of your model, consider using the Sentence Embeddings Benchmark at https://seb.sbert.net.
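As a rough illustration of what such an evaluation measures, the library ships an EmbeddingSimilarityEvaluator that correlates the model’s cosine similarities with human-annotated scores. A minimal sketch with a hypothetical two-pair dataset (the exact return format varies across library versions):
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Hypothetical toy data: sentence pairs with human similarity scores in [0, 1]
sentences1 = ["A man is eating food.", "A plane is taking off."]
sentences2 = ["A man is eating a meal.", "A bird is flying."]
scores = [0.9, 0.1]

model = SentenceTransformer('all-mpnet-base-v2')  # assumed recommended model
evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores)
print(evaluator(model))  # correlation between model and human similarity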
The model architecture consists of a transformer followed by a pooling layer, which averages the token embeddings produced by the transformer into a single fixed-size sentence vector.
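In sentence-transformers, this two-stage architecture can be assembled explicitly. A sketch of the composition, using bert-base-uncased as the underlying transformer:
from sentence_transformers import SentenceTransformer, models

# The transformer produces one contextual embedding per token
word_embedding_model = models.Transformer('bert-base-uncased')

# The pooling layer (mean pooling by default) collapses token embeddings into one sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])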
Troubleshooting
- Problem: Model raises an error during installation.
- Solution: Ensure Python and pip are installed and up to date. Try running pip install --upgrade pip followed by pip install -U sentence-transformers.
- Problem: Low-quality embeddings returned.
- Solution: Switch to a recommended model from SBERT.net – Pretrained Models, as shown in the sketch after this list.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
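Switching to a recommended model is typically a one-line change. A minimal sketch, assuming all-mpnet-base-v2 (one of the models currently listed at SBERT.net – Pretrained Models):
from sentence_transformers import SentenceTransformer

# Swap the deprecated checkpoint for a recommended one; the rest of the code is unchanged
model = SentenceTransformer('all-mpnet-base-v2')
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])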
Conclusion
By following this guide, you should be able to generate and work with sentence embeddings effectively. Remember, while tools and libraries are powerful, understanding the underlying processes is vital for leveraging their capabilities.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

