How to Use Sentence Transformers for Sentence Similarity

Mar 4, 2023 | Educational

Harnessing the power of sentence-transformers allows developers to map sentences and paragraphs into a dense vector space, paving the way for advanced tasks such as clustering and semantic search. In this guide, we will walk you through using a pre-trained sentence-transformers model and explore its capabilities through practical examples.

Understanding the Model: The Vector Universe

Think of the sentence-transformers model as a mapmaker: it places each sentence or paragraph at a point in a 768-dimensional vector space. Sentences with similar meanings land close together in that space, even when they are worded very differently, and that geometric closeness is exactly what makes clustering and semantic search work.

Getting Started with Sentence Transformers

Before you jump into utilizing sentence-transformers, ensure that you have the library installed. You can do this using pip:

  • Install Sentence Transformers: Run the following command in your terminal:

```shell
pip install -U sentence-transformers
```

Usage of Sentence Transformers

Here’s how you can encode sentences using the library:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# Replace MODEL_NAME with the identifier of the pre-trained model you want to load
model = SentenceTransformer(MODEL_NAME)
embeddings = model.encode(sentences)
print(embeddings)
```
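Since the goal here is sentence similarity, the usual next step is to compare two embeddings with cosine similarity (with sentence-transformers you could also call `util.cos_sim` on the embeddings above). Here is a minimal NumPy sketch of the computation itself, using small made-up vectors for illustration; real embeddings from this model are 768-dimensional:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (illustrative values only)
emb_a = np.array([0.1, 0.3, -0.2, 0.5])
emb_b = np.array([0.1, 0.25, -0.1, 0.4])

score = cosine_similarity(emb_a, emb_b)
print(f"cosine similarity: {score:.4f}")  # near 1.0 for similar vectors
```

Scores range from -1 to 1; in practice, the closer to 1, the more semantically similar the two sentences are.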

Using HuggingFace Transformers

If you prefer using HuggingFace Transformers without the sentence-transformers library, here’s how you can achieve similar results:

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Mean Pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load model from the Hugging Face Hub (replace MODEL_NAME with the model identifier)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
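To see what the mean-pooling step is actually doing, here is a small NumPy re-implementation on a toy batch with made-up values: positions where the attention mask is 0 (padding) are excluded from the average, so padded tokens cannot skew the sentence embedding.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # Expand the mask over the embedding dimension, zero out padded tokens,
    # then divide by the count of real tokens per sentence
    mask = attention_mask[..., None].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# One sentence, 3 token slots (the last is padding), embedding dimension 2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])  # 0 marks the padded position

result = mean_pool(tokens, mask)
print(result)  # averages only the two real tokens: [[2. 3.]]
```

Without the mask, the padding vector `[100, 100]` would dominate the average; with it, the result is simply the mean of the real tokens.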

Evaluating the Model

The performance of your model can be evaluated against the Sentence Embeddings Benchmark.

Training Insights

The model was trained with the following notable configuration:

  • Data Loader: PubmedLowMemoryLoader with 26,041 entries.
  • Loss Function: MultipleNegativesRankingLoss with a scale of 20.0.
  • Optimizer: AdamW.
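MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as the correct match and every other in-batch positive as a negative: it scales the cosine-similarity matrix (by 20.0 here) and applies cross-entropy with the diagonal as the target. The following NumPy sketch illustrates that idea on made-up embeddings; it is a conceptual illustration, not the library's implementation:

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    # Normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # Pairwise similarity matrix, scaled as in the training setup above
    logits = scale * (a @ p.T)
    # Row-wise log-softmax (shifted for numerical stability); the correct
    # "class" for anchor i is positive i, i.e. the diagonal
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
positives = anchors + 0.01 * rng.normal(size=(4, 8))  # near-duplicate positives

loss = mnrl(anchors, positives)
print(f"loss: {loss:.4f}")  # low, since each anchor matches its own positive
```

Mismatching the pairs (e.g. reversing the positives) raises the loss sharply, which is exactly the signal that pushes matching sentences together during training.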

Troubleshooting

Encountering issues while implementing this model? Here are some common troubleshooting steps:

  • If you face errors in installation, check your Python package version compatibility.
  • Verify that you are using the correct MODEL_NAME in your code.
  • If the model output is unexpected, ensure your input sentences are well-structured.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
