How to Use the Sentence-Transformers Library for Sentence Similarity

Mar 30, 2024 | Educational

The sentence-transformers library is a powerful tool that converts sentences and paragraphs into high-dimensional vector representations, which are useful for tasks like semantic search and clustering. Note, however, that the model discussed in this post is deprecated because it produces low-quality sentence embeddings; for current alternatives, see **SBERT.net – Pretrained Models**.

Getting Started

To use the sentence-transformers library, you first need to install it. Follow the steps below.

Installation

  • Open your command line interface.
  • Run the following command to install the library:
pip install -U sentence-transformers

Basic Usage with Sentence-Transformers

Once you’ve installed the library, usage is straightforward. Below is a brief guide to the deprecated model:

from sentence_transformers import SentenceTransformer

# Sentences to embed
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the (deprecated) model; it is downloaded from the Hub on first use
model = SentenceTransformer('sentence-transformers/nli-bert-base-cls-pooling')

# Encode the sentences into one fixed-size vector each
embeddings = model.encode(sentences)
print(embeddings)

In the code above:

  • We import the SentenceTransformer class.
  • We provide a list of sentences to be transformed.
  • Then, we load the model and encode the sentences, which gives us numerical embeddings.
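Once you have embeddings, a common next step is to compare them, and cosine similarity is the usual metric. Here is a minimal pure-Python sketch; the toy vectors below are illustrative stand-ins, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real sentence embeddings
emb_a = [0.2, 0.1, 0.4]
emb_b = [0.3, 0.2, 0.5]
print(cosine_similarity(emb_a, emb_b))  # near 1.0 when vectors point the same way
```

In practice you would pass two rows of the `embeddings` array from the snippet above; scores close to 1 indicate semantically similar sentences.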

Using HuggingFace Transformers Without Sentence-Transformers

If you prefer not to use the sentence-transformers library, you can leverage HuggingFace Transformers instead. Here’s how:

from transformers import AutoTokenizer, AutoModel
import torch

def cls_pooling(model_output, attention_mask):
    # CLS pooling: keep the hidden state of the first ([CLS]) token.
    # The attention mask is unused for this strategy but kept for a consistent signature.
    return model_output[0][:, 0]

# Sentences to embed
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the tokenizer and model from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/nli-bert-base-cls-pooling')
model = AutoModel.from_pretrained('sentence-transformers/nli-bert-base-cls-pooling')

# Tokenize with padding and truncation so the batch has a uniform length
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings without tracking gradients
with torch.no_grad():
    model_output = model(**encoded_input)

# Pool the per-token outputs into one vector per sentence
sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
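The cls_pooling helper above keeps only the first token’s hidden state, which is part of why this model’s embeddings are weak. Many of the models recommended at SBERT.net use mean pooling instead: averaging all token embeddings while masking out padding. A sketch of that alternative, demonstrated on toy tensors rather than real model output:

```python
import torch

def mean_pooling(model_output, attention_mask):
    """Average token embeddings, ignoring padded positions."""
    token_embeddings = model_output[0]  # shape: (batch, seq_len, hidden)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# Toy example: one sentence, three token positions, the last one padding
toy_output = (torch.tensor([[[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]]]),)
toy_mask = torch.tensor([[1, 1, 0]])
print(mean_pooling(toy_output, toy_mask))  # averages only the two real tokens
```

With a real model you would call `mean_pooling(model_output, encoded_input['attention_mask'])` in place of `cls_pooling` above.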

Think of the models as chefs and the sentences as ingredients. In the sentence-transformers method, you simply ask the chef to prepare the dish using an all-in-one recipe, while in the HuggingFace method, you gather your ingredients first, then pass them to the chef who meticulously prepares each element before combining them into a final dish. Both methods can get you to the same goal: delicious embeddings.

Evaluation Results

For automated evaluation of the deprecated model, you can check the Sentence Embeddings Benchmark at seb.sbert.net.

Troubleshooting

If you encounter issues, here are a few troubleshooting tips:

  • Ensure you have installed the correct library version. You can reinstall it using the installation command mentioned above.
  • Check if your Python environment is set up correctly and is compatible with the library.
  • If any error messages appear, try searching for those specific errors online or check the documentation for potential solutions.
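The environment checks above can be sketched as a quick diagnostic script. This uses only the standard library; the package names are simply the ones used in this post:

```python
import importlib.util
import sys

# Report the interpreter version, then whether each key package is importable
print(f"Python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")
for pkg in ("sentence_transformers", "transformers", "torch"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing'}")
```

If a package shows as missing, rerun the installation command from the beginning of this post inside the same environment.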

For more insights, updates, or to collaborate on AI development projects, stay connected with **fxis.ai**.

Conclusion

While the model we discussed is now deprecated, understanding how to use libraries like sentence-transformers and HuggingFace Transformers for comparative tasks can set the foundation for experimenting with more robust solutions available today. At **fxis.ai**, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Stay Informed with the Newest F(x) Insights and Blogs

Tech News and Blog Highlights, Straight to Your Inbox