If you’re looking to map sentences or paragraphs into a 768-dimensional dense vector space, the Sentence-Transformers model nickprock/sentence-BERTino-sts-matryoshka is a strong choice. It can be used for tasks such as clustering and semantic search, and because it was trained with a Matryoshka objective, its embeddings can be truncated to smaller sizes with only a modest loss of quality. In this article, we will walk you through how to use the model effectively, covering both the Sentence-Transformers method and the HuggingFace Transformers approach.
Getting Started with Sentence-Transformers
First, ensure that you have the sentence-transformers library installed. You can install it using pip with the following command:
pip install -U sentence-transformers
Usage of Sentence-BERTino Model
Now, let’s see how you can utilize this model in your code:
from sentence_transformers import SentenceTransformer
# Two Italian near-paraphrases ("A girl is styling her hair." / "A girl is brushing her hair.")
sentences = ['Una ragazza si acconcia i capelli.', 'Una ragazza si sta spazzolando i capelli.']
# Reduce the embedding dimensions for efficiency
matryoshka_dim = 64
# Load the model
model = SentenceTransformer('nickprock/sentence-BERTino-sts-matryoshka')
# Encode the sentences
embeddings = model.encode(sentences)
# Shrink the embedding dimensions
embeddings = embeddings[..., :matryoshka_dim]
print(embeddings.shape) # Output: (2, 64)
This code snippet sets up the Sentence-BERTino model, encodes two sentences, and then reduces the output embedding size for efficiency.
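To check that the truncated vectors still capture the similarity between the two Italian near-paraphrases, you can compare them directly. The following is a minimal sketch using the cos_sim helper from the sentence-transformers util module (available in recent releases of the library); it is not part of the original model card:
from sentence_transformers import util
# Cosine similarity between the two truncated 64-dimensional embeddings
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity)  # A value close to 1.0 indicates near-paraphrases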
Understanding the Code with an Analogy
Imagine you’re a chef preparing a special dish using a set of ingredients (your sentences). The SentenceTransformer works like a high-tech blender that combines these ingredients into a smooth sauce (dense vectors). In this scenario, you have some personal preferences, such as wanting the sauce to be a bit less thick (reducing the embedding dimensions). The output is a fine mixture that retains the essence of the original ingredients and can be used in various recipes (semantic applications).
Using HuggingFace Transformers
If you’d prefer to use HuggingFace Transformers, here’s how to do it:
from transformers import AutoTokenizer, AutoModel
import torch
# Mean Pooling function
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
# Sentences for embedding
sentences = ['This is an example sentence', 'Each sentence is converted']
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('nickprock/sentence-BERTino-sts-matryoshka')
model = AutoModel.from_pretrained('nickprock/sentence-BERTino-sts-matryoshka')
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print('Sentence embeddings:')
print(sentence_embeddings)
Here the tokenizer turns each sentence into input IDs and an attention mask, the model produces contextual token embeddings, and mean pooling averages those token embeddings (weighted by the attention mask) into a single fixed-size vector per sentence.
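If you plan to compare these embeddings with cosine similarity, a common follow-up step is to L2-normalize them. This is standard practice rather than something specific to this model; a minimal continuation of the snippet above:
import torch.nn.functional as F
# L2-normalize so that dot products become cosine similarities
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
# Pairwise cosine similarity between the example sentences
print(sentence_embeddings @ sentence_embeddings.T)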
Evaluation of the Model
You can evaluate the effectiveness of the model using the Sentence Embeddings Benchmark for automated assessment.
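For a quick local sanity check rather than a full benchmark run, sentence-transformers also ships an EmbeddingSimilarityEvaluator, which scores a model on sentence pairs annotated with gold similarity values. The pairs and scores below are made-up placeholders, so treat this purely as a sketch of the API; the exact return format (a single score or a dictionary of metrics) depends on the library version:
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
model = SentenceTransformer('nickprock/sentence-BERTino-sts-matryoshka')
# Placeholder STS-style data: two lists of sentences plus gold similarity scores in [0, 1]
sentences1 = ['Una ragazza si acconcia i capelli.', 'Un uomo suona la chitarra.']
sentences2 = ['Una ragazza si sta spazzolando i capelli.', 'Un cane corre nel parco.']
scores = [0.9, 0.1]
evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores)
print(evaluator(model))  # Correlation between the model's similarities and the gold scores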
Model Training Details
The training parameters include:
- DataLoader: batch size of 16, with a length of 360 batches.
- Loss: Matryoshka Loss with several nested dimensions and per-dimension weights (see the sketch after this list).
- Epochs: 10
- Learning Rate: 2e-05
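To make the Matryoshka Loss item concrete, here is a minimal sketch of how such a loss is typically wired up in recent versions of sentence-transformers. The inner loss, dimension list, and weights below are illustrative assumptions, not the exact values used to train this model:
from sentence_transformers import SentenceTransformer, losses
model = SentenceTransformer('nickprock/sentence-BERTino-sts-matryoshka')
# Assumed inner loss and nested dimensions; the actual training configuration may differ
base_loss = losses.CosineSimilarityLoss(model)
train_loss = losses.MatryoshkaLoss(
    model,
    base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],  # nested embedding sizes
    matryoshka_weights=[1, 1, 1, 1, 1],  # per-dimension loss weights
)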
Troubleshooting Common Issues
If you encounter any issues while using the nickprock/sentence-BERTino-sts-matryoshka model, here are some troubleshooting tips:
- Ensure all dependencies are properly installed. You can reinstall sentence-transformers if needed.
- Check the input sentences for correct syntax and make sure they stay within the model’s sequence length limit (see the snippet after this list).
- If embeddings are not producing expected results, consider reviewing your pooling method or adjusting the model parameters.
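As a quick way to verify the length-limit point above, you can inspect the model’s maximum sequence length directly; inputs longer than this (measured in tokens) are silently truncated:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('nickprock/sentence-BERTino-sts-matryoshka')
print(model.max_seq_length)  # Maximum number of tokens encoded per input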
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

