The Paraphrase Filipino MPNet Base V2 model maps sentences and paragraphs into a 768-dimensional dense vector space, making it useful for clustering and semantic search. In this article, we’ll walk through setting up and using the model effectively. Buckle up, and let’s dive into the realm of sentence similarity!
Step 1: Installation of Libraries
To begin using the Paraphrase Filipino MPNet Base V2 model, first install the required library. Make sure sentence-transformers is available in your Python environment:
pip install -U sentence-transformers
Step 2: Using the Model
There are two ways to use the model: the first relies on the sentence-transformers library, while the second uses the HuggingFace Transformers library directly.
Using Sentence-Transformers
Here’s how you can do it:
from sentence_transformers import SentenceTransformer
from scipy.spatial import distance
import itertools
model = SentenceTransformer('meedan/paraphrase-filipino-mpnet-base-v2')
sentences = [
    "saan pong mga lugar available ang pfizer vaccine? Thank you!",
    "Ask ko lang po saan meron available na vaccine",
    "Where is the vaccine available?"
]
embeddings = model.encode(sentences)
# Cosine distance for every pair of sentences: (0, 1), (0, 2), (1, 2)
dist = [distance.cosine(i, j) for i, j in itertools.combinations(embeddings, 2)]
print(dist)
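The list comprehension above returns the distances in combination order without saying which sentence pair each number belongs to. Here is a minimal, self-contained sketch that labels each pair — it uses toy 3-dimensional vectors in place of real 768-dimensional embeddings, and a hand-rolled cosine distance so it runs without scipy:

```python
import itertools

# Toy 3-dimensional vectors standing in for the model's 768-dim embeddings
# (hypothetical values, for illustration only).
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]

def cosine_distance(a, b):
    # 1 - (a . b) / (|a| * |b|), same quantity scipy.spatial.distance.cosine returns
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

# itertools.combinations yields index pairs in order: (0, 1), (0, 2), (1, 2)
for (i, a), (j, b) in itertools.combinations(enumerate(embeddings), 2):
    print(f"sentences[{i}] vs sentences[{j}]: {cosine_distance(a, b):.4f}")
```

Pairing `enumerate` with `combinations` keeps the indices attached to each distance, so you can tell at a glance which two sentences a score compares.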
Using HuggingFace Transformers
Without the sentence-transformers library, here’s how to use the model with HuggingFace Transformers directly:
from transformers import AutoTokenizer, AutoModel
import torch
def mean_pooling(model_output, attention_mask):
    # model_output[0] holds the token-level embeddings
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum the unmasked token embeddings and divide by the number of real tokens
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentences = ["This is an example sentence", "Each sentence is converted"]
tokenizer = AutoTokenizer.from_pretrained('meedan/paraphrase-filipino-mpnet-base-v2')
model = AutoModel.from_pretrained('meedan/paraphrase-filipino-mpnet-base-v2')
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
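To see exactly what the mean pooling step does, here is a self-contained toy example — dummy tensors stand in for the real model output, and the attention mask excludes the padding token from the average:

```python
import torch

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element: token-level embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )

# One "sentence" of 3 tokens with embedding dim 2; the last token is padding
# with a deliberately huge value to show it gets ignored.
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]]])
attention_mask = torch.tensor([[1, 1, 0]])  # 0 masks out the padding token

pooled = mean_pooling((token_embeddings,), attention_mask)
print(pooled)  # averages only the two real tokens: [[2.0, 3.0]]
```

Without the mask, the padding token would drag the average far off; the `clamp` guards against division by zero if a row were fully masked.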
Step 3: Evaluation Results
Once you’ve obtained the embeddings, you can evaluate the model on semantic textual similarity (STS) data. In this case, the original English STS data was translated using the Google Translate API, and the model was evaluated against those translated pairs.
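The standard metric for an STS evaluation is the Spearman rank correlation between the model’s similarity scores and human ratings. Here is a minimal sketch with made-up scores (hypothetical values, not the actual evaluation results), assuming scipy is available:

```python
from scipy.stats import spearmanr

# Hypothetical model similarities and gold human ratings for five sentence pairs.
model_scores = [0.95, 0.10, 0.60, 0.30, 0.80]
gold_scores = [5.0, 0.5, 3.2, 1.0, 4.1]

# Spearman correlation compares rankings, so the two score scales
# don't need to match; 1.0 means the model ranks pairs exactly like humans.
correlation, p_value = spearmanr(model_scores, gold_scores)
print(f"Spearman correlation: {correlation:.3f}")  # -> 1.000 (identical rankings)
```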
Understanding the Process
Imagine your sentences are like flavors in an ice cream sundae. Each flavor has a unique taste (or vector), contributing to the overall sundae. The model helps identify how similar these flavors are, essentially measuring the distances between them in our 768-dimensional space. The closer the flavors (or vectors), the more alike they are!
Troubleshooting Tips
Should you encounter any hiccups along the way, here are a few troubleshooting ideas:
- Ensure your Python environment has the correct version of sentence-transformers installed.
- Double-check your installed libraries; conflicts may arise with different versions of HuggingFace Transformers.
- If you receive any unexpected errors, revisiting installation steps or consulting the documentation can help.
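For the first two tips, you can check the installed versions programmatically. This sketch uses the standard library’s importlib.metadata, so it also behaves sensibly when a package is missing:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Return a mapping of package name -> installed version string, or None if missing."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            versions[pkg] = None
    return versions

for pkg, ver in installed_versions(["sentence-transformers", "transformers", "torch"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```

Comparing these versions against the requirements of sentence-transformers is a quick way to spot the library conflicts mentioned above.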
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

