Are you interested in identifying unassimilated English lexical borrowings in Spanish texts? If so, the anglicisms-spanish-mbert model can help! In this guide, we’ll walk you through the process of using this pretrained model to detect anglicisms in Spanish news articles.
What is the anglicisms-spanish-mbert Model?
The anglicisms-spanish-mbert model detects unassimilated English lexical borrowings (anglicisms) in Spanish text, such as fake news, machine learning, smartwatch, and influencer. It is a version of multilingual BERT fine-tuned on the COALAS corpus.
How to Use the Model
Getting started with the anglicisms-spanish-mbert model takes only a few setup steps:
Step 1: Install Required Libraries
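You will need the transformers library, which you can install with pip. The library also needs a deep learning backend such as PyTorch to run the model, so install one if you do not already have it.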
pip install transformers
Step 2: Load the Model
Here’s how to load the model:
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
# Download (or load from the local cache) the tokenizer and the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("lirondos/anglicisms-spanish-mbert")
model = AutoModelForTokenClassification.from_pretrained("lirondos/anglicisms-spanish-mbert")
# Wrap both in a token-classification ("ner") pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
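By default, the pipeline above reports one prediction per word piece. As a minimal sketch of an alternative, the helper below (build_anglicism_pipeline is a hypothetical name, not part of the model's documentation) passes the pipeline's aggregation_strategy option so that adjacent tokens sharing a label are merged into a single span. Whether "simple" is the best strategy for this model is an assumption worth checking on your own texts.

```python
def build_anglicism_pipeline(model_id="lirondos/anglicisms-spanish-mbert"):
    """Return a token-classification pipeline that merges tokens into spans.

    Hypothetical helper for illustration; only the model id comes from the guide.
    """
    # Imported lazily so that defining this helper does not require the library
    from transformers import pipeline

    return pipeline(
        "ner",
        model=model_id,                 # tokenizer is loaded from the same id
        aggregation_strategy="simple",  # group adjacent tokens with the same label
    )


# Usage (downloads the model files on the first call):
# nlp = build_anglicism_pipeline()
# for span in nlp("Buscamos data scientist para proyecto de machine learning."):
#     print(span["entity_group"], span["word"], float(span["score"]))
```

With an aggregation strategy set, each result has an entity_group key instead of the per-token entity key, along with score, word, start, and end.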
Step 3: Input Your Text
Now you’re ready to analyze a text for anglicisms! Here’s a quick example:
example = "Buscamos data scientist para proyecto de machine learning."
borrowings = nlp(example)
print(borrowings)
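This code prints a list of dictionaries, one for each token the model tagged as part of a borrowing. Each dictionary contains the predicted label, a confidence score, the token text, and the token's character offsets in the input.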
Understanding the Output
The model tags two kinds of entities: ENG for English lexical borrowings and OTHER for borrowings from other languages. The label tells you which kind of foreign term each detected item is, and the confidence score tells you how sure the model is about it.
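To make the labels easier to work with, the following sketch groups detected spans by label and drops low-confidence ones. It assumes the pipeline was created with an aggregation strategy, so each result has entity_group, score, start, and end keys. The function name group_borrowings, the 0.5 threshold, and the sample results (including their scores) are invented for illustration and are not real model output.

```python
from collections import defaultdict


def group_borrowings(text, entities, min_score=0.5):
    """Group detected spans by label, keeping only confident predictions."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            # Slice the original text so the span keeps its exact surface form
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


example = "Buscamos data scientist para proyecto de machine learning."

# Hand-written stand-in for pipeline output; labels and scores are illustrative only
sample_entities = [
    {"entity_group": "ENG", "score": 0.99, "start": 9, "end": 23},
    {"entity_group": "ENG", "score": 0.98, "start": 41, "end": 57},
    {"entity_group": "OTHER", "score": 0.30, "start": 29, "end": 37},  # below threshold
]

result = group_borrowings(example, sample_entities)
print(result)
```

Slicing by character offsets rather than reading the word field avoids any spacing or casing changes introduced by the tokenizer.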
Performance Metrics
The model achieves the following results on the test set from the COALAS corpus:
LABEL   Precision   Recall   F1
ALL     88.09       79.46    83.55
ENG     88.44       82.16    85.19
OTHER   37.5        6.52     11.11
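As a sanity check on how the columns relate, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; the results agree with the reported F1 values to within rounding, since the published precision and recall are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the results above
reported = {
    "ALL": (88.09, 79.46, 83.55),
    "ENG": (88.44, 82.16, 85.19),
    "OTHER": (37.5, 6.52, 11.11),
}

for label, (p, r, f1) in reported.items():
    print(f"{label}: recomputed F1 = {f1_score(p, r):.2f}, reported = {f1}")
```

The low OTHER score reflects its very low recall (6.52): the model misses most borrowings from languages other than English.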
Troubleshooting & Tips
If you encounter any issues while using this model, here are some helpful troubleshooting ideas:
- Library Version: Ensure you’re using the latest version of the transformers library. Outdated versions may lead to compatibility issues.
- Internet Connection: The first time you load the model, its files are downloaded from Hugging Face, so you need a working internet connection.
- Pretrained Models: If you need better performance, consider trying the Flair model, which has shown superior results (F1=85.76, compared with 83.55 for this model).
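For the library version tip, a quick way to see what is installed is to query the package metadata from the standard library. This is a minimal sketch: installed_version is a hypothetical helper name, and checking torch assumes you chose PyTorch as the backend.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "torch" is an assumption: substitute whichever backend you actually use
for name in ("transformers", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If a package shows as not installed or looks old, upgrading it with pip is usually the first thing to try.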
Conclusion
The anglicisms-spanish-mbert model offers a practical way to find anglicisms in Spanish texts. By following the steps above, you can identify unassimilated lexical borrowings and study how they are used in contemporary media.

