In the world of natural language processing, understanding the emotional tone behind words is as crucial as understanding the words themselves. Enter HeBERT—a Hebrew pre-trained language model designed for polarity analysis and emotion recognition. This blog will guide you through the intricacies of using HeBERT, complete with troubleshooting tips.
What is HeBERT?
HeBERT is a specialized language model based on Google’s BERT architecture, specifically tailored for the Hebrew language. Trained on vast datasets, including a Hebrew version of OSCAR and Wikipedia, it offers robust capabilities for analyzing sentiments and emotions in texts, making it a valuable tool for anyone looking to perform fine-tuned analysis on Hebrew text.
How to Get Started with HeBERT
Using HeBERT is straightforward. Below are two primary use cases:
- For masked language modeling (can be fine-tuned to any downstream task).
- For sentiment classification focusing solely on polarity.
1. Using HeBERT for Masked Language Modeling
Suppose you’re a chef and want to prepare a new dish but have forgotten a key ingredient. Just as you’d fill in the gaps with educated guesses, masked language modeling functions similarly. It predicts missing words (ingredients) in a sentence (recipe).
Here’s how you can implement this:
```python
from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT")
model = AutoModel.from_pretrained("avichr/heBERT")

fill_mask = pipeline(
    "fill-mask",
    model="avichr/heBERT",
    tokenizer="avichr/heBERT"
)

# "The corona took the [MASK] and we have nothing left."
fill_mask("הקורונה לקחה את [MASK] ולנו לא נשאר דבר.")
```
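The fill-mask pipeline returns a list of candidate completions, each a dict containing (among other fields) a `score` and the predicted `token_str`. A minimal helper for formatting the top predictions might look like this; the sample data below is illustrative, not actual model output:

```python
# Formats fill-mask pipeline output: a list of dicts, each carrying a
# "score" (probability) and "token_str" (the predicted token).
def format_predictions(predictions, top_k=3):
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f'{p["token_str"]}: {p["score"]:.3f}' for p in ranked[:top_k]]

# Illustrative data shaped like what the pipeline returns (not real output):
sample = [
    {"token_str": "החיים", "score": 0.60},
    {"token_str": "הכל", "score": 0.25},
    {"token_str": "הזמן", "score": 0.10},
]
print(format_predictions(sample, top_k=2))
```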
2. Using HeBERT for Sentiment Classification
Imagine a customer entering a café and expressing feelings about their drink on social media. Depending on their words, you can categorize the sentiment as positive, negative, or neutral, akin to how HeBERT learns to classify sentiment in Hebrew texts.
Here’s how to implement sentiment analysis:
```python
from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")

sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores=True
)

# Example texts:
# "I'm debating what to eat for lunch" (neutral)
print(sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים'))
# "Coffee is tasty" (positive)
print(sentiment_analysis('קפה זה טעים'))
# "I don't like the world" (negative)
print(sentiment_analysis('אני לא אוהב את העולם'))
```
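With `return_all_scores=True`, the pipeline returns, for each input text, a list of `{"label", "score"}` dicts covering every class. A small helper to extract the winning label could look like this; the label names and scores below are placeholders, not the model's exact output:

```python
# Picks the highest-scoring label from a list of {"label", "score"} dicts,
# the per-input shape the pipeline returns when return_all_scores=True.
def top_label(scores):
    best = max(scores, key=lambda s: s["score"])
    return best["label"], best["score"]

# Placeholder scores shaped like one input's result (labels illustrative):
sample_scores = [
    {"label": "neutral", "score": 0.02},
    {"label": "positive", "score": 0.95},
    {"label": "negative", "score": 0.03},
]
label, score = top_label(sample_scores)
print(label, score)  # positive 0.95
```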
Performance Indicators
HeBERT has shown impressive performance metrics:
- Positive sentiment precision: 0.96
- Negative sentiment recall: 0.99
- Overall accuracy: 0.97
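To make these metrics concrete, here is how precision, recall, and accuracy are derived from raw prediction counts. The counts below are made up to reproduce the headline numbers; they are not HeBERT's actual confusion matrix:

```python
# Toy confusion counts for a single class (illustrative only):
tp, fp, fn, tn = 96, 4, 1, 99  # true pos, false pos, false neg, true neg

precision = tp / (tp + fp)                  # of predicted positives, how many were right
recall = tp / (tp + fn)                     # of actual positives, how many were found
accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of all predictions that were correct

print(round(precision, 2), round(recall, 2), round(accuracy, 2))
```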
Where to Find More Information
The model is also available on AWS; for more details, check out the AWS git repository.
Troubleshooting Tips
If you encounter any issues, here are a few troubleshooting ideas:
- Ensure that you have a compatible version of the transformers library installed.
- Check your internet connection; a stable connection is essential for downloading the models.
- If you receive an unexpected output, verify your input sentences for any typos.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Stay tuned for more updates, especially regarding emotion detection, which will be released in future versions of HeBERT.

