If you want to build a Named Entity Recognition (NER) system for Turkish, you are in the right place! This user-friendly guide shows how to build an NER model in Python with BERT and transfer learning. We will walk you through the steps needed to fine-tune a pre-trained Turkish BERT model for NER.
Getting Started with NER
Named Entity Recognition identifies and categorizes key entities in a text. Think of it as a librarian who sorts books by genre: just as the librarian places books into fiction, non-fiction, science fiction, and so on, an NER model assigns words to categories such as person names, organizations, locations, dates, and more.
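In practice, NER is usually framed as labeling each word with a tag. A common scheme is BIO: B- marks the first word of an entity, I- a word inside it, and O a word outside any entity. The minimal sketch below assumes BIO-style tags such as B-PER and B-LOC; the tags are hand-written for illustration (not model output), and bio_to_spans is a hypothetical helper that turns them back into entities:

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged words into (entity_type, text) pairs.

    Hypothetical helper for illustration: an I- tag whose type differs from
    the open entity is treated as the start of a new entity.
    """
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity begins: close the previous one first
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Same-type I- tag: the word continues the open entity
            current_tokens.append(token)
        else:
            # "O" closes any open entity
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-written example tags (not produced by a model)
tokens = ["Mustafa", "Kemal", "Atatürk", "19", "Mayıs", "1919’da", "Samsun’a", "ayak", "bastı", "."]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "O", "B-LOC", "O", "O", "O"]
print(bio_to_spans(tokens, tags))
# [('PER', 'Mustafa Kemal Atatürk'), ('LOC', 'Samsun’a')]
```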
Setting Up Your Environment
To begin, download the pre-processed Turkish WikiANN dataset: the training, development, and test splits plus the label list.
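- Open a terminal, create the data folder, and move into it:
mkdir -p tr-data
cd tr-data
- Download the four files, then return to the parent directory:
for file in train.txt dev.txt test.txt labels.txt; do wget https://schweter.eu/storage/turkish-bert-wikiann/$file; done
cd ..
After the download finishes, train.txt, dev.txt, test.txt, and labels.txt should all be in the tr-data folder.
Fine-Tuning the Model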
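Before launching training, it may be worth sanity-checking the downloaded files. The sketch below assumes the CoNLL-style layout that token-classification scripts commonly read: one token and its label per line, separated by whitespace, with a blank line between sentences. The helpers parse_conll, read_conll, and unknown_labels are hypothetical:

```python
from pathlib import Path


def parse_conll(text):
    """Split CoNLL-style text into a list of (tokens, labels) sentences.

    Assumes one "token label" pair per line (label in the last column)
    and a blank line between sentences.
    """
    sentences, tokens, labels = [], [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "-DOCSTART-":
            # A blank line (or document marker) closes the current sentence
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        tokens.append(parts[0])
        labels.append(parts[-1])
    if tokens:  # the file may not end with a blank line
        sentences.append((tokens, labels))
    return sentences


def read_conll(path):
    """Read and parse a CoNLL-style file such as tr-data/train.txt."""
    return parse_conll(Path(path).read_text(encoding="utf-8"))


def unknown_labels(sentences, allowed_labels):
    """Return labels used in the data that are missing from the allowed list."""
    used = {label for _, labels in sentences for label in labels}
    return used - set(allowed_labels)
```

For example, unknown_labels(read_conll("tr-data/train.txt"), Path("tr-data/labels.txt").read_text(encoding="utf-8").split()) should return an empty set if labels.txt covers every tag used in the training file.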
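After downloading the dataset, you can fine-tune the pre-trained Turkish BERT model (dbmdz/bert-base-turkish-cased) on it. First set the environment variables used by the training command. The trained model and its checkpoints are written to $OUTPUT_DIR-$SEED, which expands to tr-new-model-1 with the values below.
- Open your terminal and set the variables: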
export MAX_LENGTH=128
export BERT_MODEL=dbmdz/bert-base-turkish-cased
export OUTPUT_DIR=tr-new-model
export BATCH_SIZE=32
export NUM_EPOCHS=3
export SAVE_STEPS=625
export SEED=1
- Then launch fine-tuning from the directory that contains tr-data (drop --fp16 if your GPU or setup does not support mixed-precision training):
python3 run_ner_old.py --data_dir ./tr-data --model_type bert \
  --labels ./tr-data/labels.txt --model_name_or_path $BERT_MODEL \
  --output_dir $OUTPUT_DIR-$SEED --max_seq_length $MAX_LENGTH \
  --num_train_epochs $NUM_EPOCHS --per_gpu_train_batch_size $BATCH_SIZE \
  --save_steps $SAVE_STEPS --seed $SEED --do_train --do_eval --do_predict --fp16
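The batch size, epoch count, and save interval interact in a simple way. Assuming no gradient accumulation, each epoch takes ceil(N / (BATCH_SIZE × number of GPUs)) optimizer steps for N training sentences, and a checkpoint is saved every SAVE_STEPS steps. The sketch below makes that arithmetic explicit; training_schedule is a hypothetical helper whose defaults mirror the exported values, and the 20,000-sentence size in the example is an assumption (count the sentences in your own train.txt):

```python
import math


def training_schedule(num_examples, batch_size=32, num_epochs=3, save_steps=625, n_gpu=1):
    """Estimate steps per epoch, total optimizer steps, and saved checkpoints.

    Assumes no gradient accumulation; with a per-GPU batch size the
    effective batch is batch_size * n_gpu.
    """
    effective_batch = batch_size * max(1, n_gpu)
    # The last, partial batch still counts as a step
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    # A checkpoint is written each time the step counter reaches a multiple of save_steps
    checkpoints = total_steps // save_steps
    return steps_per_epoch, total_steps, checkpoints


# Hypothetical dataset size, for illustration only
print(training_schedule(20000))  # (625, 1875, 3)
```

Under that assumption, SAVE_STEPS=625 would save exactly one checkpoint per epoch.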
Testing Your NER Model
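Once fine-tuning is done, it’s time to test a model in a Python script. The example below loads the already fine-tuned savasy/bert-base-turkish-ner-cased model from the Hugging Face Hub; to test your own model instead, pass your output directory (tr-new-model-1) to from_pretrained.
- Load the model and tokenizer, build an NER pipeline, and run it on a sentence: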
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned Turkish NER model and its tokenizer from the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("savasy/bert-base-turkish-ner-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-ner-cased")
# Wrap them in a token-classification ("ner") pipeline and tag a sample sentence
ner = pipeline("ner", model=model, tokenizer=tokenizer)
print(ner("Mustafa Kemal Atatürk 19 Mayıs 1919’da Samsun’a ayak bastı."))
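By default, the ner pipeline returns one prediction per word-piece token, so a name such as Atatürk may come back split into several pieces. Recent transformers versions can merge these for you through the pipeline's aggregation_strategy argument (for example aggregation_strategy="simple"). To show what that merging involves, the sketch below groups raw predictions by their character offsets. group_entities is a hypothetical helper, and the sample predictions are hand-written to mimic the pipeline's per-token format, not real model output:

```python
def group_entities(text, predictions, max_gap=1):
    """Merge per-token NER predictions into (entity_type, surface_text) pairs.

    Each prediction is a dict with "entity", "start" and "end" keys, like the
    per-token output of the "ner" pipeline. Heuristic: an I- tag continues the
    previous entity if the type matches and at most `max_gap` characters
    (e.g. one space) separate them.
    """
    groups = []
    for pred in predictions:
        if pred["entity"] == "O":
            continue  # the pipeline normally drops "O" tokens already
        prefix, _, entity_type = pred["entity"].partition("-")
        last = groups[-1] if groups else None
        if (last is not None and prefix == "I" and last["type"] == entity_type
                and pred["start"] - last["end"] <= max_gap):
            last["end"] = pred["end"]  # extend the current entity
        else:
            groups.append({"type": entity_type, "start": pred["start"], "end": pred["end"]})
    # Slice the original text so word pieces like "##türk" are rejoined correctly
    return [(g["type"], text[g["start"]:g["end"]]) for g in groups]


# Hand-written sample in the pipeline's per-token format (not real model output)
SAMPLE_TEXT = "Mustafa Kemal Atatürk 19 Mayıs 1919’da Samsun’a ayak bastı."
SAMPLE_PREDICTIONS = [
    {"entity": "B-PER", "word": "Mustafa", "start": 0, "end": 7},
    {"entity": "I-PER", "word": "Kemal", "start": 8, "end": 13},
    {"entity": "I-PER", "word": "Ata", "start": 14, "end": 17},
    {"entity": "I-PER", "word": "##türk", "start": 17, "end": 21},
    {"entity": "B-LOC", "word": "Samsun", "start": 39, "end": 45},
]
print(group_entities(SAMPLE_TEXT, SAMPLE_PREDICTIONS))
# [('PER', 'Mustafa Kemal Atatürk'), ('LOC', 'Samsun')]
```

To apply it to real output, pass the sentence and the list returned by ner(sentence); this relies on the pipeline filling in start and end offsets, which it typically does when a fast tokenizer is loaded.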
Evaluating the Model Performance
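Because the training command includes --do_eval and --do_predict, the script also evaluates the model on the development and test sets, reporting precision, recall, F1 score, and loss. For example, your evaluation results may look like this: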
- Precision = 0.9164
- Recall = 0.9342
- F1 score = 0.9252
- Loss = 0.1134
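The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the three numbers above are consistent: 2 × 0.9164 × 0.9342 / (0.9164 + 0.9342) ≈ 0.9252. NER scores are commonly computed at the entity level, where a predicted entity counts only if its span and type exactly match a gold entity. The sketch below illustrates both ideas with hypothetical helpers f1_from and entity_prf; the exact-match rule is an assumption and may differ in detail from the training script's own metric code:

```python
def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_prf(gold, predicted):
    """Entity-level precision, recall and F1 with exact matching.

    gold and predicted are sets of (sentence_id, start, end, entity_type)
    tuples; a prediction counts only if all four fields match a gold entity.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


# Reproduce the reported F1 from the reported precision and recall
print(round(f1_from(0.9164, 0.9342), 4))  # 0.9252

# Toy example: one correct entity, one with the wrong type (hand-written data)
gold = {(0, 0, 3, "PER"), (0, 6, 7, "LOC")}
pred = {(0, 0, 3, "PER"), (0, 6, 7, "ORG")}
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```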
Troubleshooting
If you encounter any issues while implementing the NER model, consider the following troubleshooting tips:
- Ensure you have all dependencies installed; the steps in this guide rely on transformers and PyTorch. If a package is missing, install it with pip install package_name.
- Verify that your dataset paths are correct and that the files exist in the specified location.
- If you run out of memory, try reducing BATCH_SIZE (or MAX_LENGTH).
- Check the logs for any error messages that could help pinpoint the issue.
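The first two checks can be automated. The sketch below uses importlib.util.find_spec to test whether packages are importable and pathlib to confirm that the data files exist; check_setup is a hypothetical helper, and its default package list (transformers, torch) and file names are assumptions based on the steps in this guide:

```python
import importlib.util
from pathlib import Path


def check_setup(packages=("transformers", "torch"), data_dir="tr-data",
                files=("train.txt", "dev.txt", "test.txt", "labels.txt")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name} (try: pip install {name})")
    for filename in files:
        path = Path(data_dir) / filename
        if not path.is_file():
            problems.append(f"missing data file: {path}")
    return problems


if __name__ == "__main__":
    # Print each problem so it can be fixed before training
    for problem in check_setup():
        print(problem)
```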
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
With these steps, you should be able to build and run a basic NER application for Turkish. Remember, testing and evaluating your model are crucial for improving its performance.

