Easy-to-Use NER Application for Turkish Language

Feb 2, 2024 | Educational


If you want to build a Named Entity Recognition (NER) system for Turkish, you are in the right place! Here’s a user-friendly guide to building an NER model in Python with BERT and transfer learning. In this blog post, we walk through the steps needed to fine-tune an NER model tailored to the Turkish language.

Getting Started with NER

Named Entity Recognition identifies and categorizes key entities in a text. Think of it as a librarian who recognizes and categorizes the books by genre. Just as the librarian sorts books into fiction, non-fiction, science fiction, etc., the NER model sorts words into categories such as names, dates, locations, and more.
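To make the "sorting" idea concrete, here is a minimal sketch of how per-token labels turn into entities. It assumes the common BIO tagging scheme (B- opens an entity, I- continues it, O means "no entity") with PER and LOC types; the function name bio_to_entities and the hand-written tags are illustrative, not model output.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far, if there is one.
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        elif tag.startswith(("B-", "I-")):
            # A new entity starts (a stray I- tag is treated like B-).
            flush()
            current_tokens, current_type = [token], tag[2:]
        else:
            # "O" tag: close any open entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


# Hand-labelled example (an assumption for illustration only).
tokens = ["Mustafa", "Kemal", "Atatürk", "Samsun'a", "ayak", "bastı", "."]
tags = ["B-PER", "I-PER", "I-PER", "B-LOC", "O", "O", "O"]
print(bio_to_entities(tokens, tags))
```

The three person tokens collapse into one PER entity, which is the kind of grouping an NER system ultimately has to produce.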

Setting Up Your Environment

To begin, you need to ensure you have the necessary datasets. Follow these steps to download your pre-processed datasets:

Open a terminal, create the data folder if it does not exist yet, and move into it:

mkdir -p tr-data
cd tr-data

Download the files:

for file in train.txt dev.txt test.txt labels.txt; do wget https://schweter.eu/storage/turkish-bert-wikiann/$file; done

Now, you should have your datasets ready in the tr-data folder.
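Before training, it can help to confirm that the files downloaded correctly. The sketch below assumes the files use the usual CoNLL-style layout (one token and its label per line, separated by whitespace, with a blank line between sentences); that layout and the helper name read_conll are assumptions, so adjust the parsing if your files look different.

```python
from pathlib import Path
from typing import List, Tuple


def read_conll(path: str) -> List[List[Tuple[str, str]]]:
    """Read a token-per-line file into a list of sentences of (token, label)."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if not parts:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if len(parts) < 2:
            # Skip malformed lines that carry no label.
            continue
        # First field is the token, last field is the label.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    for name in ("train.txt", "dev.txt", "test.txt"):
        path = Path("tr-data") / name
        if path.exists():
            print(name, len(read_conll(str(path))), "sentences")
        else:
            print(name, "is missing")
```

Run it from the directory that contains the tr-data folder; a missing file or a sentence count of zero usually points to a failed download.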

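Fine-Tuning the Model

After downloading the dataset, you can start training. This step fine-tunes the already pre-trained dbmdz/bert-base-turkish-cased checkpoint on the NER data; no pre-training from scratch is involved. First, set the required environment variables.

Open your terminal and set the variables: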

export MAX_LENGTH=128
export BERT_MODEL=dbmdz/bert-base-turkish-cased
export OUTPUT_DIR=tr-new-model
export BATCH_SIZE=32
export NUM_EPOCHS=3
export SAVE_STEPS=625
export SEED=1
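To see how these values interact, the following sketch estimates how many optimizer steps the run takes and how many checkpoints SAVE_STEPS produces. The training-set size of 20,000 sentences is an assumption (check it against your own train.txt), and the calculation assumes a single GPU with no gradient accumulation.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int, num_gpus: int = 1) -> int:
    """Number of optimizer steps needed to see every example once."""
    return math.ceil(num_examples / (batch_size * num_gpus))


NUM_TRAIN_SENTENCES = 20_000  # assumption: replace with your real sentence count
BATCH_SIZE = 32               # matches the BATCH_SIZE variable
NUM_EPOCHS = 3                # matches the NUM_EPOCHS variable
SAVE_STEPS = 625              # matches the SAVE_STEPS variable

per_epoch = steps_per_epoch(NUM_TRAIN_SENTENCES, BATCH_SIZE)
total_steps = per_epoch * NUM_EPOCHS
num_checkpoints = total_steps // SAVE_STEPS

print(f"steps per epoch: {per_epoch}")
print(f"total steps:     {total_steps}")
print(f"checkpoints:     {num_checkpoints}")
```

Under these assumptions one epoch is exactly 625 steps, so SAVE_STEPS=625 would write one checkpoint per epoch. If you lower BATCH_SIZE to save memory, the number of steps per epoch grows and you may want to raise SAVE_STEPS accordingly.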

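Then return to the directory that contains both the tr-data folder and the run_ner_old.py script, and execute the fine-tuning command:

python3 run_ner_old.py --data_dir ./tr-data --model_type bert --labels ./tr-data/labels.txt --model_name_or_path $BERT_MODEL --output_dir $OUTPUT_DIR-$SEED --max_seq_length $MAX_LENGTH --num_train_epochs $NUM_EPOCHS --per_gpu_train_batch_size $BATCH_SIZE --save_steps $SAVE_STEPS --seed $SEED --do_train --do_eval --do_predict --fp16

The --fp16 flag enables mixed-precision training and needs a GPU setup that supports it; drop the flag if it causes errors on your machine.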

Testing Your NER Model

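Once fine-tuning is done, it’s time to test your model. The steps below load the published savasy/bert-base-turkish-ner-cased checkpoint from the Hugging Face Hub; to test your own model instead, pass the path of your output directory to from_pretrained. Here’s how you can use it in a Python script: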

Import the necessary modules:

from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

Load your model and tokenizer:

model = AutoModelForTokenClassification.from_pretrained("savasy/bert-base-turkish-ner-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-ner-cased")

Create a pipeline to process NER:

ner = pipeline("ner", model=model, tokenizer=tokenizer)

Example usage:

ner("Mustafa Kemal Atatürk 19 Mayıs 1919’da Samsun’a ayak bastı.")
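By default the pipeline returns one dictionary per token (or sub-word piece), so a multi-word name arrives in several fragments. The sketch below shows one way to merge them into whole entities using the character offsets. It assumes each dictionary has the keys "entity", "word", "start" and "end"; the helper merge_tokens and the sample list are hand-written for illustration and are not real model output.

```python
from typing import Dict, List


def merge_tokens(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level predictions into entity spans over the original text."""
    merged: List[Dict] = []
    for tok in tokens:
        if tok["entity"] == "O":
            continue  # not part of any entity
        label = tok["entity"].split("-", 1)[-1]  # "B-PER" -> "PER"
        continues = bool(
            merged
            and merged[-1]["label"] == label
            and (tok["word"].startswith("##") or tok["entity"].startswith("I-"))
        )
        if continues:
            # Extend the open entity up to the end of this token.
            merged[-1]["end"] = tok["end"]
        else:
            merged.append({"label": label, "start": tok["start"], "end": tok["end"]})
    for ent in merged:
        # Recover the surface form from the original string.
        ent["text"] = text[ent["start"]:ent["end"]]
    return merged


text = "Mustafa Kemal Atatürk Samsun'a ayak bastı."
# Hypothetical token-level predictions, written by hand for this example.
sample = [
    {"entity": "B-PER", "word": "Mustafa", "start": 0, "end": 7},
    {"entity": "I-PER", "word": "Kemal", "start": 8, "end": 13},
    {"entity": "I-PER", "word": "Ata", "start": 14, "end": 17},
    {"entity": "I-PER", "word": "##türk", "start": 17, "end": 21},
    {"entity": "B-LOC", "word": "Samsun", "start": 22, "end": 28},
]
for ent in merge_tokens(text, sample):
    print(ent["label"], ent["text"])
```

Depending on your installed transformers version, the pipeline may also do this grouping for you through its aggregation_strategy argument (for example aggregation_strategy="simple"), which is usually the simpler option when it is available.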

Evaluating the Model Performance

Because the training command includes --do_eval and --do_predict, the script also reports metrics on the held-out data. For example, your evaluation results may look like this:

Precision = 0.9164
Recall = 0.9342
F1 score = 0.9252
Loss = 0.1134
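The F1 score is the harmonic mean of precision and recall, so you can sanity-check a reported value yourself. A minimal sketch using the example numbers above (the function name f1_score is just for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.9164, 0.9342)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9252"
```

If the value you compute differs noticeably from the one in your logs, the precision and recall were probably taken from a different evaluation run or data split.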

Troubleshooting

If you encounter any issues while implementing the NER model, consider the following troubleshooting tips:

Ensure you have all dependencies installed. If a package is missing, use pip install package_name to install it.
Verify that your dataset paths are correct and that the files exist in the specified location.
If you run into out-of-memory errors, try reducing BATCH_SIZE (and, if that is not enough, MAX_LENGTH).
Check the logs for any error messages that could help pinpoint the issue.
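The dependency check from the first tip can be automated with a few lines. The package list below (transformers, torch, seqeval) is an assumption about what the training script needs; edit it to match the imports in your own copy of the script.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed requirements; adjust to your environment.
REQUIRED = ["transformers", "torch", "seqeval"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```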


Conclusion

With these steps, you should be able to deploy a basic NER application for the Turkish language with ease. Remember, testing and evaluating your model are crucial for improving its performance.

