How to Use English NER in Flair with a Large Model

May 8, 2021 | Educational

Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP). In this guide, we will explore how to use a large English NER model with the Flair library. The model has a reported F1 score of 94.36 and identifies four categories of entities: person names (PER), location names (LOC), organization names (ORG), and miscellaneous names (MISC).

Understanding Flair’s Large NER Model

The Flair large NER model utilizes document-level XLM-R embeddings and the FLERT method to accurately predict entities in text. Imagine the model as a savvy librarian who can swiftly pinpoint not just the names of people and places in a book but also categorize them accordingly.
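To make "document-level" concrete: rather than encoding each sentence in isolation, the FLERT approach lets the transformer see the sentence together with tokens from its neighboring sentences, while only the tokens of the sentence itself are tagged. The sketch below is a simplified, pure-Python illustration of that idea, not Flair's implementation. The function name `with_context` and the `window` size are assumptions chosen for the example, and it works on whole tokens rather than subword pieces.

```python
def with_context(sentences, index, window=64):
    """Split a document into (left context, target sentence, right context).

    `sentences` is a list of token lists. Up to `window` tokens are taken
    from the sentences before and after the one at `index`.
    """
    before = [tok for sent in sentences[:index] for tok in sent]
    after = [tok for sent in sentences[index + 1:] for tok in sent]
    left = before[-window:] if window > 0 else []
    right = after[:window]
    return left, sentences[index], right


document = [
    ["George", "Washington", "was", "a", "general", "."],
    ["He", "went", "to", "Washington", "."],
    ["The", "trip", "took", "days", "."],
]

left, target, right = with_context(document, index=1, window=4)
# A context-aware model would read left + target + right,
# but only predict tags for the tokens in `target`.
print(left, target, right)
```

The extra context is what can help a model decide that "He" refers to a person mentioned earlier, or that a repeated name is a place rather than a person.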

Available Tags

PER: Person name
LOC: Location name
ORG: Organization name
MISC: Other names

Getting Started with Flair NER

First, you’ll need to install the Flair library. You can do this easily with pip:

pip install flair

Step-by-Step Guide to Use the NER Model

Follow these steps to use the large English NER model:

1. Import the necessary classes from Flair.
2. Load the NER tagger model.
3. Create a sentence for analysis.
4. Predict NER tags on the sentence.
5. Print the sentence and the predicted NER spans.

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-large")

# make example sentence
sentence = Sentence("George Washington went to Washington")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')

# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

Sample Output

When you run the above code, you will get an output similar to:

Span [1,2]: "George Washington"   [− Labels: PER (1.0)]
Span [5]: "Washington"   [− Labels: LOC (1.0)]
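For downstream processing it is usually easier to work with plain Python data than with printed spans. The following is a minimal sketch, assuming each span exposes `.text` and a `.get_labels('ner')` method whose labels carry `.value` and `.score`; attribute names have shifted between Flair releases, so verify this against your installed version. `spans_to_records` and `group_by_tag` are hypothetical helper names, and the stand-in classes exist only so the sketch runs without downloading the model.

```python
from collections import defaultdict, namedtuple


def spans_to_records(spans, label_type="ner"):
    """Turn span objects into plain dicts with text, tag and confidence."""
    records = []
    for span in spans:
        label = span.get_labels(label_type)[0]  # first label attached to the span
        records.append({"text": span.text, "tag": label.value, "score": label.score})
    return records


def group_by_tag(records):
    """Group entity texts by their predicted tag."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["tag"]].append(record["text"])
    return dict(grouped)


# Stand-ins that mimic the span interface assumed above (not Flair classes).
FakeLabel = namedtuple("FakeLabel", ["value", "score"])


class FakeSpan:
    def __init__(self, text, value, score):
        self.text = text
        self._labels = [FakeLabel(value, score)]

    def get_labels(self, label_type):
        return self._labels


spans = [FakeSpan("George Washington", "PER", 1.0), FakeSpan("Washington", "LOC", 1.0)]
records = spans_to_records(spans)
print(group_by_tag(records))
```

With a real tagger, `spans_to_records(sentence.get_spans('ner'))` would take the place of the stand-in list.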

Training Your Own NER Model

If you’re interested in training your own NER model, here’s a quick overview of the steps involved:

1. Load the CoNLL-03 corpus.
2. Define the tag you wish to predict.
3. Create a tag dictionary from the corpus.
4. Initialize transformer embeddings with fine-tuning options.
5. Set up a sequence tagger model.
6. Use an optimizer and train the model on your data.

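import torch

# Note: this script follows the Flair API as of this article's date (2021).
# Newer Flair releases renamed or moved some of these calls (for example,
# make_tag_dictionary was superseded by make_label_dictionary), so check
# the documentation for your installed version.

# get the corpus (the CoNLL-03 data files must already be available locally)
from flair.datasets import CONLL_03
corpus = CONLL_03()

# tag type to predict
tag_type = 'ner'

# build the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# initialize fine-tunable transformer embeddings;
# use_context=True adds the document-level context that FLERT relies on
from flair.embeddings import TransformerWordEmbeddings
embeddings = TransformerWordEmbeddings(model='xlm-roberta-large', layers="-1", fine_tune=True, use_context=True)

# set up the tagger as a plain linear layer on top of the transformer (no CRF, no RNN)
from flair.models import SequenceTagger
tagger = SequenceTagger(hidden_size=256, embeddings=embeddings, tag_dictionary=tag_dictionary, tag_type=tag_type, use_crf=False, use_rnn=False, reproject_embeddings=False)

# train the model with the AdamW optimizer
from flair.trainers import ModelTrainer
trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)

# fine-tuning a large transformer needs a small learning rate and batch size;
# Flair's default learning rate is far too high for this setup
trainer.train('resources/taggers/ner-english-large', learning_rate=5.0e-6, mini_batch_size=4, max_epochs=20)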
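Once training finishes, Flair writes the model into the output directory; `final-model.pt` is the file name it uses for the model after the last epoch. Here is a small sketch for loading it back, where `trained_model_path` and `load_trained_tagger` are hypothetical helper names and the directory is the one passed to `trainer.train` above.

```python
from pathlib import Path


def trained_model_path(base_path, filename="final-model.pt"):
    """Return the path where the trained model file is expected."""
    return Path(base_path) / filename


def load_trained_tagger(base_path):
    """Load the tagger saved by trainer.train(base_path)."""
    model_file = trained_model_path(base_path)
    if not model_file.is_file():
        raise FileNotFoundError(f"No trained model at {model_file}; run training first.")
    # Imported lazily so the path check works even without Flair installed.
    from flair.models import SequenceTagger
    return SequenceTagger.load(str(model_file))


print(trained_model_path("resources/taggers/ner-english-large"))
```

The loaded tagger can then be used exactly like the pretrained one: create a `Sentence`, call `predict`, and read the spans.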

Troubleshooting

If you encounter any issues while working with Flair, here are a few troubleshooting tips to consider:

Ensure that you have the latest version of Flair installed.
Check if all required dependencies are installed correctly.
Look at the Flair issue tracker for similar problems.
For further help, refer to the official Flair documentation.
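When checking versions and dependencies, it helps to see exactly what is installed in the active environment. A minimal sketch using only the Python standard library (3.8+); `installed_versions` is a hypothetical helper name, and the package list is an assumption about which dependencies are most relevant here.

```python
from importlib import metadata


def installed_versions(packages=("flair", "torch", "transformers")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output when searching or filing an issue makes it much easier to match your problem to a known one.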


