How to Use spaCy Fishing for Named Entity Recognition

Jun 12, 2024 | Data Science

In natural language processing (NLP), named entity recognition (NER) is like a detective’s trusty magnifying glass: it lets you identify the people, places, and other entities mentioned in a text. This blog post will guide you through spaCy fishing, a wrapper that integrates the entity-fishing tool with spaCy so that recognized entities can also be linked to Wikidata. You’ll learn how to install, configure, and use spaCy fishing to extract this information.

Installation

To get started with spaCy fishing, you first need to install it. You have two options: normal installation and development installation.

Normal Installation

bash
pip install spacyfishing

Development Installation

bash
git clone https://github.com/Lucaterres/spacyfishing.git
cd spacyfishing
virtualenv --python=/usr/bin/python3.8 venv
source venv/bin/activate
pip install -r requirements_dev.txt

Usage (examples)

Let’s dive into how you can use spaCy fishing with some practical examples. Think of this as fishing for meaningful entities in a sea of text!

Simple Example

Python
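import spacy
text_en = "Victor Hugo and Honoré de Balzac are French writers who lived in Paris."
# Load a spaCy pipeline, then add the component by its registered string name
nlp_model_en = spacy.load("en_core_web_sm")
nlp_model_en.add_pipe("entityfishing")
doc_en = nlp_model_en(text_en)
# Print each entity with its Wikidata QID, Wikidata URL, and confidence score
for ent in doc_en.ents:
    print((ent.text, ent.label_, ent._.kb_qid, ent._.url_wikidata, ent._.nerd_score))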

In this example, we ran a sentence mentioning two French writers through the pipeline. For each entity that spaCy recognizes, the component adds a Wikidata identifier (kb_qid), the matching Wikidata URL (url_wikidata), and a confidence score (nerd_score).

Batching Example

Imagine you are an ocean fisherman with a net; now let’s catch multiple texts at once!

Python
import spacy
texts_en = [
    "Victor Hugo and Honoré de Balzac are French writers who lived in Paris.",
    "Momofuku Ando is a Taiwanese Japanese Business Magnate that invented instant ramen."
]
nlp_model_en = spacy.load("en_core_web_sm")
nlp_model_en.add_pipe("entityfishing")
# nlp.pipe processes the texts in batches and yields one Doc per text
docs_en = nlp_model_en.pipe(texts_en, batch_size=128)
for doc_en in docs_en:
    for ent in doc_en.ents:
        print((ent.text, ent.label_, ent._.kb_qid, ent._.url_wikidata, ent._.nerd_score))
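To make the role of batch_size more concrete, here is a minimal, stdlib-only sketch of how a list of texts can be split into fixed-size batches. It only illustrates the idea and is not spaCy's internal implementation; the helper name batched is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` texts (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(texts)
    while True:
        # Take the next `batch_size` items; an empty list means we are done
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


texts_en = [
    "Victor Hugo and Honoré de Balzac are French writers who lived in Paris.",
    "Momofuku Ando is a Taiwanese Japanese Business Magnate that invented instant ramen.",
]

# With batch_size=128, both texts fit into a single batch
batches = list(batched(texts_en, batch_size=128))
print(len(batches), [len(batch) for batch in batches])
```

With only two texts, a batch size of 128 makes no practical difference; the setting starts to matter once you process many documents.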

Extra Information from Wikidata

Want more than just titles and names? Let’s fetch descriptions and additional identifiers!

Python
import spacy
text_en = "Victor Hugo and Honoré de Balzac are French writers who lived in Paris."
nlp_model_en = spacy.load("en_core_web_sm")
# extra_info=True asks the component to fetch additional data from Wikidata
nlp_model_en.add_pipe("entityfishing", config={"extra_info": True})
doc_en = nlp_model_en(text_en)
# Print each entity with its description and normalized term
for ent in doc_en.ents:
    print((ent.text, ent._.description, ent._.normal_term))

Configuration Parameters

  • api_ef_base: URL of the entity-fishing API endpoint.
  • language: Language of the knowledge base (KB) resources.
  • extra_info: Whether to fetch more information from Wikidata.
  • filter_statements: Which pieces of extra information to fetch.
  • verbose: Whether to display logging messages.
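Since the config is a plain dictionary, a typo in a key is easy to miss. The sketch below shows one way to build the dictionary while checking keys against the parameter names listed above. The helper make_entityfishing_config is hypothetical and not part of spaCy fishing; it only knows the five names from this post.

```python
from typing import Any, Dict

# Parameter names taken from the list in this post
KNOWN_PARAMETERS = {
    "api_ef_base",
    "language",
    "extra_info",
    "filter_statements",
    "verbose",
}


def make_entityfishing_config(**overrides: Any) -> Dict[str, Any]:
    """Return a config dict, rejecting keys that are not in the list above."""
    unknown = sorted(set(overrides) - KNOWN_PARAMETERS)
    if unknown:
        raise KeyError(f"Unknown entityfishing parameter(s): {', '.join(unknown)}")
    return dict(overrides)


config = make_entityfishing_config(language="fr", extra_info=True)
print(config)
# The result is meant to be passed on like this:
# nlp.add_pipe("entityfishing", config=config)
```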

Attributes

  • Doc extensions: doc._.annotations and doc._.metadata provide raw API response data.
  • Span extensions: Include various entity attributes such as span._.kb_qid, span._.url_wikidata, etc.
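As a rough illustration of how the span extensions can be read, the sketch below collects a few of them into plain dictionaries. The function entity_rows is a hypothetical helper, and the demo uses a stand-in object with placeholder values instead of a real spaCy span, so it runs without spaCy or network access.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def entity_rows(ents: Iterable[Any]) -> List[Dict[str, Any]]:
    """Collect text, label, and two span extensions into dicts (hypothetical helper)."""
    rows = []
    for ent in ents:
        rows.append(
            {
                "text": ent.text,
                "label": ent.label_,
                # getattr with a default tolerates a missing extension
                "kb_qid": getattr(ent._, "kb_qid", None),
                "url_wikidata": getattr(ent._, "url_wikidata", None),
            }
        )
    return rows


# Stand-in that mimics the shape of a spaCy span; all values are placeholders
fake_ent = SimpleNamespace(
    text="Paris",
    label_="GPE",
    _=SimpleNamespace(
        kb_qid="Q-PLACEHOLDER",
        url_wikidata="https://example.org/Q-PLACEHOLDER",
    ),
)
print(entity_rows([fake_ent]))
```

With a real pipeline, the same function would be called as entity_rows(doc_en.ents).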

Recommendations

If you’re using the public demo server for entity-fishing, keep in mind that it may limit the number of queries. If you anticipate heavy usage, consider deploying a local instance of the entity-fishing service and pointing api_ef_base at it. For detailed setup instructions, see the entity-fishing documentation.
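One way to switch between the demo server and your own instance without editing code is to read the endpoint from an environment variable and pass it as api_ef_base. This is only a sketch: the variable name ENTITY_FISHING_URL, the helper function, and the localhost URL are all assumptions, so adjust them to your own deployment.

```python
import os
from typing import Dict


def entityfishing_endpoint_config(env_var: str = "ENTITY_FISHING_URL") -> Dict[str, str]:
    """Return a partial config that sets api_ef_base only if the variable is set."""
    url = os.environ.get(env_var)
    if not url:
        # Nothing configured: let the component use its default endpoint
        return {}
    return {"api_ef_base": url}


# Placeholder URL for a self-hosted instance (an assumption, not a documented default)
os.environ["ENTITY_FISHING_URL"] = "http://localhost:8090/service"
print(entityfishing_endpoint_config())
# Intended use:
# nlp.add_pipe("entityfishing", config=entityfishing_endpoint_config())
```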

Visualise Results

Want to see your findings visually? The manual option of displaCy can help!

Python
import spacy
text_fr = "La bataille d'El-Alamein en Égypte oppose la 8e armée britannique."
nlp_model_fr = spacy.load("fr_core_news_sm")
# Add the component by its string name and query the French knowledge base
nlp_model_fr.add_pipe("entityfishing", config={"language": "fr"})
doc_fr = nlp_model_fr(text_fr)
options = {
    "ents": ["MISC", "LOC", "PER"],
    "colors": {"LOC": "#82e0aa", "PER": "#85c1e9", "MISC": "#f0b27a"}
}
params = {
    "text": doc_fr.text,
    "ents": [
        {
            "start": ent.start_char,
            "end": ent.end_char,
            "label": ent.label_,
            "kb_id": ent._.kb_qid,
            "kb_url": ent._.url_wikidata
        } for ent in doc_fr.ents
    ],
    "title": None
}
spacy.displacy.serve(params, style="ent", manual=True, options=options)

About

This component is experimental. The underlying entity-fishing tool was created by Patrice Lopez (SCIENCE-MINER), with contributions from Inria Paris. The logo was designed by Alix Chagué.

Troubleshooting

If you encounter any issues while setting up or using spaCy fishing, consider the following troubleshooting ideas:

  • Ensure that you have the correct version of Python installed (3.7 or higher).
  • Check installation logs for any errors during the installation process.
  • Reference the official documentation for updates or changes to the API.
  • If running into performance issues, consider optimizing your batch sizes.
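The first check in the list can be automated. Here is a small sketch that compares the running interpreter against the minimum version mentioned above; the function name python_is_supported is hypothetical.

```python
import sys
from typing import Sequence

MINIMUM_PYTHON = (3, 7)  # minimum version stated in the troubleshooting list


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the major/minor version meets the stated minimum."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if python_is_supported():
    print("Python version looks fine:", sys.version.split()[0])
else:
    print("Python 3.7 or higher is required, found:", sys.version.split()[0])
```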
