How to Evaluate Natural Language Generation (NLG) with NLG-Eval

Sep 20, 2020 | Data Science

NLG-Eval is a tool designed to help developers and researchers evaluate the performance of Natural Language Generation systems. It computes a range of unsupervised, automated metrics that compare generated text against one or more reference texts. This guide walks you through setting up NLG-Eval, using it from the command line and from Python, and troubleshooting issues you may encounter.

Setting Up NLG-Eval

To get started, you need to prepare your environment for NLG-Eval. Follow these steps:

  • Install Java: Ensure that Java version 1.8.0 or higher is installed on your machine.
  • Install the Python dependencies: Open your terminal and run the following command:
    bash
pip install git+https://github.com/Maluuba/nlg-eval.git@master
    
  • macOS multiprocessing (if necessary): If you’re using macOS High Sierra or higher, allow multiprocessing by executing:
    bash
    export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
    
  • Initial Setup: Set up the necessary data by running:
    bash
    nlg-eval --setup
    

Customizing Your Setup

If you prefer a custom path for data downloads instead of the default (~/.cache/nlgeval), pass it to the setup command:

bash
nlg-eval --setup $data_path
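
If you do use a custom location, the library also needs to be told where the data lives at run time; pointing the NLGEVAL_DATA environment variable at the same directory (see Important Notes below) is one way to do this. A minimal sketch, assuming $data_path is the directory you chose:

bash
# Hypothetical custom location; adjust to your environment
export NLGEVAL_DATA=$data_path
nlg-eval --setup $data_path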

Validating Your Setup

To ensure your setup has been successful, you can check if the required data files are downloaded. Run:

bash
ls -l ~/.cache/nlgeval

You should see the downloaded data and model files with non-trivial sizes, indicating that the setup completed. If you have a clone of the nlg-eval repository, you can also run its test suite to validate the installation:

bash
pip install pytest
pytest

Using NLG-Eval

After setting up, you can compute the evaluation metrics either through the command line or the Python API.

Command Line Usage

To evaluate using the command line, use the following command format:

bash
nlg-eval --hypothesis=examples/hyp.txt --references=examples/ref1.txt --references=examples/ref2.txt
This scores the generated sentences in the hypothesis file against the corresponding sentences in the reference files.
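
Each file holds one sentence per line, and line i of the hypothesis file is evaluated against line i of every reference file. As a minimal sketch with made-up file names and sentences (not the contents of the shipped examples/ files):

bash
# Hypothetical hypothesis and reference files, aligned line by line
printf 'the cat sat on the mat\na man rides a bike\n' > my_hyp.txt
printf 'a cat is sitting on a mat\na man is riding a bicycle\n' > my_ref1.txt
printf 'the cat sits on the mat\nsomeone rides a bike\n' > my_ref2.txt

nlg-eval --hypothesis=my_hyp.txt --references=my_ref1.txt --references=my_ref2.txt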

Using Python API

You can also evaluate using the Python API. Here’s how you can do it:

  • Evaluate the entire corpus:
    python
    from nlgeval import compute_metrics
    metrics_dict = compute_metrics(hypothesis='examples/hyp.txt', references=['examples/ref1.txt', 'examples/ref2.txt'])
    
  • Evaluate a single sentence (here references is a list of reference strings and hypothesis is a single string; a fuller example follows after this list):
    python
    from nlgeval import compute_individual_metrics
    metrics_dict = compute_individual_metrics(references, hypothesis)
    
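As a minimal, self-contained sketch of the single-sentence API (the sentences are made up, and the exact set of metric keys in the returned dictionary depends on your installation):

python
from nlgeval import compute_individual_metrics

# Hypothetical reference sentences and a generated hypothesis
references = ["a cat is sitting on the mat", "the cat sits on the mat"]
hypothesis = "the cat sat on the mat"

# Returns a dictionary mapping metric names (e.g. Bleu_1, METEOR, ROUGE_L) to scores
metrics_dict = compute_individual_metrics(references, hypothesis)

for name, score in metrics_dict.items():
    print(f"{name}: {score:.4f}")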

Understanding the Code with an Analogy

Think of NLG-Eval like a grading system in a school. It takes the ‘homework’ (the hypothesis file) and compares it to the ‘answers of trusted experts’ (the reference files). Each piece of homework (a generated sentence) is evaluated against known correct answers (the ground truths) using different criteria, much as a teacher might separately grade word choice (BLEU’s n-gram overlap), phrasing (METEOR’s stem- and synonym-aware matching), and coverage of key points (ROUGE’s recall-oriented overlap). Just as teachers grade students on several aspects, NLG-Eval assigns a score per metric, giving you a comprehensive overview of how well your NLG system performs.

Troubleshooting Common Issues

If you run into Java memory issues when computing the METEOR score, try lowering the ‘mem’ variable in meteor.py. Also note that CIDEr, in its default corpus mode, computes IDF values from the reference sentences you provide, so for very small datasets, such as the references for a single image, the score can be close to zero or unreliable. In these cases, switch the IDF source to the MSCOCO Validation Dataset instead.
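
To find the installed copy of meteor.py to edit, print the package’s install directory; meteor.py typically lives under pycocoevalcap/meteor/ inside it (the exact layout may differ across versions):

bash
# Print where the nlgeval package is installed; meteor.py sits under
# pycocoevalcap/meteor/ in that directory in typical installs
python -c "import nlgeval, os; print(os.path.dirname(nlgeval.__file__))"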

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Important Notes

If you are running NLG-Eval in a Docker environment or want to share it with other users, you may need to set an environment variable:

bash
export NLGEVAL_DATA=~/workspace/nlg-eval/nlgeval/data
This will help NLG-Eval locate the necessary data and models.
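
In a Docker setup this usually means mounting the downloaded data into the container and passing the variable at run time. A minimal sketch, assuming the data lives at ~/workspace/nlg-eval/nlgeval/data on the host and that an image named my-nlg-image has nlg-eval installed (both are placeholders for your own setup):

bash
# Placeholders: my-nlg-image has nlg-eval installed; the host data path is
# wherever nlg-eval --setup downloaded the models
docker run --rm \
  -v ~/workspace/nlg-eval/nlgeval/data:/data \
  -v "$PWD":/work \
  -e NLGEVAL_DATA=/data \
  my-nlg-image \
  nlg-eval --hypothesis=/work/my_hyp.txt --references=/work/my_ref1.txt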

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
