How to Use InvoiceNet for Intelligent Information Extraction

Mar 7, 2021 | Data Science

InvoiceNet uses deep neural networks to extract information from invoice documents. It accepts PDF, JPG, and PNG invoices and provides a GUI for training models and extracting data. This guide covers installation, data preparation, custom fields, training, and prediction.

Installation

Before you can extract anything, install InvoiceNet by following the steps for your operating system.

Installing on Ubuntu 20.04

  • First, open your terminal and run the following commands:
  • git clone https://github.com/naiveHobo/InvoiceNet.git
    cd InvoiceNet
    # Run installation script
    ./install.sh
  • This script takes care of installing all necessary dependencies, creating a virtual environment, and finally installing InvoiceNet.
  • To use InvoiceNet, activate the virtual environment:
  • source env/bin/activate

Installing on Windows 10

  • For Windows users, it’s best to use Anaconda. Open your command prompt and execute:
  • git clone https://github.com/naiveHobo/InvoiceNet.git
    cd InvoiceNet
    # Create conda environment and activate
    conda create --name invoicenet python=3.7
    conda activate invoicenet
    # Install InvoiceNet
    pip install .
    # Install poppler
    conda install -c conda-forge poppler
  • Ensure to also install additional dependencies like:

Data Preparation

To train your custom models effectively, prepare your data as follows:

  • Arrange your invoice files and their corresponding JSON label files in a single directory in this format:
  • train_data/
        invoice1.pdf
        invoice1.json
        nike-invoice.pdf
        nike-invoice.json
        12345.pdf
        12345.json
  • Ensure that each JSON file follows this format:
  • {
        "vendor_name": "Nike",
        "invoice_date": "12-01-2017",
        "invoice_number": "R0007546449",
        "total_amount": "137.51"
    }
  • Add other fields as necessary.
  • Use the GUI or command line interface to prepare your data.
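
Before running the preparation step, it can help to confirm that the directory really follows the layout above. The following is a minimal sketch, not part of InvoiceNet: the function name `check_training_dir` and the `required_fields` parameter are hypothetical, and the extension list simply mirrors the PDF, JPG, and PNG formats mentioned earlier.

```python
import json
from pathlib import Path

# Extensions taken from the formats this guide says InvoiceNet accepts.
INVOICE_EXTENSIONS = {".pdf", ".jpg", ".png"}


def check_training_dir(data_dir, required_fields=()):
    """Return a list of human-readable problems found in data_dir.

    Hypothetical helper: it only checks file pairing and JSON shape,
    it does not call any InvoiceNet code.
    """
    problems = []
    for invoice in sorted(Path(data_dir).iterdir()):
        if invoice.suffix.lower() not in INVOICE_EXTENSIONS:
            continue  # skip label files and anything that is not an invoice
        label = invoice.with_suffix(".json")
        if not label.exists():
            problems.append(f"{invoice.name}: missing {label.name}")
            continue
        try:
            fields = json.loads(label.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"{label.name}: invalid JSON ({err.msg})")
            continue
        if not isinstance(fields, dict):
            problems.append(f"{label.name}: expected a JSON object")
            continue
        for name in required_fields:
            if name not in fields:
                problems.append(f"{label.name}: missing field '{name}'")
    return problems
```

Calling `check_training_dir("train_data", required_fields=("total_amount",))` returns an empty list when every invoice has a well-formed label file containing that field.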

Custom Fields Additions

InvoiceNet supports adding custom fields to match your specific requirements:

  • Edit the invoicenet/__init__.py file to define your fields.
  • There are four predefined field types: general, optional, amount, and date. Here’s how you would add a field:
  • # Add the following lines at the end of the file
    FIELDS["total_amount"] = FIELD_TYPES["amount"]
    FIELDS["invoice_date"] = FIELD_TYPES["date"]
    FIELDS["tax_id"] = FIELD_TYPES["optional"]
    FIELDS["vendor_name"] = FIELD_TYPES["general"]
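
The registration pattern above is a plain dictionary lookup, so a misspelled type name fails with a bare `KeyError`. The sketch below shows one way to guard against that. It is a self-contained stand-in, not InvoiceNet's code: the numeric values in `FIELD_TYPES` are placeholders chosen for illustration, and `register_field` is a hypothetical helper.

```python
# Stand-in for the two dictionaries in invoicenet/__init__.py. The numeric
# values are placeholders for this sketch, not necessarily InvoiceNet's own.
FIELD_TYPES = {"general": 0, "optional": 1, "amount": 2, "date": 3}
FIELDS = {}


def register_field(name, type_name):
    """Add a field, rejecting type names outside the four predefined ones."""
    if type_name not in FIELD_TYPES:
        expected = ", ".join(sorted(FIELD_TYPES))
        raise ValueError(f"unknown field type {type_name!r}; expected one of: {expected}")
    FIELDS[name] = FIELD_TYPES[type_name]


register_field("total_amount", "amount")
register_field("invoice_date", "date")
register_field("tax_id", "optional")
register_field("vendor_name", "general")
```

With this guard, a typo such as `register_field("tax_id", "optinal")` raises a `ValueError` that lists the valid type names.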

Using the GUI for Training and Extraction

InvoiceNet features a user-friendly GUI for training models and extracting information:

  • To run the trainer GUI, execute:
  • python trainer.py
  • For the extractor GUI, run:
  • python extractor.py
  • Make sure to prepare your data before training by clicking the **Prepare Data** button and selecting your data folder.

Using the CLI for Training and Prediction

For those who prefer command-line operations:

Training Your Model

  • Prepare your data:
  • python prepare_data.py --data_dir train_data
  • Train your InvoiceNet model:
  • python train.py --field enter-field-here --batch_size 8
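
Because `train.py` takes a single `--field` value, training several fields means running the command once per field. A minimal sketch of a wrapper, assuming it is run from the InvoiceNet directory with the environment active; `build_train_command` and `train_fields` are hypothetical names, and only the `--field` and `--batch_size` flags shown above are used.

```python
import subprocess
import sys


def build_train_command(field, batch_size=8):
    """Build the argument list for one training run of train.py."""
    # sys.executable keeps the run inside the currently active environment.
    return [sys.executable, "train.py", "--field", field, "--batch_size", str(batch_size)]


def train_fields(fields, batch_size=8, dry_run=True):
    """Train one model per field, or just print the commands when dry_run is True."""
    commands = [build_train_command(field, batch_size) for field in fields]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop as soon as one training run fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    train_fields(["vendor_name", "total_amount"])
```

The dry run is the default so the commands can be reviewed first; pass `dry_run=False` to start the actual training runs.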

Prediction

To extract fields from invoices using your trained model:

  • For a single invoice:
  • python predict.py --field enter-field-here --invoice path-to-invoice-file
  • For multiple invoices, place them in one directory and run:
  • python predict.py --field enter-field-here --data_dir predict_data
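
If the invoices to process are scattered across subfolders, they first need to be gathered into one directory. The sketch below is one way to do that with the standard library. The helper name `collect_invoices` is hypothetical, and the extension filter is an assumption based on the PDF, JPG, and PNG formats this guide mentions.

```python
import shutil
from pathlib import Path

# Assumed filter, based on the input formats named in this guide.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png"}


def collect_invoices(source_dir, target_dir="predict_data"):
    """Copy every supported invoice under source_dir into a flat target_dir."""
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).resolve().rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        if target in path.parents:
            continue  # ignore files that already live in the target directory
        destination = target / path.name
        if destination.exists():
            # Refuse to overwrite: two invoices with the same file name collide.
            raise FileExistsError(f"{destination.name} already exists in {target}")
        shutil.copy2(path, destination)
        copied.append(destination)
    return copied
```

After `collect_invoices("incoming")`, the resulting `predict_data` directory can be passed to `--data_dir` as shown above.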

Troubleshooting

If you encounter any issues during installation or usage, consider the following troubleshooting tips:

  • Ensure all dependencies are installed correctly.
  • Check that your CUDA, cuDNN, and TensorFlow versions are compatible with each other and with your operating system.
  • If you’re unable to prepare your data or train your model, revisit the data preparation steps.
  • For specific issues, consult the community or reach out via email if you have a dataset to share: sarthakmittal2608@gmail.com.
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
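
The first tip can be partly automated. The following sketch reports which Python modules cannot be found and whether poppler's `pdftoppm` tool is on the PATH. The module list in the example is an assumption; replace it with the packages your InvoiceNet installation actually requires. Both helper names are hypothetical.

```python
import importlib.util
import shutil


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def poppler_available():
    """Check for pdftoppm, one of the command-line tools shipped with poppler."""
    return shutil.which("pdftoppm") is not None


if __name__ == "__main__":
    # Assumed module list for illustration; adjust it to your installation.
    print("Missing modules:", missing_modules(["tensorflow", "numpy"]))
    print("poppler on PATH:", poppler_available())
```

Run it inside the activated environment so that it inspects the same interpreter InvoiceNet uses.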

Conclusion

InvoiceNet extracts structured information from invoices, supports custom fields, and accepts PDF, JPG, and PNG input. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
