Project Insight

Apr 4, 2024 | Data Science

NLP as a Service


Introduction

Project Insight is designed to deliver NLP as a service, with a code base for both the front-end GUI (Streamlit) and the back-end server (FastAPI), utilizing transformer models across various downstream NLP tasks.

Downstream NLP tasks covered:

  • News Classification
  • Entity Recognition
  • Sentiment Analysis
  • Summarization
  • Information Extraction

Users can select different models from a dropdown list to run inference, or call the backend FastAPI server directly for command-line inference.
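
For a rough illustration of command-line inference, the snippet below posts a request to one of the backend services with Python's requests library. The route, port, and payload fields here are assumptions for illustration; check the service's own FastAPI docs for the actual contract.

    import requests

    # Hypothetical endpoint; adjust host, port, and route to your deployment.
    URL = "http://localhost:8080/api/v1/classification/predict"

    payload = {
        "model": "distilbert",  # folder name of the model to use (assumed field)
        "text": "Stocks rallied after the central bank held rates steady.",
    }

    response = requests.post(URL, json=payload)
    response.raise_for_status()
    print(response.json())  # e.g. the predicted label and score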

Features of the solution

  • Python Code Base: Built with FastAPI and Streamlit, so the complete code base is in Python.
  • Expandable: The backend is designed to allow expansion with more transformer-based models, which will automatically be available in the front end app.
  • Micro-Services: The backend uses a microservices architecture, featuring a Dockerfile for each service and leveraging Nginx as a reverse proxy to each independently running service.
    • This eases updates, maintenance, and management of individual NLP services.

Installation

  • Clone the Repository.
  • Run Docker Compose to spin up the FastAPI backend service.
  • Run the Streamlit app using the streamlit run command.

Setup and Documentation

  1. Download the Models
    • Download the models from here
    • Save them in the specific model folders inside the src_fastapi folder.
  2. Running the Backend Service
    • Navigate to the src_fastapi folder
    • Run the Docker Compose command:
      $ cd src_fastapi
      $ sudo docker-compose up -d
  3. Running the Frontend App
    • Navigate to the src_streamlit folder, build the Docker image from the Dockerfile, and start the container:
      $ cd src_streamlit
      $ sudo docker build -t streamlit_app .
      $ sudo docker run -d --name streamlit_app streamlit_app
    • Alternatively, run the app directly with the streamlit run command:
      $ cd src_streamlit
      $ streamlit run NLPfile.py
  4. Access the FastAPI Documentation: Given the microservice design, every NLP task has its own separate, auto-generated documentation, served at that service's /docs route (a minimal example follows).
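
To see why each task gets its own documentation, here is a minimal sketch of what one task's service might look like; the route, request model, and response fields are illustrative assumptions, not the project's exact code. FastAPI auto-generates interactive docs for any app defined this way.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI(title="Sentiment Service")  # hypothetical task service

    class PredictRequest(BaseModel):
        model: str  # e.g. "distilbert" or "roberta" (assumed field names)
        text: str

    @app.post("/api/v1/sentiment/predict")
    def predict(req: PredictRequest):
        # A real service would load the chosen model folder and run inference.
        return {"model": req.model, "sentiment": "positive", "score": 0.98}

    # Swagger UI is served automatically at /docs, and ReDoc at /redoc.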

Project Details

Demonstration

[Demo: Project Insight Demo]

Directory Details

  • Front End: The front end code resides in the src_streamlit folder along with the Dockerfile and requirements.txt.
  • Back End: The back end code is located in the src_fastapi folder:
    • This folder contains a directory for each task: Classification, NER, Summary, etc.
    • Each NLP task is implemented as a microservice with its own FastAPI server, requirements, and Dockerfile, allowing for independent maintenance and management.
    • Each task contains a folder for each trained model. For example, the Sentiment service:
      • sentiment
        • app
          • api
            • distilbert
              • model.bin
              • network.py
              • tokenizer files
            • roberta
              • model.bin
              • network.py
              • tokenizer files
  • For each new model under each service, create a new folder to store the following files:
    • Model bin file.
    • Tokenizer files.
    • network.py: Defines the model’s class for customized models.
    • config.json: Contains model details and training dataset information (see the sketch after this list).
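
As a sketch of how a service might consume this layout, the following reads a model folder's config.json and locates the model binary. The directory pattern and config fields are assumptions based on the structure described above, not the project's actual loader.

    import json
    from pathlib import Path

    def load_model_entry(task_dir: str, model_name: str) -> dict:
        """Read the per-model config.json from an assumed
        <task>/app/api/<model_name>/ folder layout."""
        model_dir = Path(task_dir) / "app" / "api" / model_name
        with open(model_dir / "config.json") as f:
            config = json.load(f)  # model details and training dataset info
        config["model_path"] = str(model_dir / "model.bin")
        return config

    # Example: inspect the distilbert entry of the sentiment service.
    print(load_model_entry("sentiment", "distilbert"))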

How to Add a new Model

  1. Fine-tune a transformer model for the specific task. Useful resources include the transformers-tutorials.
  2. Save the model and tokenizer files, and create the network.py script if using a customized training network (a minimal network.py sketch follows this list).
  3. Create a directory within the NLP task, naming it after the model, and save the associated files here.
  4. Update the config.json with the new model’s details and relevant dataset information.
  5. Modify the task's servicepro.py (for instance, classificationpro.py for the classification service) to ensure proper imports and model selection. For example, to add a "bert" model to classification:
    • Create a new directory called "bert" in classification/app/api.
    • Update config.json with the new model's information.
    • Update classificationpro.py only if the model uses a customized network class.
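
For step 2, this is a minimal sketch of what a customized network.py might look like, assuming PyTorch and Hugging Face Transformers; the class name, base checkpoint, and label count are illustrative assumptions.

    import torch
    from transformers import AutoModel

    class Model(torch.nn.Module):
        """Hypothetical customized network for a classification service."""

        def __init__(self, checkpoint: str = "bert-base-uncased", num_labels: int = 4):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(checkpoint)
            self.dropout = torch.nn.Dropout(0.3)
            self.classifier = torch.nn.Linear(self.encoder.config.hidden_size, num_labels)

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            pooled = out.last_hidden_state[:, 0]  # [CLS] token representation
            return self.classifier(self.dropout(pooled))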

License

This project is licensed under the GPL-3.0 License – see the LICENSE.md file for details.

Troubleshooting

If you encounter issues while setting up Project Insight, consider the following:

  • Ensure that Docker is installed and running properly on your machine.
  • Check that you are using compatible versions of Python and required libraries mentioned in the requirements.txt file.
  • If the web app fails to load, revisit the Docker Compose logs for error messages.
  • If you have network issues, verify that your firewall allows traffic on port 8080 for FastAPI and on any ports Streamlit needs (8501 by default).
  • Additionally, ensure that all directory paths for model storage are correctly specified.
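
If you are unsure whether the backend is reachable at all, a quick probe like the one below can narrow things down; the port and path are assumptions matching the setup described above.

    import requests

    BASE = "http://localhost:8080"  # assumed Nginx reverse-proxy port

    try:
        r = requests.get(f"{BASE}/docs", timeout=5)
        print("Backend reachable, status:", r.status_code)
    except requests.ConnectionError as err:
        print("Backend not reachable:", err)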

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
