How to Use the Structure Extraction Model by NuMind

Welcome to the world of advanced information extraction! Today, we’re diving into the NuExtract_tiny model by NuMind, which is designed to help you efficiently extract structured information from unstructured text. Let’s walk through the steps of using this powerful tool, troubleshoot common issues, and explore what makes it so special.

What is NuExtract_tiny?

NuExtract_tiny is a fine-tuned version of the Qwen1.5-0.5B model, optimized for information extraction tasks. The model is purely extractive: it pulls out parts of your input text according to a specified structure, without altering the original content.

To use this model, you will send it a block of text (up to 2000 tokens) alongside a JSON template that describes the information you want to extract. Think of it like asking a librarian for specific information from a book and giving her the table of contents as a guide to help her find what you need!
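
As a concrete illustration (the field names and the sample sentence below are invented for demonstration), a template is just a JSON skeleton with empty strings, and the model is expected to return the same skeleton with matching spans from the text filled in:

# Illustrative only: a hypothetical template for pulling contact details out of a sentence.
template = '''{
    "Person": {
        "Name": "",
        "Email": ""
    }
}'''

text = "Please contact Jane Doe at jane.doe@example.com for access."

# Expected shape of the extraction (values copied verbatim from the text):
# {
#     "Person": {
#         "Name": "Jane Doe",
#         "Email": "jane.doe@example.com"
#     }
# }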

Setting Up the Model

Before extracting information, you need to set up the model correctly. Follow these steps:

  • Import the Necessary Libraries: You will use the transformers library from Hugging Face (with PyTorch as its backend), so make sure both are installed.
  • Load the Model and Tokenizer: Use the AutoModelForCausalLM and AutoTokenizer classes to load the NuExtract model; a minimal sketch follows this list.
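
A minimal install-and-load sketch might look like this (it mirrors the loading calls used in the full example below; adjust the pip command to your environment):

# Install the dependencies first, for example:
#   pip install transformers torch

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True matches the loading call used in the full example below.
model = AutoModelForCausalLM.from_pretrained("numind/NuExtract-tiny", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract-tiny", trust_remote_code=True)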

Code Example for Extraction

The following code snippet demonstrates how to use the NuExtract model to extract information from text:


import json
from transformers import AutoModelForCausalLM, AutoTokenizer

def predict_NuExtract(model, tokenizer, text, schema, example=[]):
    # Pretty-print the template so the model sees a consistent JSON layout.
    schema = json.dumps(json.loads(schema), indent=4)
    input_llm = "<|input|>\n### Template:\n" + schema + "\n"

    # Optional few-shot examples, each a JSON string that follows the template.
    for i in example:
        if i != "":
            input_llm += "### Example:\n" + json.dumps(json.loads(i), indent=4) + "\n"

    input_llm += "### Text:\n" + text + "\n<|output|>\n"
    input_ids = tokenizer(input_llm, return_tensors="pt", truncation=True, max_length=4000).to("cuda")

    # Generate and keep only the part between <|output|> and <|end-output|>.
    output = tokenizer.decode(model.generate(**input_ids, max_new_tokens=1000)[0], skip_special_tokens=True)
    return output.split("<|output|>")[1].split("<|end-output|>")[0]

model = AutoModelForCausalLM.from_pretrained("numind/NuExtract-tiny", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract-tiny", trust_remote_code=True)

model.to('cuda')
model.eval()

text = "We introduce Mistral 7B, a 7-billion-parameter language model ..."
schema = '''{
    "Model": {
        "Name": "",
        "Number of parameters": "",
        "Number of max token": "",
        "Architecture": ""
    },
    "Usage": {
        "Use case": "",
        "Licence": ""
    }
}'''

prediction = predict_NuExtract(model, tokenizer, text, schema, example=[])
print(prediction)
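
The example parameter of predict_NuExtract lets you pass one or more few-shot examples as JSON strings that follow the same template. Below is a hypothetical illustration of such a call; the example values are invented purely for demonstration:

# Hypothetical few-shot example matching the schema above; the values are invented.
example_output = '''{
    "Model": {
        "Name": "SomeModel",
        "Number of parameters": "1 billion",
        "Number of max token": "2048",
        "Architecture": "Transformer"
    },
    "Usage": {
        "Use case": "research",
        "Licence": "Apache 2.0"
    }
}'''

prediction = predict_NuExtract(model, tokenizer, text, schema, example=[example_output])
print(prediction)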

Understanding the Code with an Analogy

Imagine you’re a chef preparing a unique dish. Each step in the recipe is crucial for achieving the perfect taste. In our code:

  • Imports: Just like gathering cooking utensils and ingredients, we start by importing necessary libraries to utilize the model.
  • Model and Tokenizer Setup: Loading the model is akin to preheating your oven. You need the right temperature to cook your dish to perfection.
  • Function Definition: The predict_NuExtract function is your cooking method. It’s where you mix your ingredients (input), so they meld together to create the final dish (output).
  • Schema and Input: The schema acts like a recipe list, telling the model what ingredients to focus on. The input text is like the main ingredient you want to transform into a delicious meal (extracted information).

Troubleshooting Common Issues

While using the NuExtract model, you may encounter some hiccups. Here are a few troubleshooting tips:

  • Model Not Loading: Ensure your environment has access to the internet and the transformers library is correctly installed.
  • Truncation Errors: If your output looks incomplete, check whether your input text exceeds the token limit (the snippet above truncates at 4000 tokens); shorten the text or split it into chunks. See the sketch after this list for a quick way to count tokens.
  • No Output: If you get no output, or json.loads raises an error, make sure your schema is valid JSON and structured the way you intend.
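
For the truncation issue above, a quick way to check the size of your input before calling the model is to count its tokens with the tokenizer you already loaded; this is a minimal sketch:

# Count the tokens in the input text before sending it to the model.
token_count = len(tokenizer(text)["input_ids"])
if token_count > 2000:
    print(f"Input is {token_count} tokens; consider shortening or splitting the text.")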

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Conclusion

Using the NuExtract model can greatly enhance your information extraction capabilities. With the right setup and approach, you can streamline how you derive structured information from text. Happy extracting!
