Kor: Your Guide to Extracting Structured Data from Text Using LLMs

Mar 20, 2023 | Data Science

Extracting structured data from unstructured text is a common need in AI-driven applications. Kor is a prototype tool that uses Large Language Models (LLMs) for exactly this purpose. If you want to learn how to set it up and use it, you’re in the right place!

What is Kor?

Kor is a wrapper around LLMs that lets you specify an extraction schema and provide examples of the data you want extracted. The tool generates a prompt from that schema, sends it to the LLM, and parses the output into structured data. It is one more abstraction layer over the LLM, but one built specifically around schemas and examples.
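To make the prompt, LLM call, and parse loop concrete, here is a minimal standard-library sketch of the idea. It is not Kor’s implementation: `build_prompt`, `fake_llm`, and `extract` are hypothetical names invented for illustration, and the "LLM" is a stub that always returns the same JSON string.

```python
import json


def build_prompt(field_descriptions, examples, text):
    """Assemble an extraction prompt from field descriptions and worked examples."""
    lines = ["Extract the following fields and answer with JSON only:"]
    for name, description in field_descriptions.items():
        lines.append(f"- {name}: {description}")
    for sample_input, sample_output in examples:
        lines.append(f"Input: {sample_input}")
        lines.append(f"Output: {json.dumps(sample_output)}")
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call; always returns the same JSON."""
    return '{"artist": ["Paul Simon"]}'


def extract(field_descriptions, examples, text, llm=fake_llm):
    """Build the prompt, call the model, and parse its reply into a dict."""
    raw = llm(build_prompt(field_descriptions, examples, text))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Keep only the fields that the schema declares.
    data = {key: value for key, value in parsed.items() if key in field_descriptions}
    return {"data": data, "raw": raw}


fields = {"artist": "Music by the given artist"}
examples = [("Songs by Paul Simon", {"artist": ["Paul Simon"]})]
print(extract(fields, examples, "play Paul Simon")["data"])
```

Swapping `fake_llm` for a real model call is the only change needed to try the same flow against an actual LLM; Kor handles these steps for you.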

How to Use Kor

Let’s break down the process of using Kor effectively. Here’s a step-by-step guide:

1. Install the Kor Package

Start by installing the tool via pip. Open your terminal and execute the following command:

pip install kor
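If you want to confirm the install from Python, one option is to ask the standard library’s `importlib.metadata` for the package version. This is a small sketch; `installed_version` is a hypothetical helper name, not part of Kor.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("kor")
if version is None:
    print("kor is not installed in this environment")
else:
    print(f"kor {version} is installed")
```

Running this inside the same virtual environment you used for `pip install` helps catch the common case of installing into one environment and running code in another.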

2. Define Your Schema

To tell Kor what to extract, you need to define a schema. Think of this step like creating a blueprint for a house. The blueprint needs to specify the layout and type of each room (or data field). Here’s an example of defining a schema using Kor’s own Object and Text types:

from langchain.chat_models import ChatOpenAI
from kor import create_extraction_chain, Object, Text

# temperature=0 keeps the model's output as repeatable as possible.
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
    max_tokens=2000,
    model_kwargs={"frequency_penalty": 0, "presence_penalty": 0, "top_p": 1.0},
)

# The schema names each field to extract and describes it for the LLM.
schema = Object(
    id="player",
    description="User is controlling a music player to select songs, pause or start them or play music by a particular artist.",
    attributes=[
        Text(id="song", description="User wants to play this song", many=True),
        Text(id="album", description="User wants to play this album", many=True),
        Text(
            id="artist",
            description="User wants to hear music by the given artist",
            # Each example pairs a sample input with the value to extract from it.
            examples=[("Songs by paul simon", "paul simon")],
            many=True,
        ),
    ],
    many=False,
)

3. Create an Extraction Chain

Next, create an extraction chain that uses the schema you’ve defined. This step is akin to assembling the team that will build your house according to the specified blueprint:

chain = create_extraction_chain(llm, schema, encoder_or_encoder_class="json")

4. Invoke the Chain

Finally, invoke the chain with a piece of text, and Kor will parse the LLM’s response according to your schema. The extracted values sit under the "data" key of the result. Think of this step as watching your house come to life as the builders follow the blueprint:

result = chain.invoke("play songs by Paul Simon and Led Zeppelin.")
print(result["data"])
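Because LLM output can omit fields, it helps to read the result defensively. The sketch below assumes the result is a dictionary shaped like `{"data": {"player": {"artist": [...]}}}`; that shape is an assumption and may differ between Kor versions, so inspect your own output first. `get_values` is a hypothetical helper, and `sample_result` is hand-written sample data, not real model output.

```python
def get_values(result, object_id, field):
    """Return the list of extracted values for one field, or [] if it is absent."""
    data = result.get("data") or {}
    obj = data.get(object_id) or {}
    # Accept either a single record (dict) or a list of records.
    records = obj if isinstance(obj, list) else [obj]
    values = []
    for record in records:
        if not isinstance(record, dict):
            continue
        found = record.get(field, [])
        values.extend(found if isinstance(found, list) else [found])
    return values


# Hand-written sample in the assumed shape, used here instead of a live LLM call.
sample_result = {"data": {"player": {"artist": ["Paul Simon", "Led Zeppelin"]}}}

print(get_values(sample_result, "player", "artist"))  # ['Paul Simon', 'Led Zeppelin']
print(get_values(sample_result, "player", "song"))    # []
```

Returning an empty list for a missing field means downstream code can loop over the values without checking for `None` or catching `KeyError`.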

Troubleshooting Kor

Even the smoothest processes can encounter obstacles. Here are a few troubleshooting ideas:

  • Ensure you are using Python versions 3.8 to 3.11, as Kor is tested against these versions.
  • If you encounter issues with schema validation, review your schema definition (and your Pydantic models, if you use them for validation) for missing attributes or mismatched types.
  • For unexpected errors or bugs, check the Kor GitHub repository for updates or similar reported issues.
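The Python version check from the first tip can be automated. This is a minimal sketch: the 3.8 to 3.11 bounds come from the list above, while `is_tested_python` is a hypothetical helper name.

```python
import sys

MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_tested_python(version_info=None):
    """Return True if the (major, minor) version falls within the tested range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_tested_python():
    current = ".".join(str(part) for part in sys.version_info[:2])
    print(f"Warning: Python {current} is outside the tested range 3.8 to 3.11")
```

A version outside the range does not necessarily mean Kor will fail, only that it is a reasonable first thing to rule out.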


Conclusion

Kor provides a powerful tool for extracting structured data from text using LLMs, framed within intuitive schemas. Though it is still a prototype and may have some performance limitations, it holds great potential for future enhancements.

