Lego AI Parser

Feb 7, 2024 | Educational


Lego AI Parser is an open-source application that employs OpenAI to parse visible text from HTML elements. Built on FastAPI, this tool is ready to be set up as a server, allowing calling from any programming language.
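
Since the parser is built on FastAPI, the server can be launched like any other ASGI application. The snippet below is a minimal sketch using uvicorn; the module path "main:app" and the port are assumptions, so adjust them to match how the project is laid out in your deployment.

import uvicorn

# A minimal sketch for serving the FastAPI app locally.
# "main:app" and port 8000 are assumptions; adjust to your deployment.
if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000)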

Basic Usage

To get started with Lego AI Parser, follow these steps:

  • Copy the Outer HTML: Get the outer HTML of the element you want to parse.
  • OpenAI API Key: Register a free OpenAI account and retrieve your API key.
  • Make a POST Request: Using your preferred programming language, make a POST request to the /classify endpoint:

import requests

uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "path": "google.google_local_results",
    "targets": [
        # Your HTML target here
    ],
    "openai_key": "YOUR_OPENAI_KEY"
}

r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])

Parsing Multiple Elements

Lego AI Parser can handle multiple elements simultaneously. You have the flexibility to mix HTML and text copied from within elements. If the collected prompts exceed the token size, the parser will intelligently split them for you.

import requests

uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "path": "google.google_local_results",
    "targets": [
        "X Coffee 4.1(23) · €€ · Coffee shop Nicosia Counter-serve chain for coffee snacks",
        "Y Coffee 4.0 (418) · €€ · Coffee shop Nicosia Iconic Seattle-based coffeehouse chain"
    ],
    "openai_key": "YOUR_OPENAI_KEY"
}

r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])

Designing Custom Parsers

In addition to preset parsers, you can design your own custom parsers. To do this, supply a prompt, the necessary examples, and model details.

import requests

uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "targets": [
        # Your HTML target here
    ],
    "openai_key": "YOUR_OPENAI_KEY",
    "classifier": {
        "main_prompt": "A table with NUMBER_OF_LABELS...",
        "data": {
            "model": "text-davinci-003",
            "temperature": 0.001,
            "top_p": 0.9,
            "best_of": 2,
            "frequency_penalty": 0,
            "presence_penalty": 0
        }
    }
}

r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])

Think of creating a custom parser like building a LEGO set: you have specific pieces (elements and examples) that you assemble following an instruction manual (the prompt) to create a new structure (parser). Your resulting structure can then be customized further by changing the pieces (parameters) according to your needs.

Making Server-Side Calls Without Exposing the API Key

Prompts Only Call

When you only need the prompts for the OpenAI endpoint, without sending your API key to the server, use:

import requests

uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "prompts_only": True,
    "path": "google.google_local_results",
    "targets": [
        # Your HTML target here
    ]
}

r = requests.post(url=uri, headers=headers, json=data)
print(r.json())

Making Server-Side Calls to OpenAI

To call OpenAI from your own server side, pass the prompts returned by the prompts-only call to the OpenAI Completion endpoint:

import os
import openai
import requests

uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
response_from_lego_ai_parser = requests.post(url=uri, headers=headers, json=data)

openai.api_key = os.getenv("OPENAI_API_KEY")
prompts = response_from_lego_ai_parser["prompts"]
responses = []

for prompt in prompts:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=0.001,
        max_tokens=400,
        top_p=0.9,
        best_of=2,
        frequency_penalty=0,
        presence_penalty=0
    )
    responses.append(response)
print(responses)

Parse Only Call

After obtaining the OpenAI responses, send them back to the parser together with the prompt objects to get the parsed results:

data = {
    "path": "google.google_local_results",
    "parse_only": {
        "responses": responses,
        "prompt_objects": response_from_lego_ai_parser["prompt_objects"]
    }
}

response_from_lego_ai_parser = requests.post(url=uri, headers=headers, json=data)
print(response_from_lego_ai_parser.json())

Expected Error Responses

If you encounter errors, such as an incorrect API key or an exceeded quota, the response will be structured like this:

{
    "results": [
        {
            "message": "Incorrect API key provided: Your Op*****Key.",
            "type": "invalid_request_error",
            "param": null,
            "code": "invalid_api_key"
        }
    ]
}
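
To handle such errors programmatically, you can check the results list for error objects before using it. The snippet below is a minimal sketch; it reuses the uri, headers, and data variables from the earlier examples and assumes error entries carry the code and message fields shown above.

import requests

# A minimal sketch for detecting error objects in the "results" list.
# Assumes error entries are dicts with a "code" key, as in the example above,
# while successfully parsed entries do not carry that key.
r = requests.post(url=uri, headers=headers, json=data)
results = r.json().get("results", [])

errors = [item for item in results if isinstance(item, dict) and "code" in item]
if errors:
    for err in errors:
        print(f"OpenAI error ({err.get('code')}): {err.get('message')}")
else:
    print(results)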

Customizing Default Allowed Concurrency and API Key of Client-Side Calls

You can adjust the allowed concurrency for client-side calls easily:

import requests

uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "allowed_concurrency": 2,
    "path": "google.google_local_results",
    "targets": [
        # Your HTML target here
    ],
    "openai_key": "YOUR_OPENAI_KEY"
}

r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])

Contributions Guide

If you would like to contribute to this project, refer to the Contributions Guide page. All bug reports, suggestions, and feature requests are highly appreciated!

Troubleshooting Ideas

  • Ensure your OpenAI Key is valid and not expired.
  • Check if your API key has enough quota remaining.
  • Make sure that your requests are below the maximum token size (see the sketch after this list).
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
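
For the token-size check mentioned above, here is a minimal sketch that estimates how many tokens your targets consume before you send them. It uses the tiktoken library, which is not part of Lego AI Parser, and the model name is an assumption that should match whatever model your server is configured to use.

import tiktoken

def estimate_tokens(text: str, model: str = "text-davinci-003") -> int:
    # Approximate token count for one target using the model's tokenizer.
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

targets = [
    "X Coffee 4.1(23) · €€ · Coffee shop Nicosia Counter-serve chain for coffee snacks",
]
print(sum(estimate_tokens(t) for t in targets))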

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
