Lego AI Parser is an open-source application that uses OpenAI to parse visible text from HTML elements. Built on FastAPI, it is ready to be deployed as a server, so it can be called from any programming language.
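As a rough sketch of what "set up as a server" means here: assuming you have cloned the repository and it exposes its FastAPI instance as app in a main module (the "main:app" import string below is an assumption; adjust it to the project's actual entry point), the server can be started with Uvicorn:
import uvicorn
# Assumption: "main:app" points at the module and variable holding the FastAPI instance.
# Replace it with the actual entry point of your Lego AI Parser checkout.
uvicorn.run("main:app", host="0.0.0.0", port=8000)
Once the server is running, replace https://yourserver.com/classify in the examples below with your own host and port.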
Table of Contents
- Basic Usage
- Parsing Multiple Elements
- Designing Custom Parsers
- Making Server-Side Calls Without Exposing the API Key
- Expected Error Responses
- Customizing Default Allowed Concurrency and API Key of Client-Side Calls
- Contributions Guide
Basic Usage
To get started with Lego AI Parser, follow these steps:
- Copy the Outer HTML: Get the HTML of the element you want to parse.
- OpenAI API Key: Register a free OpenAI account and retrieve your API key from the account dashboard.
- Make a POST Request: Using your preferred programming language, send a POST request to the server's /classify endpoint. For example, in Python:
import requests
uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "path": "google.google_local_results",
    "targets": [
        "<YOUR_HTML_TARGET>"  # outer HTML of the element you want to parse
    ],
    "openai_key": "YOUR_OPENAI_KEY"
}
r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])
Parsing Multiple Elements
Lego AI Parser can handle multiple elements at once, and you can freely mix outer HTML with text copied from within elements. If the collected prompts exceed the maximum token size, the parser splits them across multiple prompts for you.
import requests
uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "path": "google.google_local_results",
    "targets": [
        "X Coffee 4.1(23) · €€ · Coffee shop Nicosia Counter-serve chain for coffee snacks",
        "Y Coffee 4.0 (418) · €€ · Coffee shop Nicosia Iconic Seattle-based coffeehouse chain"
    ],
    "openai_key": "YOUR_OPENAI_KEY"
}
r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])
Designing Custom Parsers
In addition to the preset parsers, you can design your own custom parsers by supplying a prompt, the necessary examples, and model details:
import requests
uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "targets": [
        "<YOUR_HTML_TARGET>"  # outer HTML of the element you want to parse
    ],
    "openai_key": "YOUR_OPENAI_KEY",
    "classifier": {
        "main_prompt": "A table with NUMBER_OF_LABELS...",
        "data": {
            "model": "text-davinci-003",
            "temperature": 0.001,
            "top_p": 0.9,
            "best_of": 2,
            "frequency_penalty": 0,
            "presence_penalty": 0
        }
    }
}
r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])
Think of creating a custom parser like building a LEGO set: you have specific pieces (elements and examples) that you assemble following an instruction manual (the prompt) to create a new structure (parser). Your resulting structure can then be customized further by changing the pieces (parameters) according to your needs.
Making Server-Side Calls Without Exposing the API Key
Prompts Only Call
If you only need the prompts that would be sent to the OpenAI endpoint, without passing your API key to the server, set prompts_only in the request:
import requests
uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "prompts_only": True,
    "path": "google.google_local_results",
    "targets": [
        "<YOUR_HTML_TARGET>"  # outer HTML of the element you want to parse
    ]
}
r = requests.post(url=uri, headers=headers, json=data)
print(r.json())
Making Server-Side Calls to OpenAI
You can then call OpenAI from your own server with the prompts returned by the previous request, so your API key never reaches the client side:
import os
import openai
import requests
uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
# Reuse the prompts-only payload from the previous step.
data = {
    "prompts_only": True,
    "path": "google.google_local_results",
    "targets": [
        "<YOUR_HTML_TARGET>"  # outer HTML of the element you want to parse
    ]
}
response_from_lego_ai_parser = requests.post(url=uri, headers=headers, json=data).json()
# Send each returned prompt to OpenAI yourself.
openai.api_key = os.getenv("OPENAI_API_KEY")
prompts = response_from_lego_ai_parser["prompts"]
responses = []
for prompt in prompts:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=0.001,
        max_tokens=400,
        top_p=0.9,
        best_of=2,
        frequency_penalty=0,
        presence_penalty=0
    )
    responses.append(response)
print(responses)
Parse Only Call
After obtaining the OpenAI responses, send them back together with the prompt objects to get the parsed results:
data = {
    "path": "google.google_local_results",
    "parse_only": {
        "responses": responses,
        "prompt_objects": response_from_lego_ai_parser["prompt_objects"]
    }
}
response_from_lego_ai_parser = requests.post(url=uri, headers=headers, json=data)
print(response_from_lego_ai_parser.json())
Expected Error Responses
If you encounter an error, such as an incorrect API key or exceeded quota, the response will be structured like this:
{
  "results": [
    {
      "message": "Incorrect API key provided: Your Op*****Key.",
      "type": "invalid_request_error",
      "param": null,
      "code": "invalid_api_key"
    }
  ]
}
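If you want to catch these errors programmatically after a client-side call like the ones above, a minimal sketch (assuming error entries appear inside results with the message, type, and code fields shown in the example) could look like this:
results = r.json()["results"]
for result in results:
    # Assumption: an error entry is a dict carrying "type" and "message" fields.
    if isinstance(result, dict) and "type" in result and "message" in result:
        print(f"OpenAI error ({result.get('code')}): {result['message']}")
    else:
        print(result)  # a successfully parsed result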
Customizing Default Allowed Concurrency and API Key of Client-Side Calls
You can adjust the allowed concurrency of a client-side call by passing allowed_concurrency in the request:
import requests
uri = "https://yourserver.com/classify"
headers = {"Content-Type": "application/json"}
data = {
    "allowed_concurrency": 2,
    "path": "google.google_local_results",
    "targets": [
        "<YOUR_HTML_TARGET>"  # outer HTML of the element you want to parse
    ],
    "openai_key": "YOUR_OPENAI_KEY"
}
r = requests.post(url=uri, headers=headers, json=data)
print(r.json()["results"])
Contributions Guide
If you want to contribute to this project, see the Contributions Guide page. All bug reports, suggestions, and feature requests are appreciated!
Troubleshooting Ideas
- Ensure your OpenAI Key is valid and not expired.
- Check if your API key has enough quota remaining.
- Make sure that your requests are below the maximum token size (see the token-count sketch after this list).
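One quick way to estimate whether a prompt fits is to count its tokens locally with the tiktoken library. This is only a rough client-side check, not something Lego AI Parser itself requires, and the example target string is just illustrative:
import tiktoken
# Rough token count for a prompt aimed at text-davinci-003.
encoding = tiktoken.encoding_for_model("text-davinci-003")
target = "X Coffee 4.1(23) · €€ · Coffee shop Nicosia Counter-serve chain for coffee snacks"
print(len(encoding.encode(target)))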

