How to Use Schema-Aware Text-to-SQL with the BART Model

If you’re curious about how to convert natural language questions into SQL queries, you’re in the right place! In this guide, we will explore how to use the schema-aware text-to-SQL capabilities of the BART model to answer questions about a database. We will walk through the code, with some troubleshooting tips along the way!

Getting Started

To begin, make sure the necessary libraries are installed: you will need the Hugging Face Transformers library for the BART model, plus PyTorch as its backend (for example, via pip install transformers torch). After that, follow along with the code below to set things up:

from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig

# Load the model and tokenizer
model = BartForConditionalGeneration.from_pretrained('shahrukhx01/schema-aware-denoising-bart-large-cnn-text2sql')
tokenizer = BartTokenizer.from_pretrained('shahrukhx01/schema-aware-denoising-bart-large-cnn-text2sql')

Formulating Your Query

Now that you have your model and tokenizer ready, it’s time to prepare your input query. Let’s say you want to find out the nationality of Terrence Ross. Here’s how you would formulate the input:

# Define the natural language question and the schema
question = "What is Terrence Ross' nationality?"
schema = "col0 Player : text col1 No. : text col2 Nationality : text col3 Position : text col4 Years in Toronto : text col5 School/Club Team : text"
# Append the schema to the question so the schema-aware model can see the columns
inputs = tokenizer([question + " " + schema], max_length=1024, return_tensors='pt')

Think of your database as a library and your schema as its cataloging system. Each book on the shelf has certain attributes: title, author, genre, and so on. Your natural language query is like asking a librarian for a specific book by describing those attributes.
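To see how a schema string like the one above can be assembled from a table definition, here is a minimal sketch. The build_schema_string helper and its column list are illustrative names of my own, not part of the model's API; it simply reproduces the flat "colN name : type" layout used in the example:

```python
def build_schema_string(columns):
    """Flatten (name, type) pairs into the 'col0 Name : type col1 ...'
    layout shown in the example above."""
    return " ".join(
        f"col{i} {name} : {col_type}"
        for i, (name, col_type) in enumerate(columns)
    )

# A few columns from our example table
columns = [("Player", "text"), ("No.", "text"), ("Nationality", "text")]
print(build_schema_string(columns))
# col0 Player : text col1 No. : text col2 Nationality : text
```

Generating the string programmatically like this avoids typos when your tables have many columns.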

Generating the SQL Query

Once your input is set, the next step is to generate the SQL query from the model:

# Generate SQL
text_query_ids = model.generate(inputs['input_ids'], num_beams=4, min_length=0, max_length=125, early_stopping=True)

# Decode the generated SQL query
prediction = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in text_query_ids][0]
print(prediction)

What Happens Next?

The above code prints the SQL query that corresponds to your natural language question. Much like a librarian turning a spoken request into a precise catalog lookup, this is how you retrieve exactly the information you asked for.
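To complete the picture, here is a hedged sketch of executing a query like the one the model might return against a toy SQLite table. The table name roster, its contents, and the SQL string itself are all illustrative, not the model's actual output:

```python
import sqlite3

# Build a tiny in-memory table mirroring our example schema (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roster (Player TEXT, Nationality TEXT, Position TEXT)")
conn.execute("INSERT INTO roster VALUES ('Terrence Ross', 'United States', 'Guard-Forward')")

# Suppose the model produced a query along these lines:
sql = "SELECT Nationality FROM roster WHERE Player = 'Terrence Ross'"
result = conn.execute(sql).fetchone()
print(result[0])  # United States
conn.close()
```

In a real application you would substitute the model's decoded prediction for the hand-written sql string, after validating it against your actual table and column names.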

Troubleshooting

If you encounter any challenges while executing the code, here are some troubleshooting ideas:

  • Model Not Loading: Ensure that you have internet access, as the pre-trained model is downloaded from the Hugging Face Hub on first use.
  • Input Length Exceeds Limit: If your question plus schema is too long, consider shortening it; BART accepts at most 1024 tokens of input.
  • Missing Libraries: Verify that you have installed all required libraries, notably Transformers and PyTorch.
  • Unexpected Outputs: If the generated SQL seems off, try rephrasing your natural language question for clarity.
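For the last point, a little post-processing can also help before you inspect or run the generated SQL. This whitespace-normalizing helper is a minimal sketch of my own (normalize_sql is not part of Transformers):

```python
import re

def normalize_sql(sql):
    # Collapse runs of spaces and newlines in the generated query
    return re.sub(r"\s+", " ", sql).strip()

print(normalize_sql("SELECT  Nationality\n FROM   roster "))
# SELECT Nationality FROM roster
```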

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

By now, you should have a clear understanding of how to use the schema-aware text-to-SQL capabilities of the BART model. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
