Generating Textbook Quality Content with Python

Jun 8, 2022 | Educational

This article walks through setting up and using textbook_quality, a project for generating very long, textbook-quality pre-training data. Think of it as a knowledgeable assistant that can produce detailed textbooks on the topics you choose. Let’s dive in!

Prerequisites

  • Make sure you have Python 3.9+ (ideally 3.11) installed on your system.
  • Have PostgreSQL installed. If you’re a Mac user, you can easily install it using brew install postgres.

Setting Up the Project

Let’s get this project rolling! Follow these steps:

  1. Open your command line and create a new database:
     psql postgres -c "create database textbook;"
  2. Clone the repository:
     git clone https://github.com/VikParuchuri/textbook_quality.git
  3. Navigate into the cloned folder:
     cd textbook_quality
  4. Install the required dependencies:
     poetry install
  5. Run the migration command:
     invoke migrate-dev

Configuration Options

Before you can start generating content, you need to configure your environment. Here’s how:

  1. Create a file named local.env in the root directory to keep your secret keys.
  2. For quality generation, set up your keys and choose your backend:
    • OpenAI Key: OPENAI_KEY=sk-xxxxxx
    • Choose a retrieval backend:
      • For Serply: SERPLY_KEY=...
      • For SerpAPI: SERPAPI_KEY=...
      • To disable: SEARCH_BACKEND=none
  3. By default, the generator uses GPT-3.5. To use GPT-4, set the following variables:
    • LLM_TYPE=gpt-4
    • LLM_INSTRUCT_TYPE=gpt-4
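
The settings above can be sanity-checked before a run. The following is a minimal sketch and not part of the project: parse_env and check_config are hypothetical helpers, and it assumes that local.env holds plain KEY=VALUE lines and that a run needs either a search key or SEARCH_BACKEND=none. The variable names are the ones listed above.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_config(values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means nothing looks wrong."""
    problems: List[str] = []
    if not values.get("OPENAI_KEY"):
        problems.append("OPENAI_KEY is missing")
    # Assumption: one search key, or an explicit opt-out, should be present.
    has_search_key = bool(values.get("SERPLY_KEY") or values.get("SERPAPI_KEY"))
    search_disabled = values.get("SEARCH_BACKEND") == "none"
    if not has_search_key and not search_disabled:
        problems.append("set SERPLY_KEY or SERPAPI_KEY, or SEARCH_BACKEND=none")
    return problems


# Example contents of local.env (placeholder key, search disabled, GPT-4).
sample = """
OPENAI_KEY=sk-xxxxxx
SEARCH_BACKEND=none
LLM_TYPE=gpt-4
LLM_INSTRUCT_TYPE=gpt-4
"""

config = parse_env(sample)
print(check_config(config))  # an empty list means no problems were found
```

To check a real file, pass its contents instead of the sample, for example parse_env(open("local.env").read()).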

Generating Content

Now that everything is set up, you can start generating topics, augmenting them, and creating entire textbooks!

Generating Topics from Scratch

To create new topics, run:

python topic_generator.py   --iterations 

For instance:

python topic_generator.py "computer science" "python_cs_titles.json" --iterations 50

Augmenting Topics from Seeds

If you have existing topics, you can augment them:

python topic_augmentor.py   --domain 

Generating Textbooks

To generate textbooks from your topics:

python book_generator.py   --workers 

Example:

python book_generator.py topics.json books.jsonl --workers 5
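
If you prefer to drive the scripts from Python, the two example invocations above can be assembled as argument lists. This is a sketch under assumptions: topic_command and book_command are hypothetical helper names, the file names are the ones from the examples, and the code only prints the commands rather than running them.

```python
import shlex
import sys
from typing import List


def topic_command(domain: str, out_file: str, iterations: int) -> List[str]:
    """Build the argument list for the topic generator example."""
    return [sys.executable, "topic_generator.py", domain, out_file,
            "--iterations", str(iterations)]


def book_command(topics_file: str, books_file: str, workers: int) -> List[str]:
    """Build the argument list for the book generator example."""
    return [sys.executable, "book_generator.py", topics_file, books_file,
            "--workers", str(workers)]


commands = [
    topic_command("computer science", "python_cs_titles.json", 50),
    book_command("topics.json", "books.jsonl", 5),
]

for cmd in commands:
    # shlex.join quotes arguments that contain spaces, such as the domain.
    print(shlex.join(cmd))
    # To actually run a step from the project root, one option is:
    # subprocess.run(cmd, check=True)  (requires "import subprocess")
```

Passing the arguments as a list avoids shell quoting problems with multi-word domains such as "computer science".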

An Analogy for Better Understanding

Imagine this process as baking a cake. The prerequisites (Python and PostgreSQL) are your ingredients and baking tools. Configuring your environment is like setting the oven temperature and preparing the pan. Generating the content is like mixing the ingredients and putting the cake in the oven: it’s where the magic happens. And just as a cake takes time to bake, generation takes time, depending on your input and the system’s capabilities.

Troubleshooting Tips

If you encounter any issues while setting up or using the project, consider these tips:

  • Check your Python version by running python --version to ensure it meets the prerequisite.
  • Ensure that PostgreSQL is properly installed and running. If you installed it with Homebrew on a Mac, you can check its status with brew services list.
  • If you face issues with the generation scripts, verify that your API keys are correctly entered in local.env.
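
The checks above can also be scripted. The sketch below is illustrative only: python_ok and environment_report are hypothetical helpers, and it assumes the script is run from the project root. Note that finding psql on the PATH shows the client is installed, not that the server is running.

```python
import shutil
import sys
from pathlib import Path
from typing import List, Tuple


def python_ok(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """True if the given (major, minor) version meets the 3.9+ prerequisite."""
    return version >= (3, 9)


def environment_report(project_root: str = ".") -> List[str]:
    """Collect notes about anything that looks wrong; empty means all clear."""
    notes: List[str] = []
    if not python_ok():
        notes.append("Python 3.9+ is required")
    # This only confirms the psql client exists, not that the server is up.
    if shutil.which("psql") is None:
        notes.append("psql not found on PATH")
    if not (Path(project_root) / "local.env").is_file():
        notes.append("local.env not found in the project root")
    return notes


for note in environment_report():
    print(note)
```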

Extending the Project

This project is adaptable. If you wish to add new features or retrieval methods, you can explore:

  • LLM adapters within app/llm/adaptors
  • Retrieval methods in app/services/adaptors

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
