Are you looking to create a custom GPT (Generative Pre-trained Transformer) from specific websites? The GPT Crawler is a fantastic tool that allows you to crawl a website and generate knowledge files to train your own model. In this post, we’ll walk you through the steps to get started with this powerful tool, including troubleshooting tips to help you out!
Getting Started
Follow the outlined steps to set up your GPT Crawler locally.
Running Locally
Clone the Repository
First things first, you’ll want to clone the GPT Crawler repository. Make sure you have Node.js version 16 or higher installed on your machine.
git clone https://github.com/BuilderIO/gpt-crawler
Install Dependencies
Next, navigate to the cloned repository and install the necessary dependencies using npm.
npm i
Configure the Crawler
Open the config.ts file and update the URL and selector properties according to your needs. For example, if you’re crawling the Builder.io documentation, it might look like the following:
export const defaultConfig: Config = {
  url: "https://www.builder.io/docs/developers",
  match: "https://www.builder.io/docs/**",
  selector: ".docs-builder-container",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};
This setup defines which website to crawl, how many pages to explore, and where to save the output data.
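To get an intuition for how the `match` option restricts the crawl, here is a simplified sketch of glob-style URL matching. This is not GPT Crawler’s actual matcher, just an illustration of how a `**` pattern can decide which links get followed:

```typescript
// Simplified illustration of glob-style URL matching, similar in spirit
// to how the `match` pattern restricts which links are crawled.
// NOT the library's real implementation -- a sketch of the idea only.
function matchesPattern(pattern: string, url: string): boolean {
  // Escape regex metacharacters, then turn "**" into "match anything".
  const regex = new RegExp(
    "^" +
      pattern
        .split("**")
        .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*") +
      "$"
  );
  return regex.test(url);
}

console.log(
  matchesPattern(
    "https://www.builder.io/docs/**",
    "https://www.builder.io/docs/developers/intro"
  )
); // true
console.log(
  matchesPattern("https://www.builder.io/docs/**", "https://www.builder.io/blog/post")
); // false
```

With the config above, only pages under `/docs/` would be visited, which keeps the crawl focused on documentation.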
Run Your Crawler
With your configuration ready, initiate the crawling process.
npm start
Alternative Methods
You can also run GPT Crawler in different environments.
Running in a Container with Docker
If you prefer Docker, go into the containerapp directory and adjust config.ts similarly. Once done, run the container to generate output.json.
Running as an API
To execute the crawler as an API server, install the dependencies and start the server. The server will run on port 3000 by default.
npm run start:server
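Once the server is up, you can trigger a crawl over HTTP. The exact route can vary between versions, so treat the endpoint shown here (`POST /crawl`) as an assumption to verify against your installed version; the payload simply mirrors the fields from config.ts:

```typescript
// Hypothetical request body for the crawler's API server.
// The endpoint path (/crawl) and payload shape mirror config.ts,
// but verify both against your installed version before relying on them.
const crawlRequest = {
  url: "https://www.builder.io/docs/developers",
  match: "https://www.builder.io/docs/**",
  selector: ".docs-builder-container",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};

const body = JSON.stringify(crawlRequest);

// To send it (assumes the server from `npm run start:server` is on port 3000):
// await fetch("http://localhost:3000/crawl", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body,
// });

console.log(body.length > 0); // true
```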
Upload Your Data to OpenAI
After crawling, you’ll have a file named output.json. This file can be uploaded to OpenAI to create your custom assistant or GPT.
Create a Custom GPT
1. Go to ChatGPT.
2. Click your name in the bottom left corner.
3. Select “My GPTs” from the menu.
4. Click “Create a GPT.”
5. Choose “Configure” and then select “Upload a file” to upload your generated output.json.
If you encounter an error about the file size, consider splitting it up or reducing the number of tokens.
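One way to split the file is with a small script. The sketch below assumes output.json holds a JSON array of page entries; the output file names and chunk size are illustrative:

```typescript
import * as fs from "fs";

// Split an array into fixed-size chunks.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Assumes output.json holds a JSON array of crawled pages;
// writes output-1.json, output-2.json, ... with `pagesPerFile` pages each.
function splitOutput(path: string, pagesPerFile: number): void {
  const pages: unknown[] = JSON.parse(fs.readFileSync(path, "utf8"));
  chunk(pages, pagesPerFile).forEach((part, i) => {
    fs.writeFileSync(`output-${i + 1}.json`, JSON.stringify(part, null, 2));
  });
}
```

For example, `splitOutput("output.json", 20)` would break the crawl results into files of 20 pages each, which you can then upload one at a time.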
Create a Custom Assistant
This option provides API access to your indexed knowledge. To create one:
1. Navigate to OpenAI Assistants.
2. Click “+ Create.”
3. Select “upload” and upload your output.json.
Troubleshooting Tips
- If the crawler isn’t fetching the expected data, double-check the URL and selector in the config.ts file.
- Make sure your Node.js version is compatible; reinstall if necessary.
- If you encounter issues during upload to OpenAI, verify whether the file exceeds their size and token limits.
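To sanity-check your file against token limits before uploading, a common rule of thumb for English text is roughly four characters per token. The sketch below uses that heuristic; it is an approximation, not a real tokenizer:

```typescript
import * as fs from "fs";

// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic, not a real tokenizer -- use it only as a sanity check.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function estimateFileTokens(path: string): number {
  return estimateTokens(fs.readFileSync(path, "utf8"));
}

console.log(estimateTokens("Hello, world!")); // 4  (13 chars / 4, rounded up)
```

Running `estimateFileTokens("output.json")` gives a ballpark figure to compare against the limit before you attempt an upload.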
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Happy crawling and creating your custom GPT!

