If you want to extract information from a website but don't know where to start, Scrape It Now lets you scrape a site in just a few steps. This post walks through the process, from installation to running your first scrape job.
Features of Scrape It Now
Before diving into how to use it, let’s look at what makes Scrape It Now a must-have tool:
- Decoupled architecture with Azure Queue Storage or local SQLite.
- Operates as a CLI with a standalone binary.
- Idempotent operations that can run in parallel.
- Efficient storage options using Azure Blob Storage or the local disk.
- Automatically creates Azure AI Search indexes so the scraped content is semantically searchable.
Installation
First, you’ll need to install Scrape It Now on your machine. You can do this in two ways:
From Binary
- Download the latest release from the releases page. Binaries are available for Linux, macOS, and Windows.
- Configure the CLI using environment variables, a .env file, or command line options.
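As a minimal sketch of the .env route, the snippet below writes a small .env file and loads it into the current shell. The variable names are the ones used in the local disk example later in this post. The scratch directory, and the assumption that the CLI looks for .env in the directory it is run from, are illustrative only.

```shell
# Sketch: keep the CLI settings in a .env file instead of exporting them by hand.
# The scratch directory is for illustration; use your project directory instead.
workdir="$(mktemp -d)"
cd "$workdir"

# Write a minimal .env file for a local setup without Azure.
cat > .env <<'EOF'
BLOB_PROVIDER=local_disk
QUEUE_PROVIDER=local_disk
EOF

# Optional: also export the values into the current shell.
# "set -a" marks every variable assigned while it is active for export.
set -a
. ./.env
set +a

echo "BLOB_PROVIDER=$BLOB_PROVIDER QUEUE_PROVIDER=$QUEUE_PROVIDER"

# Only call the CLI if it is actually installed.
if command -v scrape-it-now >/dev/null 2>&1; then
  scrape-it-now --help
fi
```

Keeping settings in a file makes it harder to forget an export when you open a new terminal.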
From Source
# Download the source code
git clone https://github.com/clemlesne/scrape-it-now.git
# Move to the directory
cd scrape-it-now
# Run install scripts
make install dev
# Run the CLI
scrape-it-now --help
How to Use Scrape It Now
Now that you have it installed, let’s dive into how to scrape a website!
Scrape a Website
Follow these steps to start scraping:
Using Azure Blob Storage
# Azure Storage configuration
export AZURE_STORAGE_CONNECTION_STRING=xxx
# Run the jobs
scrape-it-now scrape run https://nytimes.com
Using Local Disk
# Local disk configuration
export BLOB_PROVIDER=local_disk
export QUEUE_PROVIDER=local_disk
# Run the jobs
scrape-it-now scrape run https://nytimes.com
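If you switch between the two setups, a small wrapper can choose one for you. The sketch below assumes, as the two examples above suggest, that Azure is used when only a connection string is set and that the local disk providers must be exported explicitly. The names `configure_storage` and `STORAGE_BACKEND` are hypothetical helpers, not part of Scrape It Now.

```shell
# Sketch: pick a storage backend from the environment, then start the scrape.
configure_storage() {
  if [ -n "${AZURE_STORAGE_CONNECTION_STRING:-}" ]; then
    # A connection string is present: keep the Azure setup from the first example.
    STORAGE_BACKEND="azure"
  else
    # No connection string: fall back to the local disk setup.
    export BLOB_PROVIDER=local_disk
    export QUEUE_PROVIDER=local_disk
    STORAGE_BACKEND="local_disk"
  fi
}

# Call the function directly (not inside "$(...)"), otherwise the exports
# would be lost in a subshell.
configure_storage
echo "Using storage backend: $STORAGE_BACKEND"

# Only start the job if the CLI is actually installed.
if command -v scrape-it-now >/dev/null 2>&1; then
  scrape-it-now scrape run https://nytimes.com
fi
```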
Viewing Job Status
To check the status of your scraping job:
# Azure Storage configuration
export AZURE_STORAGE_CONNECTION_STRING=xxx
# Show job status
scrape-it-now scrape status [job_name]
Understanding the Process
Imagine you’re trying to gather ingredients from various grocery stores. Each store has a specific layout, and you must explore them carefully to gather all necessary items without missing anything. Scrape It Now works in a similar manner:
- Your command (like a shopping list) fetches data from a website.
- It checks each section (link) to see what has changed since your last visit (to avoid redundancy).
- Just like a grocery clerk, it organizes the items (data) into buckets (Azure/Local storage) for easy access later.
- Finally, just as you might create a digital recipe based on your ingredients, Scrape It Now automatically creates an index of your findings for seamless searching.
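The "what has changed since your last visit" step can be illustrated with a checksum per URL. This is not Scrape It Now's code. It is a toy sketch of the idea, with a temporary directory standing in for the storage bucket and a hypothetical `store_if_changed` helper. `cksum` is used only because it is available everywhere; a real tool would want a stronger hash.

```shell
# Sketch: store a page only when its content differs from the stored copy.
state_dir="$(mktemp -d)"   # stands in for the blob storage "bucket"

store_if_changed() {
  url="$1"
  content="$2"
  # Derive a file-safe key from the URL and a checksum from the content.
  key="$(printf '%s' "$url" | cksum | cut -d' ' -f1)"
  new_sum="$(printf '%s' "$content" | cksum | cut -d' ' -f1)"
  sum_file="$state_dir/$key.sum"

  # Same checksum as last time: nothing to do, so running again is harmless.
  if [ -f "$sum_file" ] && [ "$(cat "$sum_file")" = "$new_sum" ]; then
    echo "unchanged: $url"
    return 0
  fi

  # New or modified page: save the content and remember its checksum.
  printf '%s' "$content" > "$state_dir/$key.html"
  printf '%s' "$new_sum" > "$sum_file"
  echo "stored: $url"
}

store_if_changed "https://example.com/a" "<h1>hello</h1>"    # first visit: stored
store_if_changed "https://example.com/a" "<h1>hello</h1>"    # same content: unchanged
store_if_changed "https://example.com/a" "<h1>hello!</h1>"   # new content: stored
```

Because a repeated call with the same input changes nothing, the operation is idempotent, which is the property that makes it safe to run the same job again.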
Troubleshooting
If you encounter issues while using Scrape It Now, consider the following troubleshooting steps:
- Ensure your Azure Storage connection string is correctly configured.
- Check if your environment variables are properly set and loaded.
- If scraping fails, verify that the target website is up and running.
- If you installed from source, make sure all dependencies are installed and your Python version is compatible.
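The checks above can be scripted. The following is a rough preflight sketch using only standard tools (`printenv`, `curl`, `python3`). The helper names `check_env` and `check_site` are hypothetical and not part of Scrape It Now. Some sites block command-line clients, so an "unreachable" result is a hint rather than proof that the site is down.

```shell
# Sketch: preflight checks before starting a scrape job.

# Reports whether a variable is exported (not just set) in this shell,
# since only exported variables are visible to the CLI.
check_env() {
  if [ -n "$(printenv "$1")" ]; then
    echo "ok: $1 is set"
  else
    echo "missing: $1 is not exported in this shell"
  fi
}

# Reports whether the target site answers within 10 seconds.
# -s: silent, -f: treat HTTP errors as failures, -o /dev/null: discard the body.
check_site() {
  if curl -sf -o /dev/null --max-time 10 "$1"; then
    echo "ok: $1 is reachable"
  else
    echo "unreachable: $1"
  fi
}

check_env AZURE_STORAGE_CONNECTION_STRING
check_env BLOB_PROVIDER
check_env QUEUE_PROVIDER

# Network check, run only when a URL is passed as the first argument.
if [ -n "${1:-}" ]; then
  check_site "$1"
fi

# The Python version only matters for a source install.
if command -v python3 >/dev/null 2>&1; then
  python3 --version
fi
```

Which variables are required depends on your setup: the connection string for Azure, or the two provider variables for local disk.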
Conclusion
Scraping websites can open a world of data possibilities, and with Scrape It Now, it's easier than ever. The combination of Azure services and user-friendly commands allows you to focus on what matters most: the data.

