Welcome to the ultimate guide to AutoCrawler, a high-speed, customizable multiprocess image crawler that downloads images from Google and Naver with ease. Whether you're gathering images for a project or simply exploring, this tool will meet your needs!
Getting Started
To start using AutoCrawler, you’ll need to follow these simple steps:
- Install Chrome, which the crawler needs in order to run.
- Open your terminal and install the dependencies:

```
pip install -r requirements.txt
```

- Write your search keywords, one per line, in a file named `keywords.txt`.
- Run the Python script:

```
python3 main.py
```

- Your files will be downloaded to the designated download directory.
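The keyword file is the only input the crawler needs before it runs. The steps above can be sketched in a few lines of Python (the keywords here are placeholders):

```python
from pathlib import Path

# One search keyword per line; the crawler reads this file at startup.
keywords = ["cat", "dog", "sparrow"]  # placeholder keywords
Path("keywords.txt").write_text("\n".join(keywords) + "\n", encoding="utf-8")

# With keywords.txt in place, launch the crawler from the same directory:
#   python3 main.py
print(Path("keywords.txt").read_text(encoding="utf-8").splitlines())
```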
Understanding the Arguments
The main.py script allows for several arguments to customize your crawling experience:
- `--skip true`: Skips a keyword if images for it already exist in the download directory.
- `--threads 4`: Sets the number of threads used for downloading.
- `--google true`: Enables downloads from google.com.
- `--naver true`: Enables downloads from naver.com.
- `--full false`: Downloads thumbnails by default; set to `true` for full resolution (be aware, it may be slower).
- `--face false`: Activates face search mode.
- `--no_gui auto`: Runs the crawler without a GUI. This is especially useful for headless environments.
- `--limit 0`: Limits the number of images downloaded per site; `0` means no limit.
- `--proxy-list`: A comma-separated list of proxies. Each thread randomly chooses a proxy from this list to help preserve anonymity.
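As a rough model of how these flags fit together, here is a minimal `argparse` sketch that mirrors the options above. This is not the project's actual parser — the option names and default values are taken from this guide, and everything else is illustrative:

```python
import argparse

def build_parser():
    """A simplified model of the crawler's command line (not its real code)."""
    p = argparse.ArgumentParser(description="image crawler options (sketch)")
    p.add_argument("--skip", default="true", help="skip keywords already downloaded")
    p.add_argument("--threads", type=int, default=4, help="number of download threads")
    p.add_argument("--google", default="true", help="download from google.com")
    p.add_argument("--naver", default="true", help="download from naver.com")
    p.add_argument("--full", default="false", help="full resolution instead of thumbnails")
    p.add_argument("--face", default="false", help="face search mode")
    p.add_argument("--no_gui", default="auto", help="run without a GUI (headless)")
    p.add_argument("--limit", type=int, default=0, help="0 means no per-site limit")
    p.add_argument("--proxy-list", default="", help="comma-separated proxies")
    return p

# Example: 8 threads, full-resolution mode.
args = build_parser().parse_args(["--threads", "8", "--full", "true"])
print(args.threads, args.full)  # 8 true
```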
Full Resolution Mode
To download images in full resolution (JPG, GIF, or PNG), specify `--full true` in your command when running the script.
Data Imbalance Detection
AutoCrawler doesn't just download images; it also checks that your data is reasonably balanced. After crawling completes, it identifies directories whose file count is below 50% of the average across all keyword directories. It's good practice to delete those underfilled directories and download them again.
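The imbalance check described above boils down to comparing each keyword directory's file count against half the average. A self-contained sketch of that logic (the 50% ratio comes from this guide; the function itself is illustrative, not the project's code):

```python
from pathlib import Path
import tempfile

def underfilled_dirs(root, ratio=0.5):
    """Return subdirectory names whose file count is below
    ratio * the average file count across all subdirectories."""
    dirs = [d for d in Path(root).iterdir() if d.is_dir()]
    if not dirs:
        return []
    counts = {d.name: sum(1 for f in d.iterdir() if f.is_file()) for d in dirs}
    average = sum(counts.values()) / len(counts)
    return sorted(name for name, n in counts.items() if n < average * ratio)

# Demo on a throwaway tree: 'sparrow' has far fewer files than average.
tmp = Path(tempfile.mkdtemp())
for name, n in [("cat", 10), ("dog", 9), ("sparrow", 2)]:
    d = tmp / name
    d.mkdir()
    for i in range(n):
        (d / f"{i}.jpg").touch()
print(underfilled_dirs(tmp))  # ['sparrow']
```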
Remote Crawling
If you're looking to run your crawler remotely (for example, on a headless server), follow these steps:
- Install a virtual display (Xvfb):

```
sudo apt-get install xvfb
```

- Install Screen:

```
sudo apt-get install screen
```

- Start a Screen session, then run the crawler inside it on the virtual display:

```
screen -S s1
Xvfb :99 -ac &
DISPLAY=:99 python3 main.py
```
Customizing Your Crawler
Do you have ideas for a unique crawler that might better suit your needs? You can customize AutoCrawler by tweaking the `collect_links.py` file. Dive deep into the code and adjust it to enhance your crawling experience!
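The real `collect_links.py` drives Chrome via Selenium, so it can't be reproduced runnably here. As a self-contained illustration of the link-collection idea only, the toy parser below pulls image URLs out of raw HTML using nothing but the standard library — everything in it is an assumption, not the project's code:

```python
from html.parser import HTMLParser

class ImageLinkCollector(HTMLParser):
    """Collects the src of every <img> tag - a toy stand-in for the
    Selenium-based logic in collect_links.py."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            # Keep only real http(s) URLs, skipping inline data: URIs.
            if src and src.startswith("http"):
                self.links.append(src)

collector = ImageLinkCollector()
collector.feed('<div><img src="http://example.com/a.jpg"><img src="data:;base64,xx"></div>')
print(collector.links)  # ['http://example.com/a.jpg']
```

The same filtering idea (keeping only usable image URLs, dropping inline thumbnails) is the kind of behavior you would adjust when customizing the crawler.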
Troubleshooting Your Crawler
Installation and usage can sometimes hit a snag, especially because Google's page layout changes frequently. Here are some steps to troubleshoot:
- Visit Google Images in your Chrome browser.
- Open Developer Tools by pressing `CTRL+SHIFT+I` (or `CMD+OPTION+I` on Mac).
- Select an image you want to capture and inspect its element in the developer tools.
- Take note of the image selection logic and modify `collect_links.py` accordingly.
- Refer to the W3Schools XPATH documentation for syntax help.
- Use the `CTRL+F` search in developer tools to test your `XPATH` queries.
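You can also sanity-check simple XPath expressions offline with Python's standard library, which supports a limited XPath subset. The HTML snippet below is made up for illustration — it does not reflect Google's actual markup:

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed fragment standing in for part of a results page.
html = """
<div>
  <a class="thumb"><img src="http://example.com/1.jpg"/></a>
  <a class="thumb"><img src="http://example.com/2.jpg"/></a>
</div>
"""
root = ET.fromstring(html)
# ElementTree supports a restricted XPath subset: .//tag, [@attr], etc.
srcs = [img.get("src") for img in root.findall(".//a/img")]
print(srcs)  # ['http://example.com/1.jpg', 'http://example.com/2.jpg']
```

Note that `xml.etree` only accepts well-formed markup and a small XPath subset; for queries against real pages, test in the browser's developer tools as described above.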
If issues persist or you’re in need of real-time support, feel free to reach out! For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

