How to Create a CAPTCHA Solver using LLaVA

Apr 6, 2021 | Educational

In today’s digital landscape, CAPTCHAs serve as a barrier against bots and automated systems. But what if you’re curious about how these CAPTCHAs can be solved programmatically for research and educational purposes? This article guides you through building a simple CAPTCHA solver using the LLaVA-v1.6-7b model.

Disclaimer

Before we dive into the project, it’s essential to understand that this CAPTCHA solver is intended for research and educational purposes only. Using this software to bypass CAPTCHAs on websites may violate their Terms of Service and could have legal repercussions.

What Is This Project About?

This project is a basic proof-of-concept solver for Google’s reCAPTCHA. It uses a model that extracts the name of the object to find and then detects that object in each square of the CAPTCHA grid. The solver works purely from visual data: it does not interact with the HTML of the webpage.
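
Working square by square means the grid first has to be cut into cells. The snippet below is a minimal sketch of that geometry, not the project's actual code: the function names `grid_cells` and `cell_center` and the example coordinates are assumptions made for illustration. It splits a known grid region into equal boxes and computes a point in the middle of each box.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


def grid_cells(left: int, top: int, width: int, height: int,
               rows: int, cols: int) -> List[Box]:
    """Split a grid region into rows x cols cell boxes, listed row by row."""
    cells = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps boxes contiguous even if the size
            # is not an exact multiple of the row or column count.
            x0 = left + c * width // cols
            x1 = left + (c + 1) * width // cols
            y0 = top + r * height // rows
            y1 = top + (r + 1) * height // rows
            cells.append((x0, y0, x1, y1))
    return cells


def cell_center(box: Box) -> Tuple[int, int]:
    """Return the pixel in the middle of a cell, a natural click target."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) // 2, (y0 + y1) // 2)


# Hypothetical 3x3 grid whose top-left corner sits at (100, 200) on screen.
cells = grid_cells(left=100, top=200, width=300, height=300, rows=3, cols=3)
print(cells[0], cell_center(cells[4]))  # (100, 200, 200, 300) (250, 350)
```

Each box can be used to crop one square out of a screenshot before handing it to the model, and each center is where a click would land if that square is selected.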

The solver takes screenshots and clicks on images at specified locations. It identifies the CAPTCHA grid size and can detect when images change. In my experiments, a CAPTCHA could take as long as 2 minutes to complete, although it is generally much faster.
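
One simple way to detect that an image has changed is to compare the same square in two consecutive screenshots. The sketch below assumes each square is available as a flat sequence of grayscale pixel values (for example, the bytes of a Pillow image converted to mode "L" and read with `tobytes()`). The function names and the threshold of 10 are illustrative assumptions, and the project itself may detect changes differently.

```python
from typing import List, Sequence


def mean_abs_diff(before: Sequence[int], after: Sequence[int]) -> float:
    """Average per-pixel difference between two equally sized grayscale crops."""
    if len(before) != len(after):
        raise ValueError("crops must have the same number of pixels")
    if not before:
        return 0.0
    return sum(abs(a - b) for a, b in zip(before, after)) / len(before)


def changed_cells(before: List[Sequence[int]], after: List[Sequence[int]],
                  threshold: float = 10.0) -> List[int]:
    """Indices of cells whose average pixel difference exceeds the threshold."""
    return [i for i, (b, a) in enumerate(zip(before, after))
            if mean_abs_diff(b, a) > threshold]


# Three tiny 4-pixel "cells"; only the middle one is replaced by a new image.
old = [[10, 10, 10, 10], [50, 60, 70, 80], [200, 200, 200, 200]]
new = [[11, 10, 9, 10], [250, 10, 240, 5], [200, 200, 200, 200]]
print(changed_cells(old, new))  # [1]
```

The threshold absorbs small rendering noise (the first cell differs by 0.5 on average and is ignored), while a genuinely new image produces a much larger difference.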

For those interested in a hands-on showcase, you can view this short demonstration video.

Limitations to Keep in Mind

  • The solver requires a GPU with at least 16 GB of VRAM.
  • Currently, it only operates on Ubuntu for very specific reasons:
    • The CAPTCHA window detection is tailored exclusively for this OS.
    • LLaVA supports only Linux, and running it through Ollama instead is not accurate enough.
  • If images disappear during solving, all images will need to be reclassified afterward.
  • It’s only compatible with this specific reCAPTCHA layout; changes in layout would require updating reference images.
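
The VRAM requirement is easy to check before installing anything. The following is a small sketch that assumes an NVIDIA GPU with the `nvidia-smi` tool on the PATH; the helper names and the 15 GiB cut-off are my own choices (cards sold as 16 GB often report slightly less than 16384 MiB), not something the project defines.

```python
import shutil
import subprocess
from typing import List

# Assumed cut-off: a little under 16 GiB, since nominal 16 GB cards
# can report slightly less total memory than 16384 MiB.
MIN_MIB = 15 * 1024


def parse_vram_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one total-memory value in MiB per line, skipping anything else."""
    values = []
    for line in nvidia_smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def query_vram_mib() -> List[int]:
    """Ask nvidia-smi for each GPU's total memory; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


def has_enough_vram(mib_values: List[int], minimum: int = MIN_MIB) -> bool:
    """True if at least one GPU meets the minimum."""
    return any(v >= minimum for v in mib_values)


gpus = query_vram_mib()
print(gpus, "ok" if has_enough_vram(gpus) else "no GPU with enough VRAM found")
```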

How to Install and Run the CAPTCHA Solver

Ready to set up your own CAPTCHA solver? Follow these steps:

  1. Follow the installation instructions in the LLaVA repository.
  2. Install gnome-screenshot by running the command:
    sudo apt install gnome-screenshot
  3. Install the necessary Python libraries using pip:
    pip install protobuf PyAutoGUI opencv-python pillow
  4. Run the script main.py to start solving. The LLaVA model should download automatically on the first run, and the script closes on its own once the CAPTCHA is completed.
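
With PyAutoGUI installed, the clicking half of the solver can be tried on its own. The sketch below is an illustration under assumptions, not an excerpt from main.py: `click_points` is a hypothetical helper, and the coordinates are made up. It accepts any click function so that it can be dry-run without moving the real mouse, and falls back to `pyautogui.click` when none is given.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

Point = Tuple[int, int]


def click_points(points: Iterable[Point],
                 click: Optional[Callable[[int, int], object]] = None,
                 pause: float = 0.5) -> List[Point]:
    """Click each screen coordinate in order and return the points clicked."""
    if click is None:
        # Imported lazily because PyAutoGUI needs a running display session.
        import pyautogui
        click = pyautogui.click
    clicked = []
    for x, y in points:
        click(x, y)
        clicked.append((x, y))
        time.sleep(pause)  # give the page time to react before the next click
    return clicked


# Dry run with a stand-in click function, so nothing on screen is touched.
log = []
click_points([(150, 250), (350, 450)], click=lambda x, y: log.append((x, y)), pause=0)
print(log)  # [(150, 250), (350, 450)]
```

Dropping the `click=` argument would send real mouse clicks, so it is worth testing against a harmless window first.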

Understanding the Process: An Analogy

Think of using this CAPTCHA solver like training a puppy to recognize and fetch specific toys. Just as you would start by showing the puppy various toys and repeating their names, the LLaVA model first learns from reference images to detect and identify objects in the reCAPTCHA squares. When you’re ready to solve a CAPTCHA (or send the puppy on a fetch mission), the solver takes a screenshot (like a snapshot of the toys the puppy is supposed to grab) and then clicks on the correct images based on its training, much like your puppy bringing back the right toy after learning the game.

Troubleshooting

If you encounter issues during setup or execution, here are some troubleshooting tips:

  • Ensure your GPU drivers are up to date to prevent any performance bottlenecks.
  • Verify that all required libraries are correctly installed.
  • Check for compatibility issues if you change any system configurations.
  • If you experience slow performance, consider reallocating system resources or upgrading your hardware.
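
To verify the libraries from the installation steps in one go, a short preflight script can help. This is a sketch under assumptions: the import names listed are the usual ones for the pip packages protobuf, PyAutoGUI, opencv-python and pillow, and `check_setup` is a hypothetical helper rather than part of the project.

```python
import importlib.util
import shutil
from typing import Dict

# Usual import names for protobuf, PyAutoGUI, opencv-python and pillow.
MODULES = ["google.protobuf", "pyautogui", "cv2", "PIL"]
COMMANDS = ["gnome-screenshot"]


def module_available(name: str) -> bool:
    """True if Python can locate the module (the module itself is not imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is itself missing.
        return False


def check_setup() -> Dict[str, bool]:
    """Map each required module or command to whether it was found."""
    status = {name: module_available(name) for name in MODULES}
    status.update({cmd: shutil.which(cmd) is not None for cmd in COMMANDS})
    return status


for name, ok in check_setup().items():
    print(f"{name:20} {'ok' if ok else 'MISSING'}")
```

Anything reported as MISSING points back to the corresponding apt or pip command in the installation steps.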

Conclusion

Thanks for following along! Your contributions to the code are welcome, so feel free to modify or improve upon the solver as you see fit.