How to Explore the SmartBugs Wild Dataset

Jul 11, 2022 | Blockchain

The SmartBugs Wild Dataset is a treasure trove of 47,398 unique Solidity smart contracts extracted from the Ethereum network. In this article, we’ll look at how the dataset is organized and how it was built, and share tips for putting it to work in your own projects.

Understanding the Dataset Structure

The repository is organized into a few key components. Here’s what you’ll find:

  • contracts: This folder contains the individual smart contracts, one Solidity file per contract, each named after its address (contract_address.sol).
  • contracts.csv.tar.gz: A compressed archive containing the metadata for all the contracts.
  • get_contracts.py: A script that collects the source code of the contracts from Etherscan.
  • get_balance.py: A script that retrieves the balance of the contracts from Etherscan.
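To get oriented, the sketch below walks the contracts folder and reads the metadata archive using only the Python standard library. It assumes the archive holds a single CSV file with a header row; the column names aren’t documented here, so inspect the rows before relying on any particular field. iter_contracts and load_metadata are hypothetical helper names for illustration, not part of the repository.

```python
import csv
import io
import tarfile
from collections.abc import Iterator
from pathlib import Path


def iter_contracts(folder: str) -> Iterator[tuple[str, Path]]:
    """Yield (address, path) for every <address>.sol file, sorted by file name."""
    for path in sorted(Path(folder).glob("*.sol")):
        # The file name without ".sol" is the contract address.
        yield path.stem, path


def load_metadata(archive: str) -> list[dict[str, str]]:
    """Return the rows of the first CSV file inside a .tar.gz archive (assumed layout)."""
    with tarfile.open(archive, mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                raw = tar.extractfile(member)
                # Decode the byte stream so csv.DictReader can parse it.
                with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                    return list(csv.DictReader(text))
    raise FileNotFoundError(f"no CSV file found in {archive}")
```

For example, calling load_metadata("contracts.csv.tar.gz") and printing rows[0].keys() shows which metadata columns are actually available.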

How the Dataset Was Created

The dataset was methodically assembled through a three-step process that can be likened to gathering ingredients for a special recipe:

  • Collection of Contract Addresses: Think of this as gathering the main ingredients. The team used Google BigQuery to select all contracts with at least one transaction; the query was run on August 8, 2019. The SQL query used to extract the data can be viewed here.
  • Downloading Source Code: Just like prepping those ingredients, the Solidity source code for each contract address was downloaded through Etherscan. Etherscan only holds source code for contracts whose authors have verified (published) it, which is why so many addresses in the metrics below have no source available.
  • Filtering Duplicates: Finally, just as you would pick out the freshest ingredients, the team filtered out duplicate contracts so that every contract in the dataset is unique.
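The exact duplicate criterion isn’t described here. As one plausible approach, and purely an assumption on our part, the sketch below treats two contracts as duplicates when their source is identical after stripping comments and collapsing whitespace, and keeps the first address seen for each SHA-256 fingerprint. The regexes are deliberately naive: they don’t understand string literals, so a "//" inside a string (a URL, say) would be stripped as well.

```python
import hashlib
import re

# Assumed duplicate criterion: same source once comments and whitespace are removed.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")
WHITESPACE = re.compile(r"\s+")


def normalize(source: str) -> str:
    """Strip comments and collapse whitespace so cosmetic edits don't count as changes."""
    source = BLOCK_COMMENT.sub(" ", source)
    source = LINE_COMMENT.sub(" ", source)
    return WHITESPACE.sub(" ", source).strip()


def deduplicate(sources: dict[str, str]) -> dict[str, str]:
    """Map address -> source, keeping only the first address for each normalized source."""
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for address, source in sources.items():
        digest = hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[address] = source
    return unique
```

Hashing the normalized text, rather than comparing sources pairwise, keeps the check linear in the number of contracts, which matters when you start from nearly a million source files.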

Metrics to Note

  • Solidity source not available: 1,290,074
  • Solidity source available: 972,855
  • Inaccessible: 47
  • Invalid: 120
  • Total: 2,263,096
  • Unique Solidity Contracts: 47,398
  • Lines of Code (LOC) in Unique Contracts: 9,693,457
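These figures are internally consistent, and a few lines of Python confirm it: the four address categories sum exactly to the 2,263,096 total. The same numbers also yield two handy ratios: the 47,398 unique contracts are about 4.87% of the 972,855 contracts with available source, and a unique contract averages roughly 204.5 lines of code. The variable names are ours, chosen for illustration.

```python
# Figures copied from the "Metrics to Note" list.
metrics = {
    "source_not_available": 1_290_074,
    "source_available": 972_855,
    "inaccessible": 47,
    "invalid": 120,
}
TOTAL = 2_263_096
UNIQUE = 47_398
LOC_UNIQUE = 9_693_457

# The four categories should account for every address collected.
assert sum(metrics.values()) == TOTAL

share_unique = UNIQUE / metrics["source_available"]
avg_loc = LOC_UNIQUE / UNIQUE

print(f"unique / source available: {share_unique:.2%}")  # 4.87%
print(f"average LOC per unique contract: {avg_loc:.1f}")  # 204.5
```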

Licensing Information

All files in the repository, except those in the contracts folder, are governed by the license specified in the LICENSE file. The files in the contracts folder are publicly accessible, obtained through the Etherscan APIs, and retain their original licenses. For any queries, don’t hesitate to reach out.

Troubleshooting Common Issues

Here are some potential challenges you might face when working with the SmartBugs Wild Dataset, along with solutions to help you navigate them:

  • Issue: Missing or Incomplete Data

    If contracts or metadata appear to be missing, first confirm that your download scripts finished without errors. Then check whether the Etherscan API has changed or was temporarily unavailable; once it is working again, rerun your data collection script for the affected addresses.

  • Issue: Script Errors

    When running the provided Python scripts, watch for syntax errors and for import errors caused by missing libraries. Make sure the required libraries are installed; if the repository includes a requirements.txt file, run pip install -r requirements.txt from the directory that contains it.

  • Issue: API Limitations

    Errors while retrieving data from Etherscan are often caused by API rate limits. Space out your requests, for example by adding a short delay between calls, and check your API key’s usage against the limits of your plan. For further assistance, look for updates and discussions from the community.
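The first and third issues can be handled together in a small script. The sketch below, using only the standard library, finds addresses (for example, from the metadata file) that have no matching .sol file, then re-fetches them from Etherscan’s getsourcecode endpoint with a fixed delay between calls and exponential back-off on errors. The helper names, the 0.25-second default spacing (based on the commonly cited free-tier limit of about five calls per second), and the retry policy are all assumptions. Etherscan has revised its API over time, so confirm the endpoint, parameters, and limits against its current documentation before use.

```python
import json
import time
import urllib.parse
import urllib.request
from pathlib import Path

# Etherscan's classic contract-source endpoint (assumption: verify against current docs).
ETHERSCAN_URL = "https://api.etherscan.io/api"


def missing_addresses(addresses: list[str], folder: str) -> list[str]:
    """Return the addresses that have no <address>.sol file in the folder."""
    present = {p.stem.lower() for p in Path(folder).glob("*.sol")}
    return [a for a in addresses if a.lower() not in present]


def source_url(address: str, api_key: str) -> str:
    """Build a getsourcecode request URL for a single address."""
    query = urllib.parse.urlencode({
        "module": "contract",
        "action": "getsourcecode",
        "address": address,
        "apikey": api_key,
    })
    return f"{ETHERSCAN_URL}?{query}"


def fetch_sources(addresses: list[str], api_key: str,
                  min_interval: float = 0.25, retries: int = 3) -> dict[str, str]:
    """Fetch source code per address, keeping at least min_interval seconds between calls."""
    results: dict[str, str] = {}
    last_call = 0.0
    for address in addresses:
        for attempt in range(retries):
            # Throttle: wait until min_interval has passed since the previous request.
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
            last_call = time.monotonic()
            try:
                with urllib.request.urlopen(source_url(address, api_key), timeout=30) as resp:
                    payload = json.load(resp)
            except (OSError, ValueError):  # network failure or invalid JSON
                payload = {}
            if payload.get("status") == "1":
                # An empty string means Etherscan has no verified source for this address.
                results[address] = payload["result"][0].get("SourceCode", "")
                break
            # Error status (e.g. rate limit reached) or failed request: back off, then retry.
            time.sleep(min_interval * 2 ** (attempt + 1))
    return results
```

A typical call would be fetch_sources(missing_addresses(addresses, "contracts"), api_key="YOUR_KEY"), where addresses is your own list of contract addresses; writing each result back to its contract_address.sol file is left out for brevity.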

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that resources like the SmartBugs Wild Dataset are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
