How to Mine Source Code Repositories at Scale with Tree Hugger

Apr 16, 2022 | Programming

If you want to extract information from the source files in Git repositories, Tree Hugger can help. It is a Python library for pulling details such as function and class names out of code files in several languages. This article walks through installation, setup, and a quick “Hello World” example.

Installation

To get started with Tree Hugger, you can install it either via pip or from the source. Here’s how to do it:

From pip:

pip install -U tree-hugger PyYAML

From Source:

git clone https://github.com/autosoft-dev/tree-hugger.git
cd tree-hugger
pip install -e .

Note: You may need to install libgit2 using brew install libgit2 if you’re on macOS.

Setup

Setting up Tree Hugger requires some additional steps to ensure smooth functioning:

  • Ensure you have the necessary .so files. If you encounter difficulties, download the required zip file from here.
  • Configure the TS_LIB_PATH environment variable for the tree-sitter library path if necessary.
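The environment variable step above can be checked from Python before any parser is created. The following is a minimal sketch, not part of Tree Hugger: resolve_ts_lib_path is a hypothetical helper, and it assumes TS_LIB_PATH holds the path to a single compiled library file.

```python
import os
from pathlib import Path


def resolve_ts_lib_path(explicit_path=None):
    """Return the grammar library path, preferring an explicit argument.

    Falls back to the TS_LIB_PATH environment variable and raises if
    neither is set or the file does not exist.
    (Hypothetical helper for illustration, not a Tree Hugger function.)
    """
    candidate = explicit_path or os.environ.get("TS_LIB_PATH")
    if not candidate:
        raise RuntimeError("Set TS_LIB_PATH or pass a path explicitly")
    path = Path(candidate).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No library file at {path}")
    return path
```

Calling resolve_ts_lib_path() early in a script turns a misconfigured path into a clear error message instead of a failure deep inside parsing.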

Hello World Example

Now that you have everything set up, let’s dive into a simple example:

# Import the parser class for Python source files
from tree_hugger.core import PythonParser

# Create a Python parser object
# (assumes TS_LIB_PATH is configured as described in Setup)
pp = PythonParser()

# Parse a sample Python file
pp.parse_file('tests/assets/file_with_different_functions.py')

# Retrieve all function names
function_names = pp.get_all_function_names()
print(function_names)  # Output: ['first_child', 'second_child', 'say_whee', 'wrapper', 'my_decorator', 'parent']

In short, you point the parser at a Python file and it returns the names of the functions defined in it. The same approach applies to other languages: use PHPParser for PHP and JavaParser for Java.
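To move from a single file to a whole checkout, the same two calls can be repeated over every matching file in a directory. The sketch below is an illustration under assumptions: collect_function_names is a hypothetical helper, and it assumes one parser object can be reused across files. It uses only parse_file and get_all_function_names from the example above.

```python
from pathlib import Path


def collect_function_names(parser, root, suffix=".py"):
    """Map each matching file under `root` to its list of function names.

    `parser` is any object exposing parse_file() and
    get_all_function_names(), such as a PythonParser instance.
    (Hypothetical helper for illustration, not a Tree Hugger function.)
    """
    results = {}
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        try:
            parser.parse_file(str(path))
            results[str(path)] = list(parser.get_all_function_names())
        except Exception as exc:
            # Skip files that fail to parse and keep scanning the rest.
            print(f"skipped {path}: {exc}")
    return results
```

Passing the parser in as an argument keeps the helper independent of any one language: a PHP parser with suffix=".php" would follow the same pattern, provided it exposes the same two methods.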

API Reference

Here’s a brief overview of the supported languages and their respective functions:

  • Python: get_all_function_names, get_all_class_names, etc.
  • PHP: get_all_function_names, get_all_class_names, etc.
  • Java: get_all_class_method_names, get_all_class_names, etc.
  • JavaScript: get_all_function_names, get_all_class_names, etc.
  • C++: get_all_function_names, get_all_class_names, etc.
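Because the available methods differ by language (the Java entry above lists get_all_class_method_names rather than get_all_function_names), it can help to call only the methods a given parser actually provides. The following is a minimal sketch: run_available_extractors is a hypothetical helper, and it assumes each listed method takes no arguments once a file has been parsed.

```python
def run_available_extractors(parser, method_names):
    """Call each named zero-argument method that `parser` actually has.

    Returns a dict of method name to result; names the parser lacks
    are left out rather than raising AttributeError.
    (Hypothetical helper for illustration, not a Tree Hugger function.)
    """
    report = {}
    for name in method_names:
        method = getattr(parser, name, None)
        if callable(method):
            report[name] = method()
    return report


# Example call, after parser.parse_file(...) has been run:
# run_available_extractors(parser, ["get_all_function_names",
#                                   "get_all_class_names",
#                                   "get_all_class_method_names"])
```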

Extending Tree Hugger

Tree Hugger can be extended to support other programming languages or additional queries:

  • To add a language, create a new parser class that inherits from BaseParser.
  • Add queries to a queries.yml file; the parser uses them to extract information from the source code.

Roadmap

The development roadmap includes expanding the documentation and adding parser classes for more languages, with the aim of making Tree Hugger useful across a wide range of code mining tasks.

Troubleshooting

If you encounter issues during installation or setup, here are a few tips:

  • Ensure all required libraries are correctly installed.
  • Check that the environment variables are properly configured.
  • If you get errors related to missing files, revisit the installation steps.
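The first tip can be checked with a few lines of Python. This is a minimal sketch using only the standard library: missing_modules is a hypothetical helper, tree_hugger is the import name used in the example above, and yaml is the import name of PyYAML.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example call: an empty list means both packages are importable.
# print(missing_modules(["tree_hugger", "yaml"]))
```

If a name is reported as missing, rerunning the pip command from the Installation section in the same Python environment is a reasonable first step.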

