The TPC-H benchmark kit is used to evaluate the performance of database systems by generating large datasets and the complex decision-support queries that run against them. In this article, we walk you through setting up and using a modified version of the TPC-H benchmark kit so that you can assess your database’s capabilities.
Modifications Overview
This version of the TPC-H kit includes the following modifications:
- dbgen no longer prints a trailing delimiter at the end of each row.
- dbgen has an option to write its output to stdout.
- The kit compiles on macOS.
- qgen supports PostgreSQL’s LIMIT N syntax in generated queries.
- The Makefile defaults have been changed.
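To illustrate the first change: an unmodified dbgen ends every row with the field delimiter (`|`), which loaders such as PostgreSQL’s COPY reject as extra data after the last column. If you are working with .tbl files produced by a stock dbgen, a minimal sketch like the following strips that trailing delimiter. The function name strip_trailing_delim is hypothetical and is not part of the kit.

```shell
# Hypothetical helper (not part of tpch-kit): remove a single trailing '|'
# from every line. Reads the named files, or stdin if no file is given,
# and writes the result to stdout.
strip_trailing_delim() {
  # In a basic regular expression '|' is a literal character and '$'
  # anchors the match to the end of the line.
  sed 's/|$//' "$@"
}

# Example: strip_trailing_delim customer.tbl > customer.clean.tbl
```

With the modified dbgen described here, this step should not be necessary.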
Setup Instructions
Follow these steps to set up the TPC-H benchmark kit on Linux or macOS.
Linux Setup
- Make sure the required development tools are installed. Run the command for your Linux distribution:
- For Ubuntu:
sudo apt-get install git make gcc
- For CentOS/RHEL:
sudo yum install git make gcc
- Next, clone the repository and build the tools:
git clone https://github.com/gregrahn/tpch-kit.git
cd tpch-kit/dbgen
make MACHINE=LINUX DATABASE=POSTGRESQL
macOS Setup
- First, install the required development tools:
xcode-select --install
- Then, clone the repository and build the tools:
git clone https://github.com/gregrahn/tpch-kit.git
cd tpch-kit/dbgen
make MACHINE=MACOS DATABASE=POSTGRESQL
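The Linux and macOS builds differ only in the MACHINE value passed to make. If you script the build, a small sketch like the one below can pick that value from `uname -s`. The helper name tpch_machine is hypothetical, and it covers only the two platforms described in this article.

```shell
# Hypothetical helper: map a `uname -s` result (or the current system's,
# if no argument is given) to the MACHINE value used in the make commands.
tpch_machine() {
  local os="${1:-$(uname -s)}"
  case "$os" in
    Linux)  echo LINUX ;;
    Darwin) echo MACOS ;;
    *)      echo "unsupported platform: $os" >&2; return 1 ;;
  esac
}

# Usage, from tpch-kit/dbgen:
#   make MACHINE="$(tpch_machine)" DATABASE=POSTGRESQL
```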
Using the TPC-H Tools
Once the tools are built, configure your environment so that they can find their configuration files and query templates and know where to write their output.
Environment Configuration
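Set the following environment variables: DSS_CONFIG points to the dbgen directory, which holds the configuration files such as dists.dss; DSS_QUERY points to the query templates; and DSS_PATH is the directory where dbgen writes its output files.
export DSS_CONFIG=/.../tpch-kit/dbgen
export DSS_QUERY=$DSS_CONFIG/queries
export DSS_PATH=/path-to-dir-for-output-files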
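Because a mistyped path in any of these variables only shows up later as a dbgen or qgen error, it can help to check them up front. The following is a minimal sketch, assuming the layout described above (dists.dss inside DSS_CONFIG); the function name check_tpch_env is hypothetical.

```shell
# Hypothetical helper: verify that DSS_CONFIG, DSS_QUERY and DSS_PATH are
# set and point to existing directories, and that DSS_CONFIG contains
# dists.dss. Prints a message per problem; returns non-zero if any found.
check_tpch_env() {
  local var dir status=0
  for var in DSS_CONFIG DSS_QUERY DSS_PATH; do
    # Indirect lookup of the variable whose name is stored in $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var=$dir is not a directory" >&2
      status=1
    fi
  done
  # dbgen reads its distribution definitions from dists.dss.
  if [ -n "${DSS_CONFIG:-}" ] && [ ! -f "$DSS_CONFIG/dists.dss" ]; then
    echo "dists.dss not found in $DSS_CONFIG" >&2
    status=1
  fi
  return "$status"
}

# Usage: check_tpch_env && echo "environment looks good"
```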
SQL Dialects
Refer to the Makefile for valid DATABASE values. You can find details for each SQL dialect in tpcd.h. Adjust the query templates in tpch-kit/dbgen/queries as needed.
Data Generation
Data is generated with the dbgen tool. Run dbgen -h to see all available options. dbgen writes the generated table files to the directory given by DSS_PATH.
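A quick sanity check after generation is to compare each file’s row count with the TPC-H base cardinalities. region and nation always have 5 and 25 rows, while supplier (10,000), customer (150,000), part (200,000), partsupp (800,000) and orders (1,500,000) scale linearly with the scale factor. lineitem is only approximately proportional, so it is left out. The sketch below assumes an integer scale factor and unchunked output files named <table>.tbl in DSS_PATH; the helper names expected_rows and check_rows are hypothetical.

```shell
# Hypothetical helper: expected row count for a TPC-H table at an integer
# scale factor. lineitem is omitted because its size is only approximate.
expected_rows() {
  local table=$1 sf=$2
  case "$table" in
    region)   echo 5 ;;
    nation)   echo 25 ;;
    supplier) echo $((10000 * sf)) ;;
    customer) echo $((150000 * sf)) ;;
    part)     echo $((200000 * sf)) ;;
    partsupp) echo $((800000 * sf)) ;;
    orders)   echo $((1500000 * sf)) ;;
    *) echo "no fixed row count for table: $table" >&2; return 1 ;;
  esac
}

# Hypothetical helper: compare the line count of $DSS_PATH/<table>.tbl
# with the expected value; returns non-zero on a mismatch or missing file.
check_rows() {
  local table=$1 sf=$2 file expected actual
  file="${DSS_PATH:-.}/$table.tbl"
  if [ ! -f "$file" ]; then
    echo "$table: $file not found" >&2
    return 1
  fi
  expected=$(expected_rows "$table" "$sf") || return 1
  # tr removes the padding that some wc implementations (e.g. macOS) print.
  actual=$(wc -l < "$file" | tr -d ' ')
  if [ "$actual" -ne "$expected" ]; then
    echo "$table: expected $expected rows, found $actual" >&2
    return 1
  fi
}
```

For example, after generating scale factor 1: `for t in region nation supplier customer part partsupp orders; do check_rows "$t" 1; done`.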
Query Generation
Queries are generated with the qgen utility. Run qgen -h to see all available options. The following command generates all 22 queries in numerical order for the 1 GB scale factor (-s 1), using the default substitution values (-d), and redirects the output to a file:
qgen -v -c -d -s 1 > tpch-stream.sql
To generate one query per file for a scale factor of 3000 (3 TB), create the output directory and use the following bash loop:
mkdir -p /tmp/sf3000
for ((i=1;i<=22;i++)); do
qgen -v -c -s 3000 $i > /tmp/sf3000/tpch-q$i.sql
done
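Because a failed qgen run inside the loop can leave an empty file behind (the shell creates the redirect target before qgen starts), it is worth confirming afterwards that all 22 files exist and are non-empty. This is a minimal sketch assuming the tpch-q<N>.sql naming used in the loop; the function name check_query_files is hypothetical.

```shell
# Hypothetical helper: report any of the 22 per-query files in DIR that are
# missing or empty. Returns non-zero if at least one is missing or empty.
check_query_files() {
  local dir=$1 i=1 missing=0
  while [ "$i" -le 22 ]; do
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$dir/tpch-q$i.sql" ]; then
      echo "missing or empty: $dir/tpch-q$i.sql" >&2
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  [ "$missing" -eq 0 ]
}

# Usage: check_query_files /tmp/sf3000
```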
Troubleshooting
If you encounter issues during setup or usage, consider the following troubleshooting steps:
- Ensure you have all the necessary development tools installed.
- Check that DSS_CONFIG, DSS_QUERY, and DSS_PATH are set and point to existing directories.
- Verify that you run make from the tpch-kit/dbgen directory.
- Review the error messages carefully, as they often indicate what went wrong.
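For the first item in the list, a minimal sketch like the following reports which required commands are not on your PATH. The function name check_tools is hypothetical; pass it whatever tools your platform needs (on macOS, the Xcode command line tools provide them).

```shell
# Hypothetical helper: report each given command that cannot be found on
# PATH. Returns non-zero if any command is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools git make gcc
```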
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

