TPC-DS is the decision support benchmark published by the Transaction Processing Performance Council (TPC), and its tools are widely used to benchmark and evaluate database systems. This guide shows how to set up the TPC-DS tools on Linux and macOS, generate data, and generate queries, with troubleshooting tips for common issues along the way.
Getting Started with TPC-DS
The official TPC-DS tools can be found at tpc.org. This guide uses a modified version of the toolkit that is based on v2.10.0. Its modifications include:
- Compatibility with macOS.
- Fixes for various query template bugs.
- Renaming columns for consistency with specifications.
To see all modifications, diff the files in the master branch against the version branch, for example master vs v2.10.0.
Setup Instructions
For Linux Users
- Ensure you have the required development tools installed:
- For Ubuntu, run:
sudo apt-get install gcc make flex bison byacc git
- For CentOS/RHEL, run:
sudo yum install gcc make flex bison byacc git
- Clone the repository and build the tools:
git clone https://github.com/gregrahn/tpcds-kit.git
cd tpcds-kit/tools
make OS=LINUX
For macOS Users
- Ensure you have the required development tools installed:
xcode-select --install
- Clone the repository and build the tools:
git clone https://github.com/gregrahn/tpcds-kit.git
cd tpcds-kit/tools
make OS=MACOS
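The Linux and macOS build commands differ only in the OS value passed to make. As a small sketch, that value could be chosen from the output of uname -s. The helper name tpcds_os_for is hypothetical and not part of the kit, and any platform other than the two covered in this guide is reported as unknown.

```shell
# Map a kernel name (as printed by `uname -s`) to the OS value used in the
# make commands of this guide. tpcds_os_for is a hypothetical helper.
tpcds_os_for() {
  case "$1" in
    Linux)  echo "LINUX" ;;
    Darwin) echo "MACOS" ;;
    *)      echo "unknown"; return 1 ;;
  esac
}

# Print the build command for the current machine instead of running it.
if os_value="$(tpcds_os_for "$(uname -s)")"; then
  echo "make OS=$os_value"
else
  echo "no OS value known for $(uname -s)" >&2
fi
```

Run the printed command from the tpcds-kit/tools directory, as in the steps above.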
Using the TPC-DS Tools
Data Generation
Data generation is accomplished using the dsdgen tool. For a thorough understanding of the available options, run:
dsdgen -help
Note that if you do not run dsdgen from the tools directory, you must pass the option -DISTRIBUTIONS ... with the path to the distributions index file (tpcds.idx in the tools directory). The output directory given via the -DIR option must also exist before you run the tool, because dsdgen does not create it.
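The two requirements above can be handled in a short wrapper. This is a minimal sketch, not the only way to call the tool: the variable names and default paths are assumptions to adjust for your machine, and -SCALE 1 is just a small value for a first run.

```shell
# Hypothetical locations; change them to match your checkout and target.
TPCDS_TOOLS="${TPCDS_TOOLS:-$HOME/tpcds-kit/tools}"
OUT_DIR="${OUT_DIR:-/tmp/tpcds-data}"   # absolute path, since we cd below
SCALE_GB="${SCALE_GB:-1}"

# The -DIR target must exist before dsdgen runs, so create it first.
mkdir -p "$OUT_DIR"

# Running from the tools directory avoids the need for -DISTRIBUTIONS.
if [ -x "$TPCDS_TOOLS/dsdgen" ]; then
  (cd "$TPCDS_TOOLS" && ./dsdgen -SCALE "$SCALE_GB" -DIR "$OUT_DIR")
else
  echo "dsdgen not found in $TPCDS_TOOLS; build the tools first" >&2
fi
```

The subshell keeps the cd from changing your current directory once the command finishes.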
Query Generation
Query generation is done with the dsqgen tool. It works like filling in a script from a template: you give the tool a query template, and it substitutes parameter values to produce concrete queries from it.
The command below generates all 99 queries in numerical order for a scale factor of 10TB (-SCALE 10000), using the netezza dialect and writing the output to the tmp directory:
dsqgen -DIRECTORY ../query_templates -INPUT ../query_templates/templates.lst -VERBOSE Y -QUALIFY Y -SCALE 10000 -DIALECT netezza -OUTPUT_DIR tmp
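The command above pairs a 10TB scale factor with -SCALE 10000, which suggests the value is expressed in gigabytes. Under that assumption, the sketch below prints the same command line for other sizes. Both function names are hypothetical helpers, and nothing is executed, so you can inspect the command before running it from the tools directory.

```shell
# Convert a size in terabytes to a -SCALE value, assuming -SCALE is in GB
# (10 TB -> 10000, as in the command above). Hypothetical helper.
scale_for_tb() {
  echo $(( $1 * 1000 ))
}

# Print the dsqgen command for a size in TB ($1) and an output directory ($2).
# All flags are taken from the command above; only -SCALE and -OUTPUT_DIR vary.
dsqgen_cmd() {
  echo "dsqgen -DIRECTORY ../query_templates -INPUT ../query_templates/templates.lst -VERBOSE Y -QUALIFY Y -SCALE $(scale_for_tb "$1") -DIALECT netezza -OUTPUT_DIR $2"
}

dsqgen_cmd 10 tmp   # the 10TB command shown above
dsqgen_cmd 1 tmp    # the same command for a 1TB scale factor
```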
Troubleshooting Tips
Encountering issues during setup or execution? Here are some common troubleshooting ideas:
- Ensure that all necessary packages are installed based on your operating system.
- If you’re having issues with query generation, double-check that your output directory exists and the template paths are correct.
- For specific error messages, refer to the GitHub issues pages linked in the modifications for potential solutions.
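The first two checks in this list can be scripted. The following is a rough sketch: missing_tools is a hypothetical helper, the tool list mirrors the packages from the Linux setup (byacc is left out because its command name may differ between systems), and the paths assume you are in the tools directory and using the query generation command from this guide.

```shell
# Print the name of every given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the build prerequisites named in the setup instructions.
missing="$(missing_tools gcc make flex bison git)"
if [ -n "$missing" ]; then
  echo "not found on PATH:" $missing >&2
fi

# Check the paths used by the dsqgen command (relative to the tools directory).
QUERY_OUT="${QUERY_OUT:-tmp}"
if [ ! -d "$QUERY_OUT" ]; then
  echo "output directory '$QUERY_OUT' does not exist" >&2
fi
if [ ! -f ../query_templates/templates.lst ]; then
  echo "templates.lst not found under ../query_templates" >&2
fi
```

The script only reports problems on standard error and does not install packages or create directories.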

