How to Set Up the TPC-DS Tools

Jul 23, 2022 | Programming

The TPC-DS (Transaction Processing Performance Council, Decision Support) tools generate the data and queries used to benchmark and evaluate the performance of database systems. In this guide, we will break down how to set up the TPC-DS tools on both Linux and macOS, generate data, and generate queries, with troubleshooting tips for common issues along the way.

Getting Started with TPC-DS

The official TPC-DS tools can be found at tpc.org. This guide uses the tpcds-kit repository, which is based on v2.10.0 of the official tools and includes modifications such as:

  • Compatibility with macOS.
  • Fixes for various query template bugs.
  • Renaming columns for consistency with specifications.

To see all modifications, diff the master branch against the version branch, for example master vs v2.10.0.

Setup Instructions

For Linux Users

  1. Ensure you have the required development tools installed.
     For Ubuntu, run:
       sudo apt-get install gcc make flex bison byacc git
     For CentOS/RHEL, run:
       sudo yum install gcc make flex bison byacc git
  2. Clone the repository and build the tools:
       git clone https://github.com/gregrahn/tpcds-kit.git
       cd tpcds-kit/tools
       make OS=LINUX

For macOS Users

  1. Ensure you have the required development tools installed:
       xcode-select --install
  2. Clone the repository and build the tools:
       git clone https://github.com/gregrahn/tpcds-kit.git
       cd tpcds-kit/tools
       make OS=MACOS
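
If you build on both platforms from the same script, the OS value can be chosen automatically. The following is a minimal sketch, not part of the kit: the function name tpcds_make_os is our own, and it assumes only that `uname -s` reports Linux or Darwin on the two supported systems.

```shell
# Hypothetical helper (not shipped with tpcds-kit): map the kernel name
# reported by `uname -s` to the OS value passed to make in the steps above.
tpcds_make_os() {
  case "$1" in
    Linux)  echo LINUX ;;
    Darwin) echo MACOS ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Usage, from tpcds-kit/tools:
#   make OS="$(tpcds_make_os "$(uname -s)")"
```

Any other platform makes the function fail rather than guess a value.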

Using the TPC-DS Tools

Data Generation

Data generation is accomplished using the dsdgen tool. For a thorough understanding of the available options, run:

dsdgen -help

If you do not run dsdgen from the tools directory, you must pass the option -DISTRIBUTIONS ... with the path to the distributions index file (tpcds.idx, which is built in the tools directory). The output directory given with the -DIR option must already exist before you run dsdgen.
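
Because a missing output directory is an easy mistake to make, it can help to wrap the call. The sketch below is only an illustration: the function name generate_tpcds_data and the DSDGEN variable are our own, and it assumes you run it from the tools directory (otherwise add the -DISTRIBUTIONS option described above). The -SCALE option sets the data volume in GB.

```shell
# Hypothetical wrapper (not part of the kit): create the output directory
# first, then run dsdgen with a scale factor (in GB) and that directory.
# DSDGEN may point at the binary; it defaults to ./dsdgen in tools/.
generate_tpcds_data() {
  scale="$1"    # scale factor in GB, e.g. 1
  outdir="$2"   # directory that will receive the generated data files
  mkdir -p "$outdir" || return 1
  "${DSDGEN:-./dsdgen}" -SCALE "$scale" -DIR "$outdir"
}

# Usage, from tpcds-kit/tools:
#   generate_tpcds_data 1 /tmp/tpcds-data
```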

Query Generation

Query generation is done with the dsqgen tool. You give it a set of query templates, and it fills in the template parameters to produce concrete SQL queries for the dialect you choose.

The command below generates all 99 queries in numerical order (-QUALIFY) for the 10 TB scale factor (-SCALE 10000, since the scale is given in GB) using the Netezza dialect template (-DIALECT), writing the output to the tmp directory (-OUTPUT_DIR):

dsqgen -DIRECTORY ../query_templates -INPUT ../query_templates/templates.lst -VERBOSE Y -QUALIFY Y -SCALE 10000 -DIALECT netezza -OUTPUT_DIR tmp
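
If you generate queries for several scale factors or dialects, the command above can be parameterized. This is a sketch under our own assumptions: the function name generate_tpcds_queries and the DSQGEN variable are hypothetical, the flags are exactly those from the command above, and the relative template paths assume you run it from the tools directory.

```shell
# Hypothetical wrapper (not part of the kit): make sure the output directory
# exists, then call dsqgen with the same options as the command above.
# DSQGEN may point at the binary; it defaults to ./dsqgen in tools/.
generate_tpcds_queries() {
  scale="$1"     # scale factor in GB, e.g. 10000 for 10 TB
  dialect="$2"   # dialect template name, e.g. netezza
  outdir="$3"    # directory that will receive the generated queries
  mkdir -p "$outdir" || return 1
  "${DSQGEN:-./dsqgen}" \
    -DIRECTORY ../query_templates \
    -INPUT ../query_templates/templates.lst \
    -VERBOSE Y -QUALIFY Y \
    -SCALE "$scale" -DIALECT "$dialect" -OUTPUT_DIR "$outdir"
}

# Usage, from tpcds-kit/tools:
#   generate_tpcds_queries 10000 netezza tmp
```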

Troubleshooting Tips

Encountering issues during setup or execution? Here are some common troubleshooting ideas:

  • Ensure that all necessary packages are installed based on your operating system.
  • If you’re having issues with query generation, double-check that your output directory exists and the template paths are correct.
  • For specific error messages, refer to the GitHub issues pages linked in the modifications for potential solutions.
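
The first check in the list can be automated. The snippet below is a small sketch: the function name missing_tools is our own, and it relies only on the standard `command -v` lookup to see whether each build tool is on your PATH.

```shell
# Hypothetical helper: print the name of every tool that is NOT found on PATH.
# No output means all of the listed tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage:
#   missing_tools gcc make flex bison git
```

If the command prints any names, install those packages with the commands from the setup section before running make again.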
