How to Use Datagen CLI for Generating Fake Data

Mar 22, 2024 | Programming

Welcome to the world of data generation! The Datagen CLI is a powerful tool that produces believable fake data for your applications, driven by schema files written in JSON, Avro, or SQL. Whether you need the data for testing, development, or exploration, this guide walks you through setting up and using the tool. So, let’s dive right in!

Installation

Before we can start generating data, we need to install Datagen. You can install it using npm, Docker, or even compile it from source. Here’s how:

  • Using npm: Run the following command in your terminal:
    npm install -g @materializeinc/datagen
  • Using Docker: Pull the latest Datagen Docker image:
    docker pull materialize/datagen
  • From Source: Clone the repository and build the project:
    git clone https://github.com/MaterializeInc/datagen.git
    cd datagen
    npm install
    npm run build
    npm link

Setting Up Environment Variables

Datagen reads its connection settings from environment variables. Create a file named .env with the following variables, filling in values for the services you use:

# Kafka Brokers
export KAFKA_BROKERS=

# For Kafka SASL Authentication
export SASL_USERNAME=
export SASL_PASSWORD=
export SASL_MECHANISM=

# For Kafka SSL Authentication
export SSL_CA_LOCATION=
export SSL_CERT_LOCATION=
export SSL_KEY_LOCATION=

# Schema Registry for Avro
export SCHEMA_REGISTRY_URL=
export SCHEMA_REGISTRY_USERNAME=
export SCHEMA_REGISTRY_PASSWORD=

# PostgreSQL
export POSTGRES_HOST=
export POSTGRES_PORT=
export POSTGRES_DB=
export POSTGRES_USER=
export POSTGRES_PASSWORD=

# MySQL
export MYSQL_HOST=
export MYSQL_PORT=
export MYSQL_DB=
export MYSQL_USER=
export MYSQL_PASSWORD=
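
To make the role of these variables concrete, here is a minimal shell sketch that sets a placeholder broker address and then reports any required variable that is still empty. The address localhost:9092 and the REQUIRED_VARS list are assumptions for illustration; which variables your run actually needs depends on the destination you generate data for.

```shell
# Placeholder value for illustration; replace with your own broker address.
export KAFKA_BROKERS="localhost:9092"

# Hypothetical choice: the variables this particular run is expected to need.
REQUIRED_VARS="KAFKA_BROKERS"

# Collect the names of required variables that are unset or empty.
MISSING_VARS=""
for name in $REQUIRED_VARS; do
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    MISSING_VARS="$MISSING_VARS $name"
  fi
done

if [ -n "$MISSING_VARS" ]; then
  echo "unset or empty:$MISSING_VARS" >&2
else
  echo "all required variables are set"
fi
```

Because every line in the .env file starts with export, the same check can be run after loading that file into your shell with `. ./.env`.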

Basic Usage

Once you are set up, you can start generating data with the datagen command. Here’s a simple command to showcase its capabilities:

datagen --schema path/to/your/schema.sql --format json --number 100

This command generates 100 records from the specified SQL schema and emits them in JSON format.
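
Before producing the full set of records, it can help to preview a handful with the --dry-run option described under Troubleshooting. The sketch below builds such a command from the same placeholder schema path and only runs it when datagen is installed and the schema file exists. The variable names and the preview count of 5 are assumptions for illustration.

```shell
SCHEMA="path/to/your/schema.sql"   # same placeholder path as in the command above
PREVIEW_COUNT=5                    # hypothetical small sample size

# Build the argument list once so it can be shown and then executed unchanged.
set -- datagen --schema "$SCHEMA" --format json --number "$PREVIEW_COUNT" --dry-run
PREVIEW_CMD="$*"
echo "$PREVIEW_CMD"

# Run the preview only if the tool and the schema file are both available.
if command -v datagen >/dev/null 2>&1 && [ -f "$SCHEMA" ]; then
  "$@"
else
  echo "skipping run: datagen or $SCHEMA not found" >&2
fi
```

Keeping the arguments in the positional parameters avoids quoting problems if the schema path contains spaces.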

Understanding the Code: An Analogy

Imagine you are a chef preparing a multi-course meal for a banquet. The schema is like your recipe book, detailing every ingredient and method needed. The Datagen CLI is your kitchen, equipped with all the tools necessary to whip up magical dishes:

  • The ingredients (data) you use come from the Faker.js API, which allows you to customize and specify exactly what you want.
  • The kitchen (Datagen) supports various cooking techniques (formats like JSON, Avro, and SQL) so you can present the meal (output the data) in different appetizing ways.
  • If a dish needs adjustments, you can tweak the recipe, just as you can adjust the schema to generate different types of fake data.
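
To make the recipe idea concrete, the sketch below writes a small JSON schema file in which each field is paired with a Faker.js method. The file name, topic, and field names are hypothetical, and the layout (an array of datasets, each with a _meta block) follows our understanding of the format shown in the Datagen README, so check it against the version you have installed.

```shell
# Hypothetical file name, chosen only for illustration.
SCHEMA_FILE="users.json"

# Each field name maps to a Faker.js call, written as a string.
cat > "$SCHEMA_FILE" <<'EOF'
[
  {
    "_meta": {
      "topic": "users",
      "key": "id"
    },
    "id": "faker.datatype.number(1000)",
    "username": "faker.internet.userName()",
    "email": "faker.internet.email()"
  }
]
EOF

echo "wrote $SCHEMA_FILE"
```

Changing the recipe then means editing this file, for example swapping faker.internet.email() for another Faker.js method.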

Generating Records with Dependencies

For more intricate scenarios, you can establish relationships between your datasets. This ensures that generated data in one dataset aligns with another, just like pairing wine with the right dish at your banquet:

{
  "_meta": {
    "topic": "my_kafka_topic",
    "relationships": [
      {
        "topic": "dependent_dataset_topic",
        "parent_field": "parent_id_field",
        "child_field": "matching_id_field",
        "records_per": 2
      }
    ]
  },
  "first_field": "faker.internet.userName()",
  "second_field": "faker.datatype.number({min: 100, max: 1000})"
}
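
As a fuller illustration, the sketch below writes a schema file with a parent dataset and the dependent dataset it points to, then works out how many child records to expect. The file, topic, and field names are hypothetical. The placement of relationships inside _meta, and the assumption that the record count applies to the parent dataset, reflect our reading of the Datagen README rather than verified behaviour.

```shell
# Hypothetical file, topic, and field names, chosen only for illustration.
cat > blog.json <<'EOF'
[
  {
    "_meta": {
      "topic": "users",
      "key": "id",
      "relationships": [
        {
          "topic": "posts",
          "parent_field": "id",
          "child_field": "user_id",
          "records_per": 2
        }
      ]
    },
    "id": "faker.datatype.number(1000)",
    "name": "faker.internet.userName()"
  },
  {
    "_meta": { "topic": "posts", "key": "id" },
    "id": "faker.datatype.number(100000)",
    "user_id": "faker.datatype.number(1000)",
    "title": "faker.lorem.sentence()"
  }
]
EOF

# With records_per set to 2, each parent record should yield two child records.
PARENTS=100
RECORDS_PER=2
CHILDREN=$((PARENTS * RECORDS_PER))
echo "$PARENTS users -> $CHILDREN posts"
```

The child dataset still declares user_id so the field exists in its schema. The relationship is what ties its value to the parent's id.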

Troubleshooting

If you encounter issues while using the Datagen CLI, here are some troubleshooting ideas you can follow:

  • Ensure all paths to the schema and environment files are correctly specified.
  • Confirm that the Kafka, PostgreSQL, or MySQL instance you are targeting is running and reachable.
  • If you cannot see the generated data, try the --dry-run option to debug without affecting your database.
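
The first check in this list can be automated with a small shell helper. The function name preflight and the paths passed to it are assumptions for illustration. It only verifies that the files exist and does not validate their contents.

```shell
# preflight FILE...
# Prints a line for each missing file and returns non-zero if any is missing.
preflight() {
  rc=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check the schema and environment files before running datagen.
if preflight path/to/your/schema.sql .env; then
  echo "schema and .env found"
else
  echo "fix the paths above before running datagen" >&2
fi
```

Running the helper first turns a confusing failure inside the tool into a clear message about which path is wrong.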


Conclusion

With the Datagen CLI, you can spice up your data generation processes and build realistic datasets for your applications. By craftily creating schemas and using the FakerJS API, you can ensure that your data resembles what you would encounter in the real world.
