How to Use Easy SQL for Your ETL Development

Jun 17, 2023 | Programming

Welcome to the world of Easy SQL, a tool designed to simplify the ETL (Extract, Transform, Load) process using SQL in an intuitive and imperative manner. In this guide, we’ll explore how to get started with Easy SQL, create your first ETL, troubleshoot common issues, and even dive into debugging and contributing to the tool.

What is Easy SQL?

Easy SQL allows you to develop your ETL processes using a streamlined, SQL-compatible syntax. It supports various SQL engines, including SparkSQL, PostgreSQL, Clickhouse, and many others, enabling you to run SQL commands in sequence effortlessly. Think of Easy SQL as a magic conduit that channels your SQL queries into a variety of data engines, making your data transformation process smooth and efficient.
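To see the shape of that, here is a minimal two-step sketch of an Easy SQL file (a fuller sample appears below). Each line starting with -- target= begins a step, steps run in order, and a variable defined in one step can be referenced later as ${name}:

-- target=variables
select 1 as a

-- target=log.a
select ${a} as a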

How to Install Easy SQL

Installing Easy SQL is a breeze! You can install it via pip:

python3 -m pip install "easy-sql-easy-sql[extra,extra]"

Replace extra,extra with whichever extras you need (the quotes keep shells such as zsh from expanding the brackets). Available extras include:

  • cli
  • linter
  • spark
  • pg (PostgreSQL)
  • clickhouse

For the Flink backend, install the Apache Flink dependencies manually:

python3 -m pip install apache-flink

Your First ETL with Easy SQL

Let’s start by creating your first ETL with Easy SQL. Here’s a brief guide for each backend.

1. For Spark Backend

To install Easy SQL with Spark as the backend:

python3 -m pip install easy-sql-easy-sql[spark,cli]

Create a file named sample_etl.spark.sql and populate it with the following content:

-- prepare-sql: drop database if exists sample cascade
-- prepare-sql: create database sample
-- prepare-sql: create table sample.test as select 1 as id, 1 as val

-- target=variables
select true as __create_output_table__

-- target=variables
select 1 as a

-- target=log.a
select ${a} as a

-- target=log.test_log
select 1 as some_log

-- target=check.should_equal
select 1 as actual, 1 as expected

-- target=temp.result
select ${a} as id, ${a} + 1 as val union all select id, val from sample.test

-- target=output.sample.result
select * from result

-- target=log.sample_result
select * from sample.result

Each -- target= line begins a new step: variables defines variables for later steps, log prints a query result, check asserts that a condition holds (here, that actual equals expected), temp registers a temporary view, and output writes the result to a table. Run the file with this command (the -p flag prints the fully resolved command, which bash -c then executes):

bash -c "$(python3 -m easy_sql.data_process -f sample_etl.spark.sql -p)"

2. For PostgreSQL Backend

First, run a PostgreSQL instance using Docker:

docker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=123456 postgres

Create a file named sample_etl.postgres.sql with the content from the test file. Ensure you have installed the corresponding backend:

python3 -m pip install easy-sql-easy-sql[cli,pg]

To run it, use:

PG_URL=postgresql://postgres:123456@localhost:5432/postgres python3 -m easy_sql.data_process -f sample_etl.postgres.sql
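The canonical contents of sample_etl.postgres.sql live in the project's test files. As a rough sketch only, assuming the file mirrors the Spark sample with a PostgreSQL schema standing in for the database:

-- prepare-sql: drop schema if exists sample cascade
-- prepare-sql: create schema sample
-- prepare-sql: create table sample.test as select 1 as id, 1 as val

-- target=variables
select true as __create_output_table__

-- target=variables
select 1 as a

-- target=temp.result
select ${a} as id, ${a} + 1 as val union all select id, val from sample.test

-- target=output.sample.result
select * from result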

3. For Clickhouse Backend

Start a Clickhouse instance in Docker:

docker run -d --name clickhouse -p 9000:9000 yandex/clickhouse-server:20.12.5

Then, create a file named sample_etl.clickhouse.sql with content from the test file. Install the required backend:

python3 -m pip install easy-sql-easy-sql[cli,clickhouse]

Run the Clickhouse ETL with:

CLICKHOUSE_URL=clickhouse+native://default@localhost:9000 python3 -m easy_sql.data_process -f sample_etl.clickhouse.sql
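As with PostgreSQL, the canonical sample_etl.clickhouse.sql is in the project's test files. A rough sketch only, under the assumption that ClickHouse needs an explicit table engine in the prepare step:

-- prepare-sql: drop database if exists sample
-- prepare-sql: create database sample
-- prepare-sql: create table sample.test engine=Memory as select 1 as id, 1 as val

-- target=variables
select 1 as a

-- target=temp.result
select ${a} as id, ${a} + 1 as val union all select id, val from sample.test

-- target=log.result
select * from result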

Troubleshooting Tips

If you encounter issues during installation or while running an ETL, check the following:

  • Ensure Docker is running if using database instances.
  • Verify you have the correct dependencies installed for each backend.
  • Look for typos in your SQL scripts.


Debugging ETLs

You can debug ETLs more effectively using Jupyter. Here is how:

  1. Install Jupyter:

python3 -m pip install jupyterlab

  2. Create a debugger.py file with contents similar to this sample:

from typing import Any, Dict

def create_debugger(sql_file_path: str, vars: Dict[str, Any] = None, funcs: Dict[str, Any] = None):
    # Import lazily so this module loads even where Spark is not installed
    from pyspark.sql import SparkSession
    from easy_sql.sql_processor.backend import SparkBackend
    from easy_sql.sql_processor_debugger import SqlProcessorDebugger

    # Reuse an existing Spark session if one is already running
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    backend = SparkBackend(spark)
    debugger = SqlProcessorDebugger(sql_file_path, backend, vars, funcs)
    return debugger

  3. Create a test.sql file using the samples provided (a minimal sketch follows this list).

  4. Start Jupyter Lab:

jupyter lab

  5. In a notebook, import create_debugger from debugger.py, point it at test.sql, and begin debugging your ETL step by step!
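If you need a quick test.sql to get started, a minimal sketch reusing the pattern of the Spark sample above will do:

-- target=variables
select 1 as a

-- target=log.a
select ${a} as a

-- target=temp.result
select ${a} as id, ${a} + 1 as val

-- target=log.result
select * from result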

Contributing to Easy SQL

Your contributions can make Easy SQL an even better tool! If you have enhancements or bug fixes to share with the community, please submit a Pull Request (PR).

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
