Workflows Made Easy
The Astro Python SDK is a Python SDK for the rapid development of extract, transform, and load (ETL) workflows in Apache Airflow. It lets you express a workflow as a set of data dependencies, freeing you from managing task ordering yourself. It is maintained by Astronomer.
Prerequisites
- Apache Airflow >= 2.1.0
Installation
To get started with the Astro Python SDK, follow these installation steps:
- The Astro Python SDK is available on PyPI, so you can install it with the standard Python tooling.
- For the cloud-agnostic base package, run:

pip install astro-sdk-python

- To include optional dependencies for specific providers (for example Amazon, Google, Snowflake, and Postgres), run:

pip install astro-sdk-python[amazon,google,snowflake,postgres]
Quickstart
Follow these steps to quickly get your Airflow environment set up:
- Ensure your Airflow environment is set up correctly by running:

export AIRFLOW_HOME=$(pwd)
airflow db init

Note: as of astro-sdk-python release 1.2, AIRFLOW__CORE__ENABLE_XCOM_PICKLING no longer needs to be enabled. If you are using Airflow 2.5 with astro-sdk-python release 1.3 or later, refer to AstroCustomXcomBackend.

- Create a SQLite database for the example to run against:

export SQL_TABLE_NAME=$(airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}')
sqlite3 "$SQL_TABLE_NAME" "VACUUM;"

- Copy the following workflow into a file named calculate_popular_movies.py and add it to the dags directory.

- Run the example DAG:

airflow dags test calculate_popular_movies $(date -Iseconds)

- Check the results by running:

sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
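The calculate_popular_movies workflow itself is not reproduced in this document. A minimal sketch of what such a DAG might look like, assuming the SDK's aql.load_file and aql.transform operators and a hypothetical CSV of IMDB movie ratings (the file URL and column names are illustrative assumptions, not part of the original):

```python
from datetime import datetime

from airflow import DAG
from astro import sql as aql
from astro.files import File
from astro.table import Table

@aql.transform()
def top_five_animations(input_table: Table):
    # The SDK templates {{ input_table }} into the query at run time.
    return """
        SELECT title, rating
        FROM {{ input_table }}
        WHERE genre1 = 'Animation'
        ORDER BY rating DESC
        LIMIT 5;
    """

with DAG(
    "calculate_popular_movies",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    # Load a CSV of movies into the SQLite database (URL is illustrative).
    imdb_movies = aql.load_file(
        input_file=File(path="https://example.com/imdb_movies.csv"),
        output_table=Table(conn_id="sqlite_default", name="imdb_movies"),
    )
    # Write the five highest-rated animations to the top_animation table
    # that the quickstart queries afterwards.
    top_five_animations(
        input_table=imdb_movies,
        output_table=Table(name="top_animation"),
    )
```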
You should see entries like:
Toy Story 3 (2010) 8.3
Inside Out (2015) 8.2
How to Train Your Dragon (2010) 8.1
Zootopia (2016) 8.1
How to Train Your Dragon 2 (2014) 7.9
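The same check can be done from Python with the standard-library sqlite3 module, equivalent to the sqlite3 CLI query above (the database path is whatever $SQL_TABLE_NAME resolved to):

```python
import sqlite3

def top_animation_rows(db_path: str) -> list[tuple]:
    # Equivalent to: sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;"
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            "SELECT title, rating FROM top_animation ORDER BY rating DESC"
        ).fetchall()
```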
Supported Technologies
The Astro Python SDK supports various file locations and types:
File Location:
- local
- http
- https
- gs (Google Storage)
- gdrive
- s3
- wasb
- wasbs
- azure
- sftp
- ftp
File Type:
- csv
- json
- ndjson
- parquet
- xls
- xlsx
Database:
- postgres
- sqlite
- delta
- bigquery
- snowflake
- redshift
- mssql
- duckdb
- mysql
Available Operations
The following are some key functions available in the SDK:
- load_file: Load a file into a SQL table.
- transform: Apply a SQL select statement to a source table and save the result to a destination table.
- drop_table: Drop a SQL table.
- run_raw_sql: Run SQL statements without handling output.
- append: Insert rows from the source SQL table into the destination SQL table.
- merge: Insert rows from the source SQL table into the destination SQL table, resolving key conflicts instead of failing.
- export_file: Export SQL table rows into a file.
- dataframe: Export a SQL table into an in-memory Pandas dataframe.
For a complete operators list, visit the SDK reference documentation.
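To make the append/merge distinction concrete: append blindly inserts source rows, while merge resolves key conflicts. A conceptual sketch of merge-style behavior using a plain SQLite upsert (the SDK generates the database-specific SQL for you; the target/source tables here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO source VALUES (?, ?)", [(2, "B"), (3, "c")])

# merge-style upsert: the conflicting id=2 row is updated instead of raising.
# (WHERE true is required by SQLite's parser for INSERT..SELECT upserts.)
conn.execute(
    "INSERT INTO target SELECT * FROM source WHERE true "
    "ON CONFLICT(id) DO UPDATE SET val = excluded.val"
)
rows = conn.execute("SELECT * FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'B'), (3, 'c')]
```

An append-style operation would instead be a bare INSERT INTO target SELECT * FROM source, which fails here because id=2 already exists in the target.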
Troubleshooting
If you encounter issues while using the Astro Python SDK, try these troubleshooting steps:
- Ensure your Airflow version is compatible (2.1.0 or higher).
- Check that your Python installation is up-to-date.
- Verify that your dependencies were installed correctly.
- If the DAG does not run as expected, ensure the correct workflow file is in the proper location within the dags directory.
- If you have further questions or want to connect, reach out via fxis.ai.