Astro

May 13, 2024 | Programming

Workflows Made Easy

The Astro Python SDK is a Python library for rapidly developing extract, transform, and load (ETL) workflows in Apache Airflow. It lets you express a workflow as a set of data dependencies, so you do not have to manage tasks and their ordering yourself. It is maintained by Astronomer.

Prerequisites

  • Apache Airflow >= 2.1.0
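
If you want to confirm the prerequisite from Python rather than by eye, a minimal sketch along the following lines may help. It uses only the standard library; the helper names `parse_version` and `airflow_is_supported` are made up for this example and are not part of Airflow or the SDK.

```python
from importlib import metadata

MIN_AIRFLOW = (2, 1, 0)  # minimum Airflow version named in the prerequisites


def parse_version(text):
    """Turn '2.7.3' or '2.8.0rc1' into a tuple of three integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def airflow_is_supported(installed=None):
    """Return True if the given (or installed) Airflow version is >= 2.1.0."""
    if installed is None:
        try:
            installed = metadata.version("apache-airflow")
        except metadata.PackageNotFoundError:
            return False  # Airflow is not installed in this environment
    return parse_version(installed) >= MIN_AIRFLOW


if __name__ == "__main__":
    ok = airflow_is_supported()
    print("Airflow OK" if ok else "Airflow missing or older than 2.1.0")
```

Passing a version string explicitly, as in `airflow_is_supported("2.0.2")`, lets you try the comparison without Airflow installed.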

Installation

The Astro Python SDK is available on PyPI and can be installed with the standard Python installation tools.

  • For a cloud-agnostic version of the SDK, run:

    pip install astro-sdk-python

  • To install dependencies for specific cloud providers, run:

    pip install astro-sdk-python[amazon,google,snowflake,postgres]

Quickstart

Follow these steps to set up an Airflow environment and run an example workflow:

  1. Ensure your Airflow environment is correctly set up:

    export AIRFLOW_HOME=$(pwd)
    airflow db init

    Note:

    • As of astro-sdk-python release 1.2, AIRFLOW__CORE__ENABLE_XCOM_PICKLING does not need to be enabled.
    • For Airflow version 2.5 and astro-sdk-python release 1.3, refer to AstroCustomXcomBackend.
  2. Create a SQLite database:

    export SQL_TABLE_NAME=$(airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}')
    sqlite3 "$SQL_TABLE_NAME" "VACUUM;"

  3. Copy the example workflow into a file named calculate_popular_movies.py and add it to the dags directory.

  4. Run the example DAG:

    airflow dags test calculate_popular_movies $(date -Iseconds)

  5. Check the results by running:

    sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"

    You should see entries like:

    Toy Story 3 (2010) 8.3
    Inside Out (2015) 8.2
    How to Train Your Dragon (2010) 8.1
    Zootopia (2016) 8.1
    How to Train Your Dragon 2 (2014) 7.9
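
To get a feel for what ends up in top_animation, here is a rough stand-in that uses only Python's built-in sqlite3 module instead of Airflow and the SDK. It is not the calculate_popular_movies.py workflow. The source table name `imdb_movies`, the column names, and the single non-animation row are assumptions made for illustration; the animation rows are copied from the expected output above.

```python
import sqlite3

# Animation rows come from the expected output above. The last row is a
# made-up (hypothetical) entry so the genre filter has something to exclude.
SAMPLE_ROWS = [
    ("Toy Story 3 (2010)", 8.3, "Animation"),
    ("Inside Out (2015)", 8.2, "Animation"),
    ("How to Train Your Dragon (2010)", 8.1, "Animation"),
    ("Zootopia (2016)", 8.1, "Animation"),
    ("How to Train Your Dragon 2 (2014)", 7.9, "Animation"),
    ("Hypothetical Drama (2000)", 9.0, "Drama"),
]


def build_top_animation(conn):
    """Load the sample rows, derive top_animation, and return its contents."""
    conn.execute("CREATE TABLE imdb_movies (title TEXT, rating REAL, genre TEXT)")
    conn.executemany("INSERT INTO imdb_movies VALUES (?, ?, ?)", SAMPLE_ROWS)
    # Keep the five best-rated animation titles; ties are broken by title.
    conn.execute(
        """
        CREATE TABLE top_animation AS
        SELECT title, rating
        FROM imdb_movies
        WHERE genre = 'Animation'
        ORDER BY rating DESC, title
        LIMIT 5
        """
    )
    return conn.execute(
        "SELECT title, rating FROM top_animation ORDER BY rating DESC, title"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for title, rating in build_top_animation(conn):
        print(title, rating)
    conn.close()
```

In the real workflow the SDK would create and populate these tables for you; the sketch only mirrors the shape of the final query result.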

Supported Technologies

The Astro Python SDK supports various file locations and types:

File Location:

  • local
  • http
  • https
  • gs (Google Storage)
  • gdrive
  • s3
  • wasb
  • wasbs
  • azure
  • sftp
  • ftp

File Type:

  • csv
  • json
  • ndjson
  • parquet
  • xls
  • xlsx

Database:

  • postgres
  • sqlite
  • delta
  • bigquery
  • snowflake
  • redshift
  • mssql
  • duckdb
  • mysql
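
As a quick way to sanity-check a path against the file locations and file types listed above, you could use something like the sketch below. It is standard-library Python and does not call the SDK. Treating the URL scheme as the location and a missing scheme as "local" is an assumption of this example, not a description of how the SDK resolves files, and the function names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Values copied from the lists above.
SUPPORTED_LOCATIONS = {
    "local", "http", "https", "gs", "gdrive", "s3",
    "wasb", "wasbs", "azure", "sftp", "ftp",
}
SUPPORTED_FILE_TYPES = {"csv", "json", "ndjson", "parquet", "xls", "xlsx"}


def describe_path(path):
    """Return (location, file_type); a path without a scheme counts as 'local'."""
    parsed = urlparse(path)
    location = parsed.scheme or "local"
    file_type = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    return location, file_type


def is_supported(path):
    """Check both the location and the file extension against the lists."""
    location, file_type = describe_path(path)
    return location in SUPPORTED_LOCATIONS and file_type in SUPPORTED_FILE_TYPES


if __name__ == "__main__":
    for example in ("s3://my-bucket/data/movies.csv", "hdfs://cluster/data.csv"):
        print(example, is_supported(example))
```

Note that Windows-style paths such as `C:\data\movies.csv` would be misread by this simple scheme check, so treat it as a starting point only.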

Available Operations

The following are some key functions available in the SDK:

  • load_file: Load a file into a SQL table.
  • transform: Apply a SQL select statement to a source table and save the result to a destination table.
  • drop_table: Drop a SQL table.
  • run_raw_sql: Run SQL statements without handling output.
  • append: Insert rows from the source SQL table into the destination SQL table.
  • merge: Insert rows from the source SQL table into the destination SQL table, handling conflicts with rows that already exist.
  • export_file: Export SQL table rows into a file.
  • dataframe: Export a SQL table into an in-memory Pandas dataframe.

For a complete operators list, visit the SDK reference documentation.
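
The difference between append and merge is easiest to see in plain SQL. The sketch below uses Python's built-in sqlite3 module, not the SDK, and the table and column names are invented for the example. It assumes a SQLite build with upsert support (3.24 or later) and shows one possible conflict strategy, updating the existing row; it is not a description of how the SDK implements either operator.

```python
import sqlite3


def demo_append_and_merge():
    """Return (appended_rows, merged_rows) after applying both styles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, rating REAL)")
    conn.execute("CREATE TABLE appended (id INTEGER, rating REAL)")
    conn.execute("CREATE TABLE merged (id INTEGER PRIMARY KEY, rating REAL)")
    conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, 8.3), (2, 8.2)])
    conn.execute("INSERT INTO appended VALUES (1, 7.0)")
    conn.execute("INSERT INTO merged VALUES (1, 7.0)")

    # Append-style: every source row is inserted; existing rows are untouched.
    conn.execute("INSERT INTO appended SELECT id, rating FROM source")

    # Merge-style: a row whose id already exists is updated, not duplicated.
    # "WHERE true" avoids a parsing ambiguity in SQLite's INSERT ... SELECT upsert.
    conn.execute(
        """
        INSERT INTO merged (id, rating)
        SELECT id, rating FROM source WHERE true
        ON CONFLICT (id) DO UPDATE SET rating = excluded.rating
        """
    )

    appended = conn.execute(
        "SELECT id, rating FROM appended ORDER BY id, rating"
    ).fetchall()
    merged = conn.execute("SELECT id, rating FROM merged ORDER BY id").fetchall()
    conn.close()
    return appended, merged


if __name__ == "__main__":
    appended_rows, merged_rows = demo_append_and_merge()
    print("append:", appended_rows)
    print("merge: ", merged_rows)
```

After the append, id 1 appears twice in the destination; after the merge, it appears once with the source's rating.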

Troubleshooting

If you encounter issues while using the Astro Python SDK, try these troubleshooting steps:

  • Ensure your Airflow version is compatible (2.1.0 or higher).
  • Check that your Python version is supported by the Airflow and SDK versions you installed.
  • Verify that your dependencies were installed correctly.
  • If the DAG does not run as expected, ensure the correct workflow file is in the proper location within the dags directory.
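
For the "DAG does not run" case, a small check like the following can tell you whether the workflow file is where Airflow would look by default. It assumes the dags folder is `<AIRFLOW_HOME>/dags` and that AIRFLOW_HOME falls back to `~/airflow` when unset, which are Airflow's defaults; a custom dags_folder setting would need a different path. The function name is hypothetical.

```python
import os
from pathlib import Path


def find_dag_file(filename, airflow_home=None):
    """Return the path to filename inside <AIRFLOW_HOME>/dags, or None if absent."""
    home = Path(
        airflow_home or os.environ.get("AIRFLOW_HOME") or Path.home() / "airflow"
    )
    candidate = home / "dags" / filename
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_dag_file("calculate_popular_movies.py")
    print(found if found else "calculate_popular_movies.py not found in dags directory")
```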
