Live with Astronomer

Data-Aware Scheduling with the Astro Python SDK

Watch On Demand

Hosted By

  • Benji Lampel, Ecosystem Engineer
  • Tamara Fingerlin, Developer Advocate

Airflow 2.4 introduced the Datasets feature, which enables data-aware scheduling:

  • The DAG author (you) can tell Airflow that a task updates a Dataset: outlets=[Dataset("s3://my_bucket")]
  • DAGs can be scheduled to run when those Datasets are updated: schedule=[Dataset("s3://my_bucket")]

You can find all the resources you need in this GitHub repository.

Get Apache Airflow Certified

If you want to learn more about how to get started with Airflow, you can join the thousands of other data engineers who have received the Astronomer Certification for Apache Airflow Fundamentals. This exam assesses an understanding of the basics of the Airflow architecture and the ability to create simple data pipelines for scheduling and monitoring tasks.
