github.com/apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Contributors: 1645
Lines of Code: 55,804
From: 2014-10-06
To: 2020-12-31
About apache/airflow
Apache Airflow is a workflow orchestration platform that lets users programmatically author, schedule, and monitor data pipelines and automation jobs. Workflows are defined as code using directed acyclic graphs (DAGs), making them maintainable, versionable, testable, and collaborative. The platform includes a scheduler that executes tasks across distributed workers while respecting dependencies, command-line utilities for complex operations, and a web interface for visualization and monitoring.
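The idea of a scheduler that runs tasks only after their dependencies have finished can be illustrated with a small standard-library sketch. This is not Airflow's API: the task names and the `run_order` helper are hypothetical, and Python's `graphlib` stands in for the scheduler's dependency resolution.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline: each key maps a task name
# to the set of tasks that must finish before it may start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_order(deps):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because the graph must be acyclic, `TopologicalSorter` raises `graphlib.CycleError` if two tasks depend on each other, which mirrors why workflows are modeled as DAGs in the first place.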
The project is designed for workflows that are mostly static and slowly changing, making it well-suited for batch data processing, ETL/ELT pipelines, and machine learning workflows. Airflow emphasizes idempotent tasks and recommends delegating high-volume data operations to specialized external services rather than passing large datasets between tasks. While not a streaming solution, it's commonly used for processing real-time data in batches. The platform is highly extensible with built-in operators and supports rich customization through the Jinja templating engine.
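To make the emphasis on idempotent tasks concrete, here is a minimal sketch that assumes a task writes its result to a path derived from the run's logical date. The function name, file layout, and data are hypothetical and do not come from Airflow itself. Rerunning the task for the same date overwrites the same file instead of appending or creating a duplicate.

```python
import json
import tempfile
from pathlib import Path


def write_daily_summary(output_dir: Path, logical_date: str, rows: list) -> Path:
    """Write the summary for one date; a rerun overwrites the same file."""
    # The target path depends only on the logical date, not on when the task runs.
    target = output_dir / f"summary_{logical_date}.json"
    target.write_text(json.dumps({"date": logical_date, "row_count": len(rows)}))
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    write_daily_summary(out, "2020-12-31", [1, 2, 3])
    second = write_daily_summary(out, "2020-12-31", [1, 2, 3])  # simulated retry
    files = sorted(p.name for p in out.iterdir())
    content = json.loads(second.read_text())
```

After the retry, the directory still holds a single file with the same content, so the task can be rerun safely. The summary carries only a row count, in the spirit of keeping large datasets out of the data passed between tasks.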
Apache Airflow is a mature Apache Software Foundation project with broad industry adoption. It supports Python 3.10 through 3.14, several databases (PostgreSQL, MySQL, and SQLite), and Kubernetes orchestration. The project maintains multiple release branches with clear lifecycle policies, provides official Docker images built on Debian, and distributes packages via PyPI. Version 2 is in limited maintenance with EOL in April 2026, while version 3 entered general maintenance in April 2025. The project follows strict semantic versioning for core features and independent versioning for provider packages.