github.com/delta-io/delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Contributors: 266
Lines of Code: 6,866
From: 2019-04-22
To: 2023-03-16

About delta-io/delta

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture, combining the benefits of data lakes and data warehouses. It works with multiple compute engines including Apache Spark, PrestoDB, Flink, Trino, and Hive, and provides APIs for Scala, Java, Python, Rust, and Ruby. The project implements ACID transactions at scale, allowing data engineers and analysts to work with reliable, queryable data lakes.

The project is split across several repositories. The primary Delta repository, written in Scala, contains the Spark connector and the Delta Standalone library for Java and Scala projects; separate repositories hold the Rust bindings, data sharing, and Kafka integration. Delta Lake guarantees backward compatibility for all tables, meaning newer versions can always read tables written by older versions. It also ensures serializability for concurrent reads and writes through its transaction protocol, which specifies how table metadata is recorded and how commits are ordered.
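The idea behind an ordered transaction log can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Delta Lake's implementation: it keeps only the `_delta_log` directory name, the 20-digit zero-padded commit file names, and simplified `add` and `remove` actions written one JSON object per line. The helper names (`try_commit`, `live_files`, `demo`) and the data file names are hypothetical, and the real protocol carries many more fields and action types plus checkpoints.

```python
import json
import os
import tempfile


def commit_path(log_dir, version):
    # Commit files are named by version, zero-padded to 20 digits.
    return os.path.join(log_dir, f"{version:020d}.json")


def try_commit(log_dir, version, actions):
    """Try to write `actions` as commit `version`.

    Returns False if another writer already claimed that version.
    """
    try:
        # Mode "x" is an exclusive create: it fails if the file exists,
        # so at most one writer can succeed for a given version.
        with open(commit_path(log_dir, version), "x", encoding="utf-8") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        return True
    except FileExistsError:
        return False


def live_files(log_dir, as_of=None):
    """Replay commits in order; return the data files live at version `as_of`."""
    files = set()
    version = 0
    while os.path.exists(commit_path(log_dir, version)):
        if as_of is not None and version > as_of:
            break
        with open(commit_path(log_dir, version), encoding="utf-8") as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
        version += 1
    return files


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        log_dir = os.path.join(table_dir, "_delta_log")
        os.makedirs(log_dir)
        first = try_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
        second = try_commit(log_dir, 1, [{"add": {"path": "part-1.parquet"}}])
        # A second writer that also targets version 1 loses the race.
        conflict = try_commit(log_dir, 1, [{"add": {"path": "part-x.parquet"}}])
        # A later commit replaces both files with one compacted file.
        compaction = try_commit(log_dir, 2, [
            {"remove": {"path": "part-0.parquet"}},
            {"remove": {"path": "part-1.parquet"}},
            {"add": {"path": "part-2.parquet"}},
        ])
        return {
            "commits": [first, second, conflict, compaction],
            "v0": live_files(log_dir, as_of=0),
            "v1": live_files(log_dir, as_of=1),
            "latest": live_files(log_dir),
        }


if __name__ == "__main__":
    print(demo())
```

Because every commit occupies exactly one version number, readers that replay the log see a single agreed order of changes, and replaying only up to an earlier version reconstructs the table as it was at that point. In this sketch a writer that loses the race simply gets `False` back; a real writer would re-read the log, check for conflicts, and retry at the next version.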

The project targets organizations building data platforms that need ACID guarantees, time travel, and schema evolution alongside the scalability of data lakes. It integrates with much of the big data ecosystem through connectors that let tools like Hive and Flink read from Delta tables and, where the connector supports it, write to them. The codebase is built with SBT and requires Java 17 or later for compilation.
