datajoint-python

Relational Workflows: where database schemas define executable data pipelines.


Lines of Code: 4,931

From 2011-10-21 to 2020-12-16

About datajoint/datajoint-python

DataJoint is a Python framework designed for building scientific data pipelines using a novel Relational Workflow Model. In this paradigm, database schemas directly define executable workflows, where each table represents a computational step and foreign keys encode the dependencies between steps. Rather than writing imperative code to orchestrate data processing, users declare what computations should happen and the framework handles the execution logic automatically.
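To make the "foreign keys encode dependencies" idea concrete, here is a minimal conceptual sketch in plain Python. It is not DataJoint's API (in DataJoint itself, tables are Python classes with a `definition` string, and computed tables are filled by calling `populate()`). It assumes a pipeline can be reduced to a mapping from each table to the parent tables it references, and shows how an execution order falls out of that graph alone. The table names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the parent tables it references
# through foreign keys. Tables with no parents are entered manually.
PIPELINE = {
    "Subject": set(),
    "Session": {"Subject"},
    "Recording": {"Session"},
    "SpikeSorting": {"Recording"},
    "Behavior": {"Session"},
    "Analysis": {"SpikeSorting", "Behavior"},
}


def execution_order(dependencies):
    """Return the tables in an order where every parent precedes its children.

    graphlib raises CycleError if the foreign keys form a cycle, which is
    why a dependency graph like this one must be acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, table in enumerate(execution_order(PIPELINE), start=1):
        print(step, table)
```

No ordering is written by hand: adding a new table with its parents is enough for it to be scheduled after them, which is the declarative property the paragraph above describes.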

The framework ensures data integrity and reproducibility by treating computation results as immutable and tracking provenance throughout the pipeline. It supports multiple database backends including MySQL and PostgreSQL, as well as object storage systems, making it flexible for various scientific computing environments. Each table serves as both a schema definition and a workflow step: because parent tables are populated before the child tables that depend on them, the foreign-key structure itself specifies the order of computation.
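The following is a simplified, in-memory analogue of immutable results with provenance, offered as an illustration rather than DataJoint's implementation. Its `populate` function loosely mirrors the idea behind DataJoint's `populate()`, which calls a computed table's `make()` method for upstream keys that have no entry yet. The classes, the `mean_amplitude` step, and the sample data are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Key = Tuple  # hypothetical: a primary key as a tuple of attribute values


class ImmutableTable:
    """Hypothetical in-memory table: rows can be inserted, never overwritten."""

    def __init__(self, name: str):
        self.name = name
        self.rows: Dict[Key, dict] = {}
        self.provenance: Dict[Key, dict] = {}

    def insert(self, key: Key, row: dict, source: dict) -> None:
        if key in self.rows:
            raise ValueError(f"{self.name}{key} already exists; results are immutable")
        self.rows[key] = row
        self.provenance[key] = source  # records where this row came from


def populate(child: ImmutableTable, parent: ImmutableTable,
             make: Callable[[dict], dict]) -> list:
    """Compute a child row for every parent key that has no result yet."""
    computed = []
    for key, parent_row in parent.rows.items():
        if key in child.rows:
            continue  # already computed: skip rather than recompute
        source = {"table": parent.name, "key": key, "function": make.__name__}
        child.insert(key, make(parent_row), source)
        computed.append(key)
    return computed


def mean_amplitude(row: dict) -> dict:
    """Hypothetical analysis step applied to one recording."""
    return {"mean": sum(row["samples"]) / len(row["samples"])}


recording = ImmutableTable("Recording")
recording.insert(("mouse1", 1), {"samples": [1, 4, 2]}, source={"entered": "manually"})
recording.insert(("mouse1", 2), {"samples": [3, 3]}, source={"entered": "manually"})

stats = ImmutableTable("Stats")
first_run = populate(stats, recording, mean_amplitude)   # computes both keys
second_run = populate(stats, recording, mean_amplitude)  # nothing left to do
```

Because existing results are never overwritten, running `populate` again is safe and only fills in what is missing, and each stored row carries a record of which parent key and which function produced it.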

DataJoint targets research software and scientific computing communities, particularly those working with complex data pipelines where reproducibility and metadata management are critical. The project includes DataJoint Elements, which are example pipelines specifically designed for neuroscience applications, and offers comprehensive documentation with tutorials and migration guides for users upgrading from earlier versions.
