github.com/dib-lab/sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.


Contributors: 29

Lines of Code: 6,086

From: 2016-03-25

To: 2021-02-06

About dib-lab/sourmash

Sourmash is a bioinformatics tool for rapidly searching, comparing, and analyzing genomic and metagenomic datasets using k-mer sketching techniques. It implements specialized algorithms including FracMinHash sketching, which enables accurate comparisons between datasets of different sizes and can estimate Average Nucleotide Identity (ANI), as well as a gather command that uses combinatorial k-mer approaches for improved metagenomic profiling. The tool is named as a play on the Mash algorithm combined with a reference to sour mash whiskey.
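To make the FracMinHash idea concrete, here is a minimal pure-Python sketch. It is not sourmash's implementation and does not use the sourmash API. It assumes a BLAKE2b-based 64-bit hash as a stand-in for whatever hash function sourmash uses, and the function names (`frac_minhash`, `containment`, `ani_from_containment`) are hypothetical. The core idea is to hash every k-mer and keep only hashes that fall below `2**64 / scaled`, so a sketch retains roughly 1 in `scaled` of the distinct k-mers regardless of dataset size. The ANI estimate uses the common point estimate `containment ** (1 / ksize)`, which is an assumption here and not something the description above specifies.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative stand-in)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, ksize=21, scaled=10):
    """Keep every k-mer hash below MAX_HASH / scaled (about 1/scaled of them)."""
    threshold = MAX_HASH // scaled
    seq = seq.upper()
    sketch = set()
    for i in range(len(seq) - ksize + 1):
        h = hash_kmer(canonical(seq[i:i + ksize]))
        if h < threshold:
            sketch.add(h)
    return sketch


def jaccard(a, b):
    """Jaccard similarity of two sketches."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of sketch a that is also present in sketch b."""
    return len(a & b) / len(a) if a else 0.0


def ani_from_containment(c, ksize):
    """Point estimate of ANI from containment (assumed formula: c ** (1 / ksize))."""
    return c ** (1.0 / ksize)
```

Because the same threshold is applied to every dataset, the sketch of a short sequence is always a subset of the sketch of any longer sequence that contains it, provided both use the same `ksize` and `scaled`. That property is what lets containment be estimated between datasets of very different sizes.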

The project is implemented primarily in Python, with performance-critical components written in Rust, and is distributed through conda-forge, PyPI (installable with pip), and Debian repositories. It provides both a command-line interface and a programmatic Python API for sequence comparison workflows. Sourmash requires Python 3.11 or later and runs on Windows, macOS, and Linux; its core dependencies include numpy, scipy, matplotlib, screed, and cffi.

The project was initially developed at UC Davis's Lab for Data-Intensive Biology and is now maintained by a global community of contributors. Version 4 is the latest major release and includes significant API changes from previous versions. The tool has been formally published in the Journal of Open Source Software and is actively maintained, with documentation, tutorials, and community support channels available to users.
