github.com/pappewaio/r-stats-c-streamer

An animated time-lapse of how pappewaio/r-stats-c-streamer was built, commit by commit.


Contributors: 4

Lines of Code: 150

From: 2020-04-10

To: 2021-02-13

About pappewaio/r-stats-c-streamer

r-stats-c-streamer is a command-line tool for performing statistical transformations on tabular data, particularly genomic datasets such as GWAS summary statistics. It calls R's internal C implementations directly, avoiding the overhead of running R code, which keeps it fast on large datasets. It reads and writes in the style of a Unix pipeline, so large files can be streamed through statistical conversions.
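As a rough illustration of the streaming idea, the following sketch reads rows one at a time and writes a derived value for each, so memory use does not grow with file size. It is not taken from the project: the function names `append_z` and `stream_rows`, the two-column `beta<TAB>se` input layout, and the output format are all assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the project): turn one "beta<TAB>se" row
 * into "beta<TAB>se<TAB>z", where z = beta / se.
 * Returns 0 on success, -1 if the row cannot be parsed or does not fit. */
int append_z(const char *row, char *out, size_t out_size)
{
    double beta, se;
    assert(row != NULL && out != NULL);
    if (sscanf(row, "%lf\t%lf", &beta, &se) != 2 || se == 0.0)
        return -1;
    int n = snprintf(out, out_size, "%g\t%g\t%.4f", beta, se, beta / se);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Process `in` line by line and write results to `out`.
 * Rows that do not parse (for example a header line) are dropped in this
 * sketch. Returns the number of rows written. */
long stream_rows(FILE *in, FILE *out)
{
    char line[4096], result[4096];
    long written = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (append_z(line, result, sizeof result) == 0) {
            fprintf(out, "%s\n", result);
            written++;
        }
    }
    return written;
}
```

Calling `stream_rows(stdin, stdout)` from a `main` function would let such a program sit between other commands in a shell pipeline, since each row is emitted as soon as it has been read.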

The tool provides core statistical functionality including linear and logistic regression statistics, z-score calculations, p-value conversions, and allele frequency computations. It can replace specific columns with calculated values, and it writes high-precision p-values in scientific notation. Recent versions fix a precision issue in which highly significant p-values were truncated. The software is containerized with Singularity for HPC environments and ships with documentation and a full test suite.

The project is written in C and follows semantic versioning; the current version is 1.4.3. It is designed for researchers and bioinformaticians working with genomic data who need fast, reliable statistical transformations in pipeline workflows. The development workflow emphasizes testing: the complete test suite, which consists of seven unit tests, must pass before changes are committed.
