dplyr: A grammar of data manipulation

Contributors: 283

Lines of Code: 29,119

From 2012-10-28 to 2023-03-16

About hadley/dplyr

dplyr is an R package that provides a grammar of data manipulation: a small set of consistent verbs, each doing one job. The core verbs are mutate() for creating new variables, select() for choosing columns, filter() for subsetting rows by condition, summarise() for aggregating data, and arrange() for reordering rows. Each combines with group_by() to apply the same operation per group, so users can write readable, composable data transformation pipelines.
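As a minimal sketch of how these verbs chain together, the pipeline below uses the mtcars dataset that ships with R. The choice of dataset, the weight threshold, and the derived column name kpl are illustrative assumptions, not part of the project description.

```r
library(dplyr)

# mtcars is a built-in data frame; it stands in for any table here.
result <- mtcars %>%
  select(mpg, cyl, wt) %>%              # choose columns
  filter(wt < 4) %>%                    # keep rows matching a condition
  mutate(kpl = mpg * 0.425144) %>%      # new variable: miles/gallon -> km/litre
  group_by(cyl) %>%                     # later verbs now operate per cylinder count
  summarise(n = n(), mean_kpl = mean(kpl)) %>%  # one row per group
  arrange(desc(mean_kpl))               # reorder rows, most efficient first

print(result)
```

Each step takes a data frame and returns a data frame, which is what makes the verbs composable with the pipe.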

Beyond standard R data frames and tibbles, dplyr is designed as an interface to multiple computational backends, so the same verbs can run against different storage systems. The package integrates with arrow for large datasets and cloud storage, dbplyr for relational databases (translating dplyr code to SQL), dtplyr for high-performance operations on in-memory data using data.table, duckplyr for DuckDB queries, and sparklyr for Apache Spark. This backend approach lets analysts write largely the same dplyr code regardless of where their data is stored or how large it is.
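To illustrate the backend idea, here is a hedged sketch using dbplyr against an in-memory SQLite database. It assumes the DBI, RSQLite, and dbplyr packages are installed; the table name "mtcars" and the summary column mean_mpg are chosen purely for illustration.

```r
library(dplyr)

# Assumption: DBI, RSQLite and dbplyr are installed alongside dplyr.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars")          # upload a local data frame as a table

# tbl() returns a lazy reference; no rows are pulled into R yet.
query <- tbl(con, "mtcars") %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

show_query(query)                       # print the SQL that dbplyr generated
remote <- collect(query)                # run it in the database, fetch the result

# The same verbs applied to the local data frame, for comparison.
local <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

DBI::dbDisconnect(con)
```

The pipeline text is identical in both cases; only the object at the start of the pipe decides whether the work happens in R or in the database.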

The package is part of the tidyverse ecosystem and is widely used in the R data science community. It serves both beginners learning data transformation and experienced analysts working with large-scale datasets. The project emphasizes accessibility through extensive documentation and educational resources, and it is taught in the R for Data Science textbook.
