crimson-med/reperio
Created Jan 22, 2026
Simple, lightweight library to parse and scrape HTML pages.
Contributors: 1 · Lines of Code: 3,274 · From Apr 28, 2022 to Apr 11, 2025
About crimson-med/reperio
Reperio is a lightweight HTML parsing and scraping library designed for simplicity and performance. It provides a straightforward API for extracting structured data from web pages, supporting both direct string input and remote URLs through a promise-based interface. The library efficiently parses HTML content into organized components including metadata, images, videos, links, scripts, styles, and tables, with benchmarks showing sub-millisecond performance on moderately sized documents.
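The paragraph above mentions a promise-based interface that accepts either a raw HTML string or a remote URL. The snippet below is a minimal TypeScript sketch of that input-handling pattern under stated assumptions, not Reperio's actual API. The names `isRemoteUrl` and `loadHtml` are hypothetical, and the sketch assumes a runtime with a global `fetch` (for example Node.js 18 or later).

```typescript
// Hypothetical helper (not part of Reperio): true when the input is an http(s) URL.
function isRemoteUrl(input: string): boolean {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    // `new URL` throws on strings that are not absolute URLs, such as raw HTML.
    return false;
  }
}

// Hypothetical loader (not part of Reperio): always resolves to HTML text,
// whether the caller passed markup directly or a URL to fetch.
async function loadHtml(input: string): Promise<string> {
  if (!isRemoteUrl(input)) {
    return input; // already HTML; the async function wraps it in a resolved promise
  }
  const response = await fetch(input); // assumes a global fetch is available
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

Normalizing both input kinds to a `Promise<string>` lets the parsing step that follows stay identical for local strings and remote pages.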
The library offers both high-level convenience methods and granular parsing functions for developers who need more flexibility. Key features include URL extraction across multiple element types, automatic image downloading with deduplication, conversion of tables to JavaScript objects using the header cells as property keys, and sentence-level text search. Each parsed element type is returned as structured objects with the relevant attributes, giving programmatic access to href values, image sources, video sources, and other metadata.
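Two of these features are easy to illustrate without a DOM. The sketch below is illustrative only, not Reperio's implementation. It assumes a table has already been extracted into rows of cell text, with the first row as the header, and shows one way to map each body row to an object keyed by header. It also shows a naive sentence-level search. The function names `tableToObjects` and `findSentences` are hypothetical. The sentence splitter, which breaks after `.`, `!` or `?`, is a deliberate simplification.

```typescript
// Hypothetical: convert rows of cell text into objects keyed by the header row.
function tableToObjects(rows: string[][]): Record<string, string>[] {
  if (rows.length === 0) return [];
  const [headers, ...body] = rows;
  return body.map((cells) => {
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      // Missing cells in ragged rows become empty strings.
      record[header] = cells[i] ?? "";
    });
    return record;
  });
}

// Hypothetical: return every sentence containing `term` (case-insensitive).
function findSentences(text: string, term: string): string[] {
  const needle = term.toLowerCase();
  // Naive split: a run of non-terminators followed by optional . ! or ?
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0 && s.toLowerCase().includes(needle));
}
```

Keying rows by header text means duplicate header names overwrite each other in this sketch, so a fuller version would need to disambiguate them.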
Reperio is published to npm and follows semantic versioning for updates. The project welcomes contributions and maintains a test suite in the source repository, with clear publishing guidelines for maintainers who need to release new versions.