scraping

Scrape bible from multiple resources

Lines of Code: 271

From 2024-03-03 to 2024-06-13

About v-bible/scraping

Bible Scraper is a TypeScript project that extracts biblical content from multiple online sources including BibleGateway, Bible.com, and KTcgkpv.org. The tool uses Playwright for web scraping and stores the collected data in either PostgreSQL or SQLite databases, making the biblical content queryable and accessible for further use.

The project captures verses, footnotes, headings, cross-references, and psalm metadata. It also extracts poetry formatting and, on some sources, identifies the words of Jesus that red-letter editions highlight. The scraped data covers multiple Bible versions across different denominations, with particular attention to Vietnamese translations such as KT2011 (Catholic) and BD2011 (Protestant). The project can also inject full-text search content into SQLite so that verses can be searched by keyword.
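As a rough illustration of how scraped verses might be shaped and then indexed for full-text search, here is a minimal sketch. It is not taken from the repository: the `Verse` and `Footnote` shapes, the `verse_fts` table name, and every function name are hypothetical, and the project's real schema may differ. The only external facts relied on are SQLite's FTS5 syntax (`USING fts5`, `UNINDEXED` columns, `MATCH`, `ORDER BY rank`). The statements are returned as plain data, so any SQLite driver could run them.

```typescript
// Hypothetical record shapes; the project's actual schema may differ.
interface Footnote {
  position: number; // character offset in the verse text where the marker belongs
  text: string;
}

interface Verse {
  id: number;
  version: string; // e.g. "KT2011"
  book: string; // e.g. "gen"
  chapter: number;
  number: number;
  content: string;
  isPoetry: boolean;
  footnotes: Footnote[];
}

// A parameterized SQL statement, kept driver-agnostic.
interface Statement {
  sql: string;
  params: (string | number)[];
}

// FTS5 virtual table: only `content` is indexed for search;
// the UNINDEXED columns are stored so that hits can be located.
const CREATE_FTS: Statement = {
  sql:
    "CREATE VIRTUAL TABLE IF NOT EXISTS verse_fts USING fts5(" +
    "content, version UNINDEXED, book UNINDEXED, chapter UNINDEXED, number UNINDEXED)",
  params: [],
};

// One INSERT per verse; rowid mirrors the verse id so results can be joined back.
function toFtsInsert(verse: Verse): Statement {
  return {
    sql:
      "INSERT INTO verse_fts(rowid, content, version, book, chapter, number) " +
      "VALUES (?, ?, ?, ?, ?, ?)",
    params: [verse.id, verse.content, verse.version, verse.book, verse.chapter, verse.number],
  };
}

// Table creation followed by one insert per verse, in input order.
function buildFtsStatements(verses: Verse[]): Statement[] {
  return [CREATE_FTS, ...verses.map(toFtsInsert)];
}

// Keyword query against the index, best matches first.
function searchStatement(query: string): Statement {
  return {
    sql: "SELECT rowid, content FROM verse_fts WHERE verse_fts MATCH ? ORDER BY rank",
    params: [query],
  };
}

// Render a verse with footnote markers ([a], [b], ...) inserted at their offsets.
function withFootnoteMarkers(verse: Verse): string {
  const labelled = verse.footnotes.map((note, i) => ({
    note,
    label: String.fromCharCode(97 + i), // 97 is "a"
  }));
  // Insert from the last offset backwards so earlier offsets stay valid.
  labelled.sort((a, b) => b.note.position - a.note.position);
  let text = verse.content;
  for (const { note, label } of labelled) {
    text = text.slice(0, note.position) + `[${label}]` + text.slice(note.position);
  }
  return text;
}
```

Keeping the verse id as the FTS `rowid` is one possible design choice: a search hit can then be joined back to the main verse table, where footnotes, headings, and cross-references would live, without duplicating that data in the index.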

The repository documents known limitations and discrepancies across sources, such as verses missing from specific versions and differences in how denominations organize the Old Testament. It also contains supplementary functionality to scrape liturgical resources for Catholic Ordinary Time from catholic-resources.org, which makes it useful not only for general Bible data aggregation but also for religious education and liturgical applications.
