Project Gutenberg provides metadata in RDF format for each of its works. This function downloads the RDF file for the requested work and transforms it into a tidy data frame.

Usage

pg_metadata(pgid)

Arguments

pgid

Project Gutenberg ID, a single integer value, as it appears in the work's URL (https://www.gutenberg.org/ebooks/<PGID>)

Value

Returns a tibble of RDF triples in which subject, predicate, and object are renamed to key, name, and value. This discards the RDF structure in favor of a simpler "long" format that is easy to aggregate and reshape with tidyverse conventions. Note that a value can itself be the key of other rows, so capturing nested attributes may require joining the tibble to itself, possibly more than once.
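
As a rough illustration of the self-join mentioned above, the sketch below builds a small hand-written tibble in the key/name/value layout and resolves one level of nesting. The identifiers and values are made up for the example and are not real output from pg_metadata(); the dplyr and tibble packages are assumed to be installed.

```r
library(dplyr)
library(tibble)

# Hypothetical triples in the key/name/value layout described above.
# Keys and values are simplified placeholders, not real Project Gutenberg output.
triples <- tribble(
  ~key,        ~name,       ~value,
  "ebooks/84", "title",     "Frankenstein",
  "ebooks/84", "creator",   "agents/61",
  "agents/61", "name",      "Shelley, Mary Wollstonecraft",
  "agents/61", "birthdate", "1797"
)

# Rename the columns of the right-hand copy so the join output is unambiguous.
children <- triples |>
  rename(child_name = name, child_value = value)

# Self-join: a row whose value is itself a key picks up that key's attributes.
nested <- triples |>
  inner_join(children, by = c("value" = "key"))

print(nested)
```

Only the "creator" row points at another key, so the result holds one row per attribute of that nested node. Deeper nesting would need the same join repeated on child_value.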

Not for scraping!

Do not use this function to scrape metadata in bulk. Project Gutenberg provides offline catalog downloads for that purpose. Be a courteous data scientist and take steps to avoid calling this function repeatedly in your code, particularly during development. Consider using the memoise package in your code and/or caching in your R Markdown and Quarto documents.
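
A minimal sketch of the memoise approach follows. So that it runs offline, it wraps a hypothetical stand-in function (slow_lookup, which only counts its calls) rather than pg_metadata() itself; the memoise package is assumed to be installed.

```r
library(memoise)

# Stand-in for pg_metadata() so this sketch runs without network access.
# It counts how often it is actually executed and returns placeholder data.
calls <- 0
slow_lookup <- function(pgid) {
  calls <<- calls + 1
  data.frame(
    key   = paste0("ebooks/", pgid),
    name  = "title",
    value = "placeholder"
  )
}

# memoise() returns a wrapper that caches results by argument value.
cached_lookup <- memoise(slow_lookup)

first  <- cached_lookup(84)  # executes slow_lookup()
second <- cached_lookup(84)  # same argument, served from the in-memory cache

print(calls)
```

In real code the same pattern would presumably be memoise(pg_metadata). The default cache lives only for the current R session; memoise also offers cache_filesystem() for a cache that persists on disk between sessions.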

References

See A tidyverse lover's intro to RDF for a great introduction to the RDF format.