oreilly-epub
A small CLI tool that downloads a book from O'Reilly Learning (given a
bookid) and repackages it into a valid EPUB. It mirrors the publisher's
layout, fixes resource links (images, etc.) so they work offline, and zips
everything into a ready‑to‑read .epub.
:warning: You must have a valid O'Reilly Learning subscription and your own session cookies. This tool is intended for personal/offline use with content you're authorized to access.
Features
- Deterministic by bookid: no fuzzy search, just pass a known identifier.
- Authenticated requests via cookies.json (flat key‑value).
- Exploded tree download: mirrors O'Reilly's full_path hierarchy under a local epub_root/.
- Valid EPUB packaging:
  - Writes the mimetype entry first, uncompressed (EPUB requirement).
  - Generates META-INF/container.xml pointing to the publisher's OPF.
  - Zips all assets (HTML, images, CSS, fonts, OPF, NCX, …).
- HTML link rewriting during zip:
  - Converts absolute O'Reilly API URLs (e.g., /api/v2/epubs/.../files/images/foo.jpg) to relative paths inside the EPUB so images/styles render correctly.
- Accelerated file downloads via parallelization.
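The link rewriting listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names are hypothetical, API URLs are assumed to have the shape /api/v2/epubs/<id>/files/<path>, and the chapter's own path inside the EPUB is assumed to be known.

```rust
/// Strip the API prefix from a file URL, leaving the path inside the EPUB.
/// Assumes the shape `/api/v2/epubs/<id>/files/<path>`; returns None otherwise.
fn epub_path_from_api_url(url: &str) -> Option<&str> {
    let rest = url.strip_prefix("/api/v2/epubs/")?;
    let (_id, path) = rest.split_once("/files/")?;
    if path.is_empty() { None } else { Some(path) }
}

/// Compute a relative path from the directory of `from_file` to `target`.
/// Both arguments are `/`-separated paths relative to the EPUB root.
fn relative_path(from_file: &str, target: &str) -> String {
    // Directory components of the referencing file (drop its file name).
    let mut from: Vec<&str> = from_file.split('/').collect();
    from.pop();
    let to: Vec<&str> = target.split('/').collect();

    // Count shared leading directories; the target's file name never counts.
    let max_common = from.len().min(to.len().saturating_sub(1));
    let common = (0..max_common).take_while(|&i| from[i] == to[i]).count();

    // Climb out of the non-shared directories, then descend to the target.
    let mut parts: Vec<&str> = vec![".."; from.len() - common];
    parts.extend_from_slice(&to[common..]);
    parts.join("/")
}

/// Rewrite one absolute API URL found in `chapter_path` to a relative link.
/// Returns None for URLs that do not match the assumed API shape.
fn rewrite_link(chapter_path: &str, api_url: &str) -> Option<String> {
    epub_path_from_api_url(api_url).map(|target| relative_path(chapter_path, target))
}
```

With a hypothetical chapter at OEBPS/text/ch01.html, a reference ending in /files/OEBPS/images/foo.jpg would become ../images/foo.jpg, while URLs that do not match the API shape are left for the caller to keep unchanged.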
Installation
Ready-to-use binaries
Binaries for major operating systems are available at
GitHub releases. \
For portable Linux releases (musl or ARM64), check the
GitLab releases.
Download the binary for your platform and run it; no further setup is needed.
Manual build
Build requirements
- Rust (stable, 1.75+ recommended) with Cargo.
Build instructions
git clone <REPO_URL>
cd oreilly-epub
cargo build --release
The binary will be at target/release/oreilly-epub.
Usage
oreilly-epub [OPTIONS] <BOOKID>
Arguments:
<BOOKID> The Book digits ID that you want to download
Options:
--cookies <COOKIES> Path to the cookies.json file.
--skip-download If files already downloaded in a previous run.
--parallel <PARALLEL> Number of files to download in parallel.
Example:
oreilly-epub 9781787782204 --cookies ./cookies.json
Requires:
- A cookies.json file for learning.oreilly.com (see below).
- Network access to O'Reilly's API while running the tool.
Cookies setup (cookies.json)
Place a cookies.json file in the current working directory or in the config directory
described below (or pass --cookies <path>). \
The file is a flat JSON object: cookie name → cookie value.
Example:
{
"orm-session": "REDACTED",
"orm-cred": "REDACTED",
"another_cookie": "value"
}
Tip: You can obtain the cookies for learning.oreilly.com from your browser's developer tools by visiting the website and running the command below in the console. Save the output to cookies.json and keep the file private.
JSON.stringify(Object.fromEntries(document.cookie.split(";").filter(c=>c.includes("=")).map(c=>{const i=c.indexOf("=");return [c.slice(0,i).trim(),c.slice(i+1).trim()]})))
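To illustrate how a flat name → value map like the one above can be turned into an authenticated request, here is a minimal sketch that builds the value of an HTTP Cookie header. It assumes the JSON has already been parsed into a map; the function name is hypothetical and this is not necessarily how oreilly-epub does it internally.

```rust
use std::collections::BTreeMap;

/// Build the value of a `Cookie` request header from name → value pairs.
/// Pairs are joined as `name=value` separated by "; ", the usual header form.
fn cookie_header(cookies: &BTreeMap<String, String>) -> String {
    cookies
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect::<Vec<_>>()
        .join("; ")
}
```

A BTreeMap is used here only so the header comes out in a stable (sorted) order; the order of cookies in the header does not matter to the server.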
Config directory
The config directory depends on the platform:
| Platform | Value | Example |
|---|---|---|
| Linux | $XDG_CONFIG_HOME/oreilly-epub or $HOME/.config/oreilly-epub | /home/alice/.config/oreilly-epub |
| macOS | $HOME/Library/Application Support/oreilly-epub | /Users/Alice/Library/Application Support/oreilly-epub |
| Windows | {FOLDERID_LocalAppData}\oreilly-epub | C:\Users\Alice\AppData\Local\oreilly-epub |
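The Linux row of the table involves a fallback, which the sketch below spells out. It is an illustration only: the function name is hypothetical, and the environment lookup is passed in as a closure (an assumption made here so the logic can be exercised without touching the real environment).

```rust
use std::path::PathBuf;

/// Resolve the Linux config directory: `$XDG_CONFIG_HOME/oreilly-epub` when
/// the variable is set and non-empty, otherwise `$HOME/.config/oreilly-epub`.
/// Returns None when neither variable is available.
fn linux_config_dir(env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    let base = match env("XDG_CONFIG_HOME").filter(|v| !v.is_empty()) {
        Some(xdg) => PathBuf::from(xdg),
        None => PathBuf::from(env("HOME")?).join(".config"),
    };
    Some(base.join("oreilly-epub"))
}
```

In real use the closure would be something like |key| std::env::var(key).ok(), and cookies.json would then be looked up inside the returned directory.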
Notes & Limitations
- This tool assumes the O'Reilly “files” API includes OPF/NCX and all referenced assets.
- Downloads run in parallel; use --parallel to set how many files are fetched at once.
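The --parallel option listed under Usage caps how many files are fetched at once. A minimal sketch of that kind of bounded worker pool is shown below, using only standard-library threads. The function name and the fetch callback are hypothetical stand-ins, and the tool itself may well use a different mechanism (for example an async runtime).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `fetch` over `items` using at most `parallel` worker threads.
/// Results come back in the same order as `items`.
fn run_bounded<T, R, F>(items: &[T], parallel: usize, fetch: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Index of the next unclaimed item, shared by all workers.
    let next = AtomicUsize::new(0);
    // One result slot per item, filled in as workers finish.
    let results: Mutex<Vec<Option<R>>> =
        Mutex::new((0..items.len()).map(|_| None).collect());
    // Never spawn zero workers, and never more workers than items.
    let workers = parallel.max(1).min(items.len().max(1));

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Claim the next item; stop once the list is exhausted.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= items.len() {
                    break;
                }
                let out = fetch(&items[i]);
                let mut slots = results.lock().unwrap();
                slots[i] = Some(out);
            });
        }
    });

    // All workers have joined, so every slot has been filled.
    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.unwrap())
        .collect()
}
```

Each worker pulls the next index from a shared counter, so a slow file does not hold up the others, and the number of requests in flight never exceeds the configured limit.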
Roadmap / TODO
- [ ] Logging: add logging support.
- [ ] CONTRIBUTING.md: add architecture notes & contributor guidelines.
- [x] Robust HTML rewriting: replace string replacement with real XHTML parsing to update src, href, and other attributes precisely.
- [x] Stylesheets completeness: ensure all CSS referenced by chapters is included and linked properly (cross-check chapters endpoint vs files list).
- [ ] License: add copyright notice to each file and specify it in Cargo.toml.
- [x] XDG directories: use XDG‑compatible defaults for config and the download root.
- [x] Concurrency: implement parallel downloads with a configurable limit.
- [ ] Progress reporting: display per‑file and overall progress (bytes and/or file counts).
- [x] Richer metadata: add metadata such as description to the OPF.
- [x] XML generation: build container.xml using an XML writer instead of raw strings.
- [x] Low‑memory zip: stream files to the archive in chunks to reduce peak memory.
- [x] CI/CD: add a basic pipeline (build, fmt, clippy, test, release artifact).
- [ ] Tests: write actual tests to run in the CI pipeline.
