# oreilly-epub A small CLI tool that downloads a book from **O'Reilly Learning** (given a `bookid`) and repackages it into a **valid EPUB**. It mirrors the publisher's layout, fixes resource links (images, etc.) so they work offline, and zips everything into a ready‑to‑read `.epub`. > :warning: You must have a valid O'Reilly Learning subscription and your own > session cookies. This tool is intended for personal/offline use with content > you're authorized to access. ## Features - **Deterministic by `bookid`**: no fuzzy search—just pass a known identifier. - **Authenticated requests** via `cookies.json` (flat key‑value). - **Exploded tree download**: mirrors O'Reilly's `full_path` hierarchy under a local `epub_root/`. - **Valid EPUB packaging**: - Writes the `mimetype` entry first, **uncompressed** (EPUB requirement). - Generates `META-INF/container.xml` pointing to the publisher's OPF. - Zips all assets (HTML, images, CSS, fonts, OPF, NCX, …). - **HTML link rewriting during zip**: - Converts absolute O'Reilly API URLs (e.g., `/api/v2/epubs/.../files/images/foo.jpg`) to **relative paths** inside the EPUB so images/styles render correctly. - Accelerated file downloads via parallelization. ## Installation ### Ready-to-use binaries You can find binaries for major Operating Systems at [GitHub releases](https://github.com/Farzat07/oreilly-epub/releases). \ For portable Linux releases (`musl` or `ARM64`), check the [GitLab releases](https://gitlab.com/Farzat07/oreilly-epub/-/releases). Just plug-in and use. ### Manual build #### Build requirements - **Rust** (stable, 1.75+ recommended) with Cargo. #### Build instructions ```bash git clone cd oreilly-epub cargo build --release ``` The binary will be at `target/release/oreilly-epub`. ## Usage ```txt oreilly-epub [OPTIONS] Arguments: The Book digits ID that you want to download Options: --cookies Path to the cookies.json file. --skip-download If files already downloaded in a previous run. --parallel Number of files to download in parallel. ``` **Example:** ```bash oreilly-epub 9781787782204 --cookies ./cookies.json ``` Requires: - A **`cookies.json`** file for `learning.oreilly.com` (see below). - Network access to O'Reilly's API while running the tool. ### Cookies setup (`cookies.json`) Place a `cookies.json` file in the current working directory or config directory (or pass `--cookies `). \ The file is a **flat JSON object**: cookie name → cookie value. **Example:** ```json { "orm-session": "REDACTED", "orm-cred": "REDACTED", "another_cookie": "value" } ``` > Tip: You can obtain the cookies for `learning.oreilly.com` from your > browser's developer tools by visiting the website and running the command > below in the console. Write down the output to `cookies.json` and keep the > file private. ```js JSON.stringify(document.cookie.split(";").map(c=>c.split("=")).reduce((r,[k,v])=>({...r,[k.trim()]:v?.trim()}),{})) ``` ### Config directory This depends on the platform, as below: |Platform|Value|Example| |--------|-----|-------| |Linux|`$XDG_CONFIG_HOME`/oreilly-epub or `$HOME`/.config/oreilly-epub|/home/alice/.config/oreilly-epub| |macOS|`$HOME`/Library/Application Support/oreilly-epub|/Users/Alice/Library/Application Support/oreilly-epub| |Windows|`{FOLDERID_LocalAppData}`\oreilly-epub|C:\Users\Alice\AppData\Local\oreilly-epub| ## Notes & Limitations - This tool assumes the O'Reilly “files” API includes OPF/NCX and all referenced assets. - Concurrency is not enabled yet; downloads are sequential. ## Roadmap / TODO - [ ] **Logging**: add the ability to perform logging. - [ ] **CONTRIBUTING.md**: add architecture notes & contributor guidelines. - [x] **Robust HTML rewriting**: replace string replacement with real XHTML parsing to update `src`, `href`, and other attributes precisely. - [x] **Stylesheets completeness**: ensure all CSS referenced by chapters is included and linked properly (cross-check chapters endpoint vs files list). - [x] **License**: add copyright notice to each file and specify it in Cargo.toml. - [x] **XDG directories**: use XDG‑compatible defaults for config and the download root. - [x] **Concurrency**: implement parallel downloads with a configurable limit. - [ ] **Progress reporting**: display per‑file and overall progress (bytes and/or file counts). - [x] **Richer metadata**: add metadata such as description to the OPF. - [x] **XML generation**: build `container.xml` using an XML writer instead of raw strings. - [x] **Low‑memory zip**: stream files to the archive in chunks to reduce peak memory. - [x] **CI/CD**: add a basic pipeline (build, fmt, clippy, test, release artifact). - [ ] **Tests**: write actual tests to run in the CI pipeline.