# Contributing to oreilly-epub
First off — thank you for your interest in contributing! \
This document explains how to set up your environment, run checks/tests, make
changes, and submit pull requests (PRs).
> **License note:** By contributing, you agree that your contributions will be
> licensed under the project's license (GPLv3). See `LICENSE.txt` in the repo.
## Project goals
This tool downloads an O'Reilly (Safari) book by ID and assembles a valid EPUB
by fetching metadata/chapters/assets, cleaning up XHTML, injecting styles, and
writing a standards-compliant archive. The pipeline includes:
- Authenticated HTTP client using a user-provided `cookies.json`.
- Parallel file downloads with configurable concurrency (default 4, max 8).
- XHTML/OPF processing to ensure EPUB validity (e.g., void tags, stylesheet
links, `dc:description` injection).
- CI/CD pipelines that run `rustfmt`, `clippy`, tests, and build release
artifacts on tag pushes.
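
The concurrency limits in the list above (default 4, max 8) could be enforced with a small helper along these lines. This is only a sketch: the name `effective_parallelism` is hypothetical, and it assumes out-of-range values are clamped rather than rejected, which the project may handle differently.

```rust
/// Default and maximum download concurrency, as stated in the list above.
const DEFAULT_PARALLEL: usize = 4;
const MAX_PARALLEL: usize = 8;

/// Hypothetical helper: resolve the value passed to `--parallel`.
/// Assumption: out-of-range values are clamped to 1..=8 instead of erroring.
pub fn effective_parallelism(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_PARALLEL).clamp(1, MAX_PARALLEL)
}
```

Clamping keeps a typo such as `--parallel 80` from opening too many connections; returning an error instead would be an equally reasonable choice.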
## Getting started
### Prerequisites
- **Rust toolchain** (stable) with `cargo`. The CI uses
`dtolnay/rust-toolchain@stable`, so match stable locally.
- For actual end-to-end runs: a **`cookies.json`** file (flat key-value JSON of
  cookies for `https://learning.oreilly.com`). The app looks for it in these
  locations, in this order:
  1. the path given with `--cookies <path>`
  2. `${XDG_CONFIG_HOME}/oreilly-epub/cookies.json`
  3. the current working directory (`./cookies.json`)
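
The search order above can be sketched as a pair of small functions. The names `cookie_candidates` and `find_cookies` are hypothetical and not taken from the project. The sketch also assumes that a missing `--cookies` path falls through to the next location, where the real CLI may report an error instead.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: list candidate `cookies.json` paths in search order.
pub fn cookie_candidates(
    cli_path: Option<&Path>,
    xdg_config_home: Option<&Path>,
    cwd: &Path,
) -> Vec<PathBuf> {
    let mut candidates = Vec::new();
    // 1. Explicit `--cookies <path>` argument, if given.
    if let Some(p) = cli_path {
        candidates.push(p.to_path_buf());
    }
    // 2. `${XDG_CONFIG_HOME}/oreilly-epub/cookies.json`, if the variable is set.
    if let Some(xdg) = xdg_config_home {
        candidates.push(xdg.join("oreilly-epub").join("cookies.json"));
    }
    // 3. `./cookies.json` in the current directory.
    candidates.push(cwd.join("cookies.json"));
    candidates
}

/// Return the first candidate for which `exists` reports true.
/// The existence check is injected so the logic can be tested without files.
pub fn find_cookies(
    candidates: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    candidates.iter().find(|p| exists(p.as_path())).cloned()
}
```

In real use the check would be `|p| p.exists()`. Passing it in as a closure keeps the ordering logic testable without touching the filesystem.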
> **Note:** The app is asynchronous (Tokio) and uses `reqwest`. No special
> services are required to build and run locally.
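
Because `cookies.json` is a flat key-value map, it maps directly onto a `Cookie` request header. The sketch below shows that mapping using only the standard library. `cookie_header` is a hypothetical name, the cookie names in the usage note are placeholders, and the way `src/http_client.rs` actually attaches cookies may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: join flat cookie pairs into a `Cookie` header value.
/// A `BTreeMap` keeps the output order deterministic, which helps in tests.
pub fn cookie_header(cookies: &BTreeMap<String, String>) -> String {
    cookies
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect::<Vec<_>>()
        .join("; ")
}
```

For a map holding the placeholder pairs `session` → `abc` and `token` → `xyz`, this yields `session=abc; token=xyz`.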
### Repository layout (high-level)
- `src/main.rs` – CLI, argument parsing, orchestration (fetch metadata, pages,
downloads, build EPUB).
- `src/http_client.rs` – authenticated client from `cookies.json`.
- `src/epub.rs` – download pipeline and zip/EPUB creation (mimetype first,
`META-INF/container.xml`, etc.).
- `src/xml.rs` – XHTML/OPF rewriting (void tags, attribute rewrite, stylesheet
linking, description injection).
- `src/models.rs` – data models for APIs (EPUB, chapters, files, pagination).
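
The "mimetype first" detail noted for `src/epub.rs` comes from the EPUB container format, which requires the `mimetype` entry to be the first file in the archive. The following sketch illustrates that ordering and a minimal `container.xml`. The function names are hypothetical, `OEBPS/content.opf` is only an example location, and writing `container.xml` second is a convention assumed here rather than a requirement.

```rust
/// Contents of the `mimetype` entry required by the EPUB container format.
pub const EPUB_MIMETYPE: &str = "application/epub+zip";

/// Hypothetical helper: order archive entries so `mimetype` is written first,
/// then `META-INF/container.xml`, then everything else in its original order.
pub fn archive_order(mut entries: Vec<String>) -> Vec<String> {
    // `sort_by_key` is stable, so entries with equal rank keep their order.
    entries.sort_by_key(|path| match path.as_str() {
        "mimetype" => 0,
        "META-INF/container.xml" => 1,
        _ => 2,
    });
    entries
}

/// Build a minimal `META-INF/container.xml` pointing at the OPF package file.
/// Note: `opf_path` is inserted as-is; this sketch does no XML escaping.
pub fn container_xml(opf_path: &str) -> String {
    format!(
        r#"<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf_path}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"#
    )
}
```

The `mimetype` entry also has to be stored uncompressed. That setting belongs to whichever zip library writes the archive, so it is not shown here.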
### Quick start
```bash
# 1) Clone
git clone <REPO_URL>
cd oreilly-epub
# 2) (Optional) Place cookies.json (see above)
# Example:
# cp /path/to/cookies.json .
# 3) Build
cargo build
# 4) Check format & lints locally
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
# 5) Run unit tests
cargo test --all
# 6) Try it (example book id)
cargo run -- 9781492097039 --parallel 4
# Add --skip-download if you already have cached files
# Add --cookies ./cookies.json to point to a specific file
```
## Development workflow
### Style and linting
- **Formatting:** `cargo fmt --all` (CI runs `cargo fmt --all -- --check`).
- **Linting:** `cargo clippy --all-targets --all-features -- -D warnings` (CI
treats warnings as errors).
> Make sure both the format and clippy checks pass locally before you push.
## Before you open a PR
### Commit messages
- Use clear, imperative subject lines: “Fix EPUB container.xml path
resolution”, not “fixed stuff”.
- Wrap body lines at about 72 characters when practical.
- Reference issues with `Fixes #123` / `Closes #123` when appropriate.
### Branch naming
Use short, descriptive branches, e.g.:
- `feat/parallelism-8-limit`
- `fix/xhtml-void-tags`
- `docs/contributing`
### Checklists
Before pushing:
- [ ] `cargo fmt --all` produces no diffs.
- [ ] `cargo clippy --all-targets --all-features -- -D warnings` passes locally.
- [ ] `cargo test --all` is green.
- [ ] If you touched XML/OPF/XHTML handling, tested at least one real book
locally (if possible).
- [ ] Updated documentation, comments, or usage text when applicable.
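
For the XML/OPF/XHTML item in the checklist, a small regression test can complement a real-book run. The sketch below shows the kind of void-tag rewrite such a test might cover. It is not the project's implementation in `src/xml.rs`: `self_close_void_tags` is a hypothetical, string-based stand-in, and it assumes no `>` appears inside attribute values or comments.

```rust
/// HTML void elements, which XHTML requires to be written self-closed.
const VOID_TAGS: [&str; 14] = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
];

/// Hypothetical helper: rewrite void tags such as `<br>` as `<br/>`.
/// Simplification: assumes no `>` inside attribute values or comments.
pub fn self_close_void_tags(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 16);
    let mut rest = input;
    while let Some(start) = rest.find('<') {
        // Copy the text before the tag unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start..];
        let Some(end) = after.find('>') else {
            // Unterminated tag: copy the remainder verbatim and stop.
            out.push_str(after);
            return out;
        };
        let tag = &after[..=end]; // includes '<' and '>'
        let inner = &tag[1..tag.len() - 1];
        // The tag name is the leading run of alphanumerics; closing tags
        // and comments start with '/' or '!' and so yield an empty name.
        let name: String = inner
            .chars()
            .take_while(|c| c.is_ascii_alphanumeric())
            .collect::<String>()
            .to_ascii_lowercase();
        let is_void = VOID_TAGS.contains(&name.as_str());
        if is_void && !inner.trim_end().ends_with('/') {
            out.push('<');
            out.push_str(inner.trim_end());
            out.push_str("/>");
        } else {
            out.push_str(tag);
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}
```

Tags that are already self-closed, such as `<hr />`, pass through untouched, so running the rewrite twice gives the same result as running it once.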