# oreilly-epub

A small CLI tool that downloads a book from **O'Reilly Learning** (given a
`bookid`) and repackages it into a **valid EPUB**. It mirrors the publisher's
layout, fixes resource links (images, etc.) so they work offline, and zips
everything into a ready‑to‑read `.epub`.

> :warning: You must have a valid O'Reilly Learning subscription and your own
> session cookies. This tool is intended for personal/offline use with content
> you're authorized to access.

## Features

- **Deterministic by `bookid`**: no fuzzy search—just pass a known identifier.
- **Authenticated requests** via `cookies.json` (flat key‑value).
- **Exploded tree download**: mirrors O'Reilly's `full_path` hierarchy under a
local `epub_root/`.
- **Valid EPUB packaging**:
  - Writes the `mimetype` entry first, **uncompressed** (EPUB requirement).
  - Generates `META-INF/container.xml` pointing to the publisher's OPF.
  - Zips all assets (HTML, images, CSS, fonts, OPF, NCX, …).
- **HTML link rewriting during zip**:
  - Converts absolute O'Reilly API URLs (e.g.,
`/api/v2/epubs/.../files/images/foo.jpg`) to **relative paths** inside the
EPUB so images/styles render correctly.
- Accelerated file downloads via parallelization.
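
The link rewriting described above can be pictured with a small sketch. It is not the tool's actual code: `book_path_from_api_url` and `relative_link` are hypothetical helpers, and the sketch assumes that everything after the first `/files/` segment of an API URL is the file's path inside the book, with no query string or fragment.

```rust
/// Hypothetical helper: take everything after the first "/files/" segment
/// of an absolute API URL as the file's path inside the book.
fn book_path_from_api_url(url: &str) -> Option<&str> {
    const MARKER: &str = "/files/";
    let start = url.find(MARKER)? + MARKER.len();
    let rest = &url[start..];
    if rest.is_empty() {
        None
    } else {
        Some(rest)
    }
}

/// Hypothetical helper: build a relative link from the document at
/// `from_file` to `target`. Both paths are relative to the EPUB root
/// and use "/" as the separator.
fn relative_link(from_file: &str, target: &str) -> String {
    let from: Vec<&str> = from_file.split('/').filter(|s| !s.is_empty()).collect();
    let to: Vec<&str> = target.split('/').filter(|s| !s.is_empty()).collect();

    // Compare directories only, so drop the file name from both sides.
    let from_dir = &from[..from.len().saturating_sub(1)];
    let to_dir = &to[..to.len().saturating_sub(1)];

    // Number of leading directories the two paths share.
    let common = from_dir
        .iter()
        .zip(to_dir.iter())
        .take_while(|(a, b)| a == b)
        .count();

    // Climb out of the non-shared directories, then descend to the target.
    let mut parts: Vec<&str> = vec![".."; from_dir.len() - common];
    parts.extend_from_slice(&to[common..]);
    parts.join("/")
}
```

With these helpers, `/api/v2/epubs/.../files/images/foo.jpg` referenced from a chapter stored at `text/ch01.html` (a made-up path) would become `../images/foo.jpg`.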

## Installation

### Ready-to-use binaries

Prebuilt binaries for the major operating systems are available at
[GitHub releases](https://github.com/Farzat07/oreilly-epub/releases). \
For portable Linux releases (`musl` or `ARM64`), check the
[GitLab releases](https://gitlab.com/Farzat07/oreilly-epub/-/releases).

Download the binary for your platform and run it directly.

### Manual build

#### Build requirements

- **Rust** (stable, 1.75+ recommended) with Cargo.

#### Build instructions

```bash
git clone <REPO_URL>
cd oreilly-epub
cargo build --release
```

The binary will be at `target/release/oreilly-epub`.

## Usage

```txt
oreilly-epub [OPTIONS] <BOOKID>

Arguments:
  <BOOKID>  The Book digits ID that you want to download

Options:
      --cookies <COOKIES>    Path to the cookies.json file.
      --skip-download        If files already downloaded in a previous run.
      --parallel <PARALLEL>  Number of files to download in parallel.
```

**Example:**

```bash
oreilly-epub 9781787782204 --cookies ./cookies.json
```

Running the tool requires:

- A **`cookies.json`** file for `learning.oreilly.com` (see below).
- Network access to O'Reilly's API while running the tool.
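
The `--parallel` option caps how many files are fetched at once. As a rough illustration of such a cap (not the tool's actual implementation, which may well use an async runtime instead), the sketch below runs a caller-supplied `fetch` function over a list of items with a fixed number of worker threads. The name `fetch_all` and its signature are assumptions made for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: run `fetch` over every item using at most
/// `parallel` worker threads. Results come back in the order of `items`.
fn fetch_all<T, R, F>(items: &[T], parallel: usize, fetch: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Index of the next item that no worker has claimed yet.
    let next = AtomicUsize::new(0);
    // One result slot per item, filled in as workers finish.
    let results: Mutex<Vec<Option<R>>> =
        Mutex::new((0..items.len()).map(|_| None).collect());
    // Never start more workers than there are items, and always at least one.
    let workers = parallel.max(1).min(items.len().max(1));

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Claim the next index; stop once the list is exhausted.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(item) = items.get(i) else { break };
                let value = fetch(item); // the slow part runs outside the lock
                results.lock().unwrap()[i] = Some(value);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("every slot is filled by a worker"))
        .collect()
}
```

Because each worker claims one index at a time, a slow file only holds up its own thread while the other workers keep draining the list.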

### Cookies setup (`cookies.json`)

Place a `cookies.json` file in the current working directory or in the config
directory described below (or pass `--cookies <path>`). \
The file is a **flat JSON object**: cookie name → cookie value.

**Example:**

```json
{
  "orm-session": "REDACTED",
  "orm-cred": "REDACTED",
  "another_cookie": "value"
}
```
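
A flat name → value map like this translates directly into a single `Cookie` request header. The sketch below shows that step only. It assumes the JSON has already been parsed into a map (for example with a JSON library, not shown), and `cookie_header` is a hypothetical name; the tool may hand the cookies to its HTTP client differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: join a flat name -> value map into the value of a
/// `Cookie` request header, e.g. "a=1; b=2".
fn cookie_header(cookies: &BTreeMap<String, String>) -> String {
    cookies
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect::<Vec<_>>()
        .join("; ")
}
```

A `BTreeMap` is used here only so that the output order is stable; the order of cookies in the header does not matter to the server.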

> Tip: You can obtain the cookies for `learning.oreilly.com` from your
> browser's developer tools: visit the website while logged in and run the
> command below in the console. Save the output to `cookies.json` and keep the
> file private.

```js
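// Split each cookie on its first "=" only, since values may themselves contain "=".
JSON.stringify(Object.fromEntries(document.cookie.split(";").map(c=>c.trim()).filter(c=>c.includes("=")).map(c=>[c.slice(0,c.indexOf("=")),c.slice(c.indexOf("=")+1)])))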
```

### Config directory

The location of the config directory depends on the platform:

|Platform|Value|Example|
|--------|-----|-------|
|Linux|`$XDG_CONFIG_HOME`/oreilly-epub or `$HOME`/.config/oreilly-epub|/home/alice/.config/oreilly-epub|
|macOS|`$HOME`/Library/Application Support/oreilly-epub|/Users/Alice/Library/Application&nbsp;Support/oreilly-epub|
|Windows|`{FOLDERID_LocalAppData}`\oreilly-epub|C:\Users\Alice\AppData\Local\oreilly-epub|
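
The Linux row of the table can be read as a simple fallback rule. The sketch below spells it out under two assumptions: an empty `XDG_CONFIG_HOME` is treated as unset, and the function name `linux_config_dir` is made up for this example. The tool itself may rely on a directories library rather than code like this.

```rust
use std::path::PathBuf;

/// Hypothetical helper: resolve the Linux config directory from the values
/// of `XDG_CONFIG_HOME` and `HOME`. The values are passed in as parameters,
/// rather than read from the environment, to keep the function easy to test.
fn linux_config_dir(xdg_config_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = match xdg_config_home {
        // Prefer XDG_CONFIG_HOME when it is set and non-empty.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // Otherwise fall back to $HOME/.config; give up if HOME is missing too.
        _ => PathBuf::from(home?).join(".config"),
    };
    Some(base.join("oreilly-epub"))
}
```

In a real program the arguments would come from the environment, for instance `std::env::var("XDG_CONFIG_HOME").ok().as_deref()` and `std::env::var("HOME").ok().as_deref()`.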

## Notes & Limitations

- This tool assumes the O'Reilly “files” API includes OPF/NCX and all
referenced assets.
- Downloads run in parallel; use `--parallel` to set how many files are
fetched at once.

## Roadmap / TODO

- [ ] **Logging**: add logging support.
- [ ] **CONTRIBUTING.md**: add architecture notes & contributor guidelines.
- [x] **Robust HTML rewriting**: replace string replacement with real XHTML
parsing to update `src`, `href`, and other attributes precisely.
- [x] **Stylesheets completeness**: ensure all CSS referenced by chapters is
included and linked properly (cross-check chapters endpoint vs files list).
- [ ] **License**: add copyright notice to each file and specify it in Cargo.toml.
- [x] **XDG directories**: use XDG‑compatible defaults for config and the
download root.
- [x] **Concurrency**: implement parallel downloads with a configurable limit.
- [ ] **Progress reporting**: display per‑file and overall progress (bytes
and/or file counts).
- [x] **Richer metadata**: add metadata such as description to the OPF.
- [x] **XML generation**: build `container.xml` using an XML writer instead of
raw strings.
- [x] **Low‑memory zip**: stream files to the archive in chunks to reduce peak
memory.
- [x] **CI/CD**: add a basic pipeline (build, fmt, clippy, test, release
artifact).
- [ ] **Tests**: write actual tests to run in the CI pipeline.