Add metadata refresh command and jhalfs caching

m00d 2025-10-01 07:04:29 +02:00
parent 3ce470e019
commit 12e6d41e58
5 changed files with 92 additions and 19 deletions

@@ -42,8 +42,11 @@ artifacts under `ai/metadata/`:
 `ai/metadata/index.json` (use `--compact` for single-line JSON).
 - `harvest` fetches a given book page, extracts build metadata, and emits a
 schema-compliant JSON skeleton. When direct HTML parsing does not locate the
-source tarball, it falls back to the jhalfs `wget-list` data to populate
-`source.urls`.
+source tarball, it falls back to cached jhalfs manifests to populate
+`source.urls` and MD5 checksums.
+- `refresh` downloads (or re-downloads with `--force`) the jhalfs manifests
+(`wget-list`, `md5sums`) for one or more books and stores them under
+`ai/metadata/cache/`.
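
The cache-then-fallback behaviour described for `refresh` and `harvest` can be sketched roughly as follows. This is an illustration only, not the project's implementation: the function names `refresh` and `lookup_source`, the injectable `fetch` parameter, the flat cache directory layout, and the example URLs are all assumptions made for the sketch. It also assumes `wget-list` holds one URL per line and `md5sums` uses the usual `<hash>  <filename>` layout.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen

# Manifest names taken from the text above.
MANIFESTS = ("wget-list", "md5sums")


def http_fetch(url: str) -> bytes:
    """Default fetcher; pass a stub instead to avoid network access."""
    with urlopen(url) as response:
        return response.read()


def refresh(base_url: str, cache_dir: str, force: bool = False,
            fetch: Callable[[str], bytes] = http_fetch) -> List[Path]:
    """Download manifests that are not cached yet (or all of them if force).

    Returns the list of cache files that were (re)written.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    written = []
    for name in MANIFESTS:
        target = cache / name
        if target.exists() and not force:
            continue  # cached copy is reused
        target.write_bytes(fetch(f"{base_url.rstrip('/')}/{name}"))
        written.append(target)
    return written


def lookup_source(cache_dir: str, tarball_prefix: str) -> Optional[Dict[str, Optional[str]]]:
    """Fallback lookup: find the URL and MD5 of a tarball by file-name prefix."""
    cache = Path(cache_dir)
    url = None
    for line in (cache / "wget-list").read_text().splitlines():
        line = line.strip()
        if line.rsplit("/", 1)[-1].startswith(tarball_prefix):
            url = line
            break
    if url is None:
        return None
    filename = url.rsplit("/", 1)[-1]
    md5 = None
    for line in (cache / "md5sums").read_text().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            md5 = parts[0]
            break
    return {"url": url, "md5": md5}
```

In this sketch a second call to `refresh` without `force=True` writes nothing, which mirrors the "downloads (or re-downloads with `--force`)" wording; `lookup_source` returns `None` when the cached `wget-list` has no matching entry, leaving the caller to decide how to report the miss.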
## Module layout