Integrating jhalfs Source Metadata
- Goal: reuse jhalfs `wget-list` and `md5sums` to populate package `source.urls` and auto-fill checksums when harvesting metadata for MLFS/BLFS/GLFS packages.
- Data source: https://anduin.linuxfromscratch.org/ hosts per-release `wget-list`/`md5sums` files already curated by the jhalfs project.
- Approach:
  - Fetch (and optionally cache under `ai/cache/`) the lists for each book.
  - When harvesting, map `<package>-<version>` against the list to gather all relevant URLs.
  - Pull matching checksum entries to populate `source.checksums`.
  - Keep the existing HTML scrape for chapter/stage text; jhalfs covers only sources.
- Benefits: avoids fragile HTML tables, keeps URLs aligned with official build scripts, and ensures checksums are up-to-date.
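The matching step in the approach above could look roughly like the following. This is a minimal sketch, not the harvester's actual code: it assumes `wget-list` holds one URL per line and `md5sums` holds `<digest>  <file name>` per line, and the names `SourceInfo`, `parse_md5sums`, and `lookup` are hypothetical. The prefix match is deliberately naive and would likely need stricter rules in practice.

```rust
use std::collections::HashMap;

/// URLs and checksums gathered for one package. Hypothetical shape that
/// mirrors the `source.urls` / `source.checksums` fields named above.
#[derive(Debug, Default, PartialEq)]
pub struct SourceInfo {
    pub urls: Vec<String>,
    /// (file name, MD5 hex digest) pairs.
    pub md5: Vec<(String, String)>,
}

/// Parse an `md5sums` file into a map from file name to digest.
/// Assumes each non-empty line is `<digest>  <file name>`; a leading `*`
/// on the file name (md5sum binary mode) is dropped.
pub fn parse_md5sums(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let digest = parts.next()?;
            let file = parts.next()?;
            Some((file.trim_start_matches('*').to_string(), digest.to_string()))
        })
        .collect()
}

/// Collect every URL whose file name starts with `<package>-<version>`,
/// along with the checksum recorded for that file, if any.
pub fn lookup(
    wget_list: &str,
    md5sums: &HashMap<String, String>,
    package: &str,
    version: &str,
) -> SourceInfo {
    let stem = format!("{package}-{version}");
    let mut info = SourceInfo::default();
    for url in wget_list.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // The file name is whatever follows the last '/' in the URL.
        let file = url.rsplit('/').next().unwrap_or(url);
        let Some(rest) = file.strip_prefix(stem.as_str()) else {
            continue;
        };
        // Naive boundary check so that version 2.4 does not match 2.41.
        // It still accepts 2.4.1 for 2.4, which a real matcher should reject.
        if !(rest.is_empty() || rest.starts_with('.') || rest.starts_with('-')) {
            continue;
        }
        info.urls.push(url.to_string());
        if let Some(digest) = md5sums.get(file) {
            info.md5.push((file.to_string(), digest.clone()));
        }
    }
    info
}
```

Because the match is on the file-name prefix, patches named `<package>-<version>-….patch` are picked up alongside the main tarball, which is one way to read "all relevant URLs". Files with no entry in `md5sums` simply end up with a URL and no checksum.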
Metadata → Rust Module Strategy
Goal: emit Rust modules under src/pkgs/by_name directly from harvested
metadata once MLFS/BLFS/GLFS records are validated.
Outline:
- Schema alignment – Ensure harvested JSON carries everything the `PackageDefinition` constructor expects (source URLs, checksums, build commands, dependencies, optimisation flags, notes/stage metadata).
- Translation layer – Implement a converter (likely in a new module, e.g. `src/pkgs/generator.rs`) that reads a metadata JSON file and produces a `ScaffoldRequest` or directly writes the module source via the existing scaffolder.
- Naming/layout – Derive module paths from `package.id` (e.g. `mlfs/binutils-pass-1` → `src/pkgs/by_name/bi/binutils/pass_1/mod.rs`) while preserving the prefix/slug conventions already used by the scaffolder.
- CLI integration – Add a subcommand (`metadata_indexer generate`) that accepts a list of package IDs or a glob, feeds each through the translator, and optionally stages the resulting Rust files.
- Diff safety – Emit modules to a temporary location first, compare against existing files, and only overwrite when changes are detected; keep a `--dry-run` mode for review.
- Tests/checks – After generation, run `cargo fmt` and `cargo check` to ensure the new modules compile; optionally add schema fixtures covering edge cases (variants, multiple URLs, absent checksums).
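The diff-safety item in the outline can be illustrated with a small sketch. It is only one possible shape, under stated assumptions: `WriteAction`, `plan_write`, and `write_module` are hypothetical names, and the comparison is done in memory against the existing file rather than between two on-disk trees. The temporary file is used only to make the final replace a single rename.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// What the generator would do with one freshly rendered module.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteAction {
    Create,
    Overwrite,
    Unchanged,
}

/// Decide the action by comparing the generated source with what is on
/// disk (`None` when the target file does not exist yet).
pub fn plan_write(existing: Option<&str>, generated: &str) -> WriteAction {
    match existing {
        None => WriteAction::Create,
        Some(old) if old == generated => WriteAction::Unchanged,
        Some(_) => WriteAction::Overwrite,
    }
}

/// Apply the plan for one module. With `dry_run` set, nothing is written
/// and the returned action only reports what would have happened.
pub fn write_module(target: &Path, generated: &str, dry_run: bool) -> io::Result<WriteAction> {
    let existing = match fs::read_to_string(target) {
        Ok(text) => Some(text),
        Err(e) if e.kind() == io::ErrorKind::NotFound => None,
        Err(e) => return Err(e),
    };
    let action = plan_write(existing.as_deref(), generated);
    if !dry_run && action != WriteAction::Unchanged {
        if let Some(dir) = target.parent() {
            fs::create_dir_all(dir)?;
        }
        // Write next to the target first (e.g. `mod.rs.tmp`), then rename
        // over it so a failed write never leaves a half-written module.
        let tmp = target.with_extension("rs.tmp");
        fs::write(&tmp, generated)?;
        fs::rename(&tmp, target)?;
    }
    Ok(action)
}
```

Keeping `plan_write` free of I/O means the overwrite decision can be covered by plain unit tests, while a `--dry-run` flag on the proposed subcommand would map directly onto the `dry_run` parameter.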
Open questions:
- How to represent optional post-install steps or multi-phase builds inside the generated module (additional helper functions vs. raw command arrays).
- Where to store PGO workload hints once the PGO infrastructure is defined.