lpkg/ai/notes.md

46 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Integrating jhalfs Source Metadata
- Goal: reuse jhalfs wget-list and md5sums to populate package `source.urls` and
auto-fill checksums when harvesting metadata for MLFS/BLFS/GLFS packages.
- Data source: `https://anduin.linuxfromscratch.org/` hosts per-release
`wget-list`/`md5sums` files already curated by the jhalfs project.
- Approach:
1. Fetch (and optionally cache under `ai/cache/`) the lists for each book.
2. When harvesting, map `<package>-<version>` against the list to gather all
relevant URLs.
3. Pull matching checksum entries to populate `source.checksums`.
4. Keep the existing HTML scrape for chapter/stage text; jhalfs covers only
sources.
- Benefits: avoids fragile HTML tables, keeps URLs aligned with official build
scripts, and ensures checksums are up-to-date.
# Metadata → Rust Module Strategy
Goal: emit Rust modules under `src/pkgs/by_name` directly from harvested
metadata once MLFS/BLFS/GLFS records are validated.
Outline:
1. **Schema alignment** Ensure harvested JSON carries everything the
`PackageDefinition` constructor expects (source URLs, checksums, build
commands, dependencies, optimisation flags, notes/stage metadata).
2. **Translation layer** Implement a converter (likely in a new module,
e.g. `src/pkgs/generator.rs`) that reads a metadata JSON file and produces a
`ScaffoldRequest` or directly writes the module source via the existing
scaffolder.
3. **Naming/layout** Derive module paths from `package.id` (e.g.
`mlfs/binutils-pass-1``src/pkgs/by_name/bi/binutils/pass_1/mod.rs`) while
preserving the prefix/slug conventions already used by the scaffolder.
4. **CLI integration** Add a subcommand (`metadata_indexer generate`) that
accepts a list of package IDs or a glob, feeds each through the translator,
and optionally stages the resulting Rust files.
5. **Diff safety** Emit modules to a temporary location first, compare
against existing files, and only overwrite when changes are detected; keep a
`--dry-run` mode for review.
6. **Tests/checks** After generation, run `cargo fmt` and `cargo check` to
ensure the new modules compile; optionally add schema fixtures covering edge
cases (variants, multiple URLs, absent checksums).
Open questions:
- How to represent optional post-install steps or multi-phase builds inside the
generated module (additional helper functions vs. raw command arrays).
- Where to store PGO workload hints once the PGO infrastructure is defined.