lpkg/ai/notes.md

5 KiB
Raw Blame History

Integrating jhalfs Source Metadata

  • Goal: reuse jhalfs wget-list and md5sums to populate package source.urls and auto-fill checksums when harvesting metadata for MLFS/BLFS/GLFS packages.
  • Data source: https://anduin.linuxfromscratch.org/ hosts per-release wget-list/md5sums files already curated by the jhalfs project.
  • Approach:
    1. Fetch (and optionally cache under ai/cache/) the lists for each book.
    2. When harvesting, map <package>-<version> against the list to gather all relevant URLs.
    3. Pull matching checksum entries to populate source.checksums.
    4. Keep the existing HTML scrape for chapter/stage text; jhalfs covers only sources.
  • Benefits: avoids fragile HTML tables, keeps URLs aligned with official build scripts, and ensures checksums are up-to-date.

Metadata → Rust Module Strategy

Goal: emit Rust modules under src/pkgs/by_name directly from harvested metadata once MLFS/BLFS/GLFS records are validated.

Outline:

  1. Schema alignment Ensure harvested JSON carries everything the PackageDefinition constructor expects (source URLs, checksums, build commands, dependencies, optimisation flags, notes/stage metadata).
  2. Translation layer Implement a converter (likely in a new module, e.g. src/pkgs/generator.rs) that reads a metadata JSON file and produces a ScaffoldRequest or directly writes the module source via the existing scaffolder.
  3. Naming/layout Derive module paths from package.id (e.g. mlfs/binutils-pass-1src/pkgs/by_name/bi/binutils/pass_1/mod.rs) while preserving the prefix/slug conventions already used by the scaffolder.
  4. CLI integration Add a subcommand (metadata_indexer generate) that accepts a list of package IDs or a glob, feeds each through the translator, and optionally stages the resulting Rust files.
  5. Diff safety Emit modules to a temporary location first, compare against existing files, and only overwrite when changes are detected; keep a --dry-run mode for review.
  6. Tests/checks After generation, run cargo fmt and cargo check to ensure the new modules compile; optionally add schema fixtures covering edge cases (variants, multiple URLs, absent checksums).

Open questions:

  • How to represent optional post-install steps or multi-phase builds inside the generated module (additional helper functions vs. raw command arrays).
  • Where to store PGO workload hints once the PGO infrastructure is defined.

Lightweight Networking Rewrite

  • Motivation: remove heavy async stacks (tokio + reqwest) from the default feature set to keep clean builds fast and reduce binary size.
  • HTTP stack baseline: ureq (blocking, TLS via rustls, small dependency footprint) plus scraper for DOM parsing.
  • Migration checklist:
    • Replace reqwest usage in src/html.rs, md5_utils.rs, wget_list.rs, mirrors.rs, and the ingest pipelines.
    • Rework binutils cross toolchain workflow to operate synchronously, eliminating tokio runtime/bootstrap.
    • Drop tokio and reqwest from Cargo.toml once TUI workflows stop using tracing instrumentation hooks that pulled them in transitively.
    • Audit for remaining tracing dependencies and migrate to the lightweight logging facade (log + env_logger or custom adapter) for non-TUI code.
  • Follow-up ideas:
    • Provide feature flag full-net that re-enables async clients when needed for high-concurrency mirror probing.
    • Benchmark ureq vs reqwest on metadata_indexer harvest to ensure we dont regress throughput noticeably.

README Generation Framework (Markdown RFC)

  • Goal: author the project README in Rust, using a small domain-specific builder that outputs GitHub-flavoured Markdown (GFM) from structured sections.
  • Design sketch:
    • New crate/workspace member readme_builder under tools/ exposing a fluent API (Doc::new().section("Intro", |s| ...)).
    • Source-of-truth lives in tools/readme/src/main.rs; running cargo run -p readme_builder writes to README.md.
    • Provide reusable primitives: Heading, Paragraph, CodeBlock, Table::builder(), Callout::note("..."), Badge::docsrs(), etc.
    • Keep rendering deterministic (sorted sections, stable wrapping) so diffs remain reviewable.
  • Tasks:
    • Scaffold tools/readme crate with CLI that emits to stdout or specified path (--output README.md).
    • Model README sections as enums/structs with Display impls to enforce consistency.
    • Port current README structure into builder code, annotate with inline comments describing regeneration steps.
    • Add make readme (or cargo xtask readme) to rebuild documentation as part of release workflow.
    • Document in CONTRIBUTING how to edit the Rust source instead of the raw Markdown.
  • Stretch goals:
    • Emit additional artefacts (e.g., docs/CHANGELOG.md) from the same source modules.
    • Allow embedding generated tables from Cargo metadata (dependency stats, feature lists).