Issue
Feature · Shipped · Swamp CLI
Assignees: stack72

#231 Implement W2: Lifecycle services own the catalog write (extension catalog rearchitecture)

Opened by stack72 · 5/4/2026· Shipped 5/5/2026

Problem

W1a (#1292, shipped) and W1b (#1295, shipped — see #211 / #223) put the foundation in place: domain aggregate, ExtensionRepository, diff-based transactional saves, I-Repo-1 cross-aggregate uniqueness invariant. But the keystone is plumbing only — no user-visible improvements have shipped yet.

swamp-club#201 ("Catalog retains stale source-file entries when extension version drops a file") remains open because extension rm doesn't touch the catalog at all (the audit's grep for catalog writes in the rm path came up empty). Cross-extension type collisions are resolved silently, first-wins.

W2 is where the user-visible payoff arrives. Lifecycle services take ownership of catalog writes end-to-end, closing #201 and activating the I-Repo-1 invariant for users.

Full architectural context: design/extension-rearchitecture.md ("W2 — Lifecycle services own the catalog write" section) — referenced from #211.

Scope

Introduce three application services that orchestrate filesystem + lockfile + catalog mutations as one unit-of-work each.

InstallExtensionService.execute(args)

Replaces the catalog-write surface of the current install path in pull.ts. Builds an Extension aggregate from filesystem state + manifest + lockfile entry, then calls repository.save(extension). Diff-save inserts the rows, and I-Repo-1 fires if the new rows collide with another extension's types (cross-extension type collision → DuplicateTypeError + ROLLBACK).
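
A minimal sketch of the save-then-check behaviour described above, using an in-memory catalog. Everything here is an assumption for illustration: InMemoryCatalog, the Row shape, the "model" kind and the error message format are hypothetical, and the real ExtensionRepository runs inside a SQLite transaction rather than swapping an array.

```typescript
// Hypothetical in-memory model. Names echo the issue text; shapes are assumptions.
type Row = { extension: string; kind: string; type: string; sourcePath: string };

class DuplicateTypeError extends Error {
  existing: Row;
  incoming: Row;
  constructor(existing: Row, incoming: Row) {
    super(
      `(${incoming.kind}, ${incoming.type}) is claimed by both ` +
        `${existing.sourcePath} and ${incoming.sourcePath}`,
    );
    this.name = "DuplicateTypeError";
    this.existing = existing;
    this.incoming = incoming;
  }
}

class InMemoryCatalog {
  private rows: Row[] = [];

  // Swap in one extension's desired rows, then evaluate uniqueness of
  // (kind, type) against the would-be post-save state.
  save(extension: string, desired: Row[]): void {
    const next = this.rows
      .filter((r) => r.extension !== extension)
      .concat(desired);
    const seen = new Map<string, Row>();
    for (const row of next) {
      const key = JSON.stringify([row.kind, row.type]);
      const prior = seen.get(key);
      if (prior !== undefined) throw new DuplicateTypeError(prior, row);
      seen.set(key, row);
    }
    this.rows = next; // reached only when the invariant holds ("commit")
  }

  rowsFor(extension: string): Row[] {
    return this.rows.filter((r) => r.extension === extension);
  }
}

const catalog = new InMemoryCatalog();
catalog.save("ext-a", [
  { extension: "ext-a", kind: "model", type: "@scope/foo", sourcePath: "ext-a/foo.ts" },
]);
try {
  catalog.save("ext-b", [
    { extension: "ext-b", kind: "model", type: "@scope/foo", sourcePath: "ext-b/foo.ts" },
  ]);
} catch (err) {
  console.log((err as Error).message); // names both source paths
}
```

Because the rows are only assigned after the check passes, the rejected save leaves ext-a's row in place and adds nothing for ext-b, which is the rollback behaviour in miniature.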

RemoveExtensionService.execute(name)

This is the swamp-club#201 fix. Loads via repository.loadByName(name), calls extension.tombstoneAll(), calls repository.save(extension). Diff-save deletes every row owned by that extension. Filesystem cleanup of the pulled-extensions tree happens in the same unit-of-work. Today's rm.ts does not touch the catalog at all — this PR fixes that omission.
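
The load → tombstoneAll → save sequence can be sketched as follows, again with in-memory stand-ins. The method names loadByName, tombstoneAll and save come from the text above; the Entry shape, the repository internals and the removeExtension helper are assumptions, and filesystem cleanup is left out.

```typescript
// Hypothetical in-memory stand-ins; only the method names follow the issue text.
type Entry = { kind: string; type: string; tombstoned: boolean };

class Extension {
  name: string;
  entries: Entry[];
  constructor(name: string, entries: Entry[]) {
    this.name = name;
    this.entries = entries;
  }
  // Mark every entry for deletion; nothing is removed until save().
  tombstoneAll(): void {
    for (const e of this.entries) e.tombstoned = true;
  }
}

class InMemoryExtensionRepository {
  private rows: { extension: string; kind: string; type: string }[] = [];

  loadByName(name: string): Extension | undefined {
    const owned = this.rows.filter((r) => r.extension === name);
    if (owned.length === 0) return undefined;
    return new Extension(
      name,
      owned.map((r) => ({ kind: r.kind, type: r.type, tombstoned: false })),
    );
  }

  // Stand-in for diff-save: replaces the extension's rows with its live
  // entries, so tombstoned entries end up deleted. The end state matches
  // what a real insert/delete diff would produce.
  save(ext: Extension): void {
    const live = ext.entries.filter((e) => !e.tombstoned);
    this.rows = this.rows
      .filter((r) => r.extension !== ext.name)
      .concat(live.map((e) => ({ extension: ext.name, kind: e.kind, type: e.type })));
  }

  countRows(name: string): number {
    return this.rows.filter((r) => r.extension === name).length;
  }
}

// Sketch of the catalog half of RemoveExtensionService.execute(name).
function removeExtension(repo: InMemoryExtensionRepository, name: string): boolean {
  const ext = repo.loadByName(name);
  if (ext === undefined) return false; // nothing catalogued under that name
  ext.tombstoneAll();
  repo.save(ext);
  return true;
}
```

The same shape doubles as the lifecycle-layer round-trip check: save an extension, confirm its rows exist, call removeExtension, and confirm the count for that name is zero while other extensions are untouched.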

UpgradeExtensionService.execute(name, newVersion)

The atomic-transition pattern locked in by W1b's I-Repo-1 design: composes saveAll([vN.tombstoneAll(), vN+1]) as a single call so the global uniqueness invariant evaluates against the post-save state where only vN+1 holds the types. Without this, the post-save invariant fails on the first save and upgrade is structurally impossible.
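
To illustrate why the upgrade has to go through a single saveAll call, here is a small sketch under stated assumptions: aggregates are identified per version ("ext@1", "ext@2"), a tombstoned aggregate is modelled as one with no types, and the invariant check is a plain loop instead of the real I-Repo-1 implementation.

```typescript
// Hypothetical model: one aggregate per extension version.
type Aggregate = { id: string; types: string[] };
type Row = { owner: string; type: string };

// A tombstoned aggregate claims nothing after the save.
function tombstoneAll(a: Aggregate): Aggregate {
  return { id: a.id, types: [] };
}

class InMemoryRepository {
  private rows: Row[] = [];

  // Apply every aggregate in the batch, then check uniqueness once,
  // against the combined post-save state.
  saveAll(aggregates: Aggregate[]): void {
    const ids = aggregates.map((a) => a.id);
    const next = this.rows.filter((r) => ids.indexOf(r.owner) === -1);
    for (const a of aggregates) {
      for (const t of a.types) next.push({ owner: a.id, type: t });
    }
    const claimedBy = new Map<string, string>();
    for (const r of next) {
      const prior = claimedBy.get(r.type);
      if (prior !== undefined && prior !== r.owner) {
        throw new Error(`DuplicateTypeError: ${r.type} held by ${prior} and ${r.owner}`);
      }
      claimedBy.set(r.type, r.owner);
    }
    this.rows = next; // commit only if the invariant holds
  }

  ownersOf(type: string): string[] {
    return this.rows.filter((r) => r.type === type).map((r) => r.owner);
  }
}

const repo = new InMemoryRepository();
const v1: Aggregate = { id: "ext@1", types: ["@scope/foo"] };
const v2: Aggregate = { id: "ext@2", types: ["@scope/foo"] };
repo.saveAll([v1]);

let soloSaveFailed = false;
try {
  repo.saveAll([v2]); // v1 still holds @scope/foo in the post-save state
} catch {
  soloSaveFailed = true;
}

repo.saveAll([tombstoneAll(v1), v2]); // one call, one invariant check
console.log(soloSaveFailed, repo.ownersOf("@scope/foo"));
```

In this model, saving v2 on its own is rejected because v1 still owns the type, while the combined call succeeds and leaves ext@2 as the sole owner.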

Unit-of-work shape

Each service runs with deterministic ordering: filesystem mutations first, lockfile mutation and catalog save last. A crashed mid-flight install leaves the lockfile unchanged AND the catalog unchanged, so the next loader pass reconciles via disk walk and recovers without operator intervention. This pattern is what makes W2's behavior changes safe to ship under auto-ship-on-merge.
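
The ordering argument can be sketched with a toy model of the three stores. This is an illustration only: World, install, the crashAfterFilesystem switch and findOrphans are hypothetical, and findOrphans is a simplified stand-in for the disk-walk fallback, not the real findStaleFiles.

```typescript
// Hypothetical in-memory model of the three stores a lifecycle service touches.
type World = {
  disk: Set<string>;              // pulled-extension files
  lockfile: Map<string, string>;  // extension name -> version
  catalog: Map<string, string[]>; // extension name -> catalogued files
};

function newWorld(): World {
  return {
    disk: new Set<string>(),
    lockfile: new Map<string, string>(),
    catalog: new Map<string, string[]>(),
  };
}

// Ordering from the text: filesystem first, lockfile and catalog last.
// `crashAfterFilesystem` is a fault-injection switch for this sketch only.
function install(
  world: World,
  name: string,
  version: string,
  files: string[],
  crashAfterFilesystem = false,
): void {
  const paths = files.map((f) => `${name}/${f}`);
  for (const p of paths) world.disk.add(p); // idempotent: re-adding is a no-op
  if (crashAfterFilesystem) throw new Error("simulated crash");
  world.lockfile.set(name, version); // the "commit" starts here
  world.catalog.set(name, paths);
}

// Simplified disk walk: files whose owning extension has no lockfile entry.
function findOrphans(world: World): string[] {
  const orphans: string[] = [];
  world.disk.forEach((path) => {
    const owner = path.split("/")[0];
    if (!world.lockfile.has(owner)) orphans.push(path);
  });
  return orphans.sort();
}

const world = newWorld();
try {
  install(world, "ext-a", "1.0.0", ["foo.ts", "bar.ts"], true);
} catch {
  // crash: files are on disk, lockfile and catalog are untouched
}
console.log(findOrphans(world)); // the partial install is visible to the disk walk
install(world, "ext-a", "1.0.0", ["foo.ts", "bar.ts"]); // retry completes
```

After the simulated crash the lockfile and catalog are empty and the two files show up as orphans. The retry writes the same paths again, commits, and leaves no orphans behind.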

Pre-work decisions to pin in the PR description

  1. Service location. Recommend src/libswamp/extensions/ — these are application services orchestrating domain + infrastructure, not domain logic. Alongside existing pull.ts / rm.ts.
  2. Transaction ordering inside each service: filesystem → lockfile → catalog. Lockfile + catalog are the "commit"; anything before is reversible by the next disk walk via the existing findStaleFiles fallback path.
  3. DuplicateTypeError user surface. When I-Repo-1 fires on saveAll, the user sees a structured error naming both source paths; install/upgrade fails cleanly; filesystem changes get rolled back via the unit-of-work. Pin the exact error shape and CLI exit code.
  4. Migration of existing modules. pull.ts and rm.ts keep their CLI command shapes; their bodies become thin wrappers over the lifecycle services. Wholesale move, not parallel implementations.
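
As a starting point for decisions 3 and 4, a thin command wrapper might look like the sketch below. All of it is placeholder: the DuplicateTypeError fields, the JSON error shape, the LifecycleService interface (synchronous here for brevity) and the exit code 1 are assumptions, since the exact shape and code are what the PR description is asked to pin.

```typescript
// Hypothetical error shape; the real one is to be pinned in the PR description.
class DuplicateTypeError extends Error {
  type: string;
  existingPath: string;
  incomingPath: string;
  constructor(type: string, existingPath: string, incomingPath: string) {
    super(`${type} is claimed by both ${existingPath} and ${incomingPath}`);
    this.name = "DuplicateTypeError";
    this.type = type;
    this.existingPath = existingPath;
    this.incomingPath = incomingPath;
  }
}

// Assumed service surface, simplified to a synchronous call.
interface LifecycleService {
  execute(name: string): void;
}

type CliResult = { exitCode: number; stderr: string };

// Thin wrapper: the command body delegates to the service and only
// translates known errors into a structured message plus exit code.
function runPullCommand(service: LifecycleService, name: string): CliResult {
  try {
    service.execute(name);
    return { exitCode: 0, stderr: "" };
  } catch (err) {
    if (err instanceof DuplicateTypeError) {
      const payload = {
        error: err.name,
        type: err.type,
        sourcePaths: [err.existingPath, err.incomingPath],
      };
      return { exitCode: 1, stderr: JSON.stringify(payload) }; // placeholder code
    }
    throw err; // unknown failures are not swallowed by the wrapper
  }
}
```

Keeping the wrapper this small is what makes the "wholesale move" testable: the service can be exercised directly, and the wrapper only needs a test per error mapping.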

Out of scope (deferred to later workstreams)

  • ReconcileFromDisk strategic service / freshness-as-aggregate-query → W3 (W2 uses the existing findStaleFiles deletion-sweep as a fallback safety net; the strategic reconcile is W3)
  • legacyStore escape hatch removal → W4 (still in transitional use)
  • Loader unification (KindAdapter) and the #214 ENOENT-fallback parity fix → W4
  • Per-fingerprint import URLs and subprocess test harness → W5
  • swamp doctor extensions rendering aggregate state → W6

Success criteria

  • swamp-club#201 fixed: extension rm prunes catalog rows. Round-trip test (install → rm → catalog has zero rows for that extension) at the lifecycle layer (distinct from W1b's repository-layer reproducer).
  • Cross-extension DuplicateTypeError fires at install time with both source paths named. Today this is silent first-wins.
  • Mid-flight crash recovery: kill the process during install, restart, system reconciles via disk walk on next loader pass. No operator intervention.
  • Upgrade atomicity: pull v1, then pull v2 of the same extension; the catalog ends with only v2's (kind, type) rows and v1's bundle files are cleaned up.
  • All existing tests pass on Linux + macOS (Windows not a merge gate per W1 precedent).
  • Auto-ship-on-merge readiness verified via diversity-matrix soak (mirror W1b's pre-merge gates).

Suggested test additions

  • Lifecycle round-trip: install → assert catalog rows present → rm → assert rows gone (#201 reproducer at lifecycle layer).
  • Upgrade atomicity: pull v1, pull v2, assert only v2's rows in catalog; assert v1's bundle files cleaned up.
  • DuplicateTypeError at install time: install extension A claiming @scope/foo, install extension B claiming the same type, assert second install fails with DuplicateTypeError naming both source paths and exits non-zero.
  • Mid-flight crash simulation: fault-inject the catalog save to throw partway through, assert filesystem state reverted (or recoverable), lockfile unchanged, next install attempt succeeds.
  • Loader-side reconciliation: simulate a crashed install (filesystem files present, no lockfile/catalog entry), assert next loader pass surfaces the orphan via the existing findStaleFiles fallback.

Auto-ship-on-merge constraint

This PR ships to users on merge to main. Pre-merge gates (mirror W1b's checklist):

  • CI green (all new + existing tests + type-check + lint + fmt)
  • Author smoke on real repo: install + rm round-trip works; catalog state matches expectations; #201 visibly closed
  • Reviewer smoke on a different real repo
  • Soak across diversity matrix (~3 days): Linux + macOS, repos with mixed local/pulled extensions, repos with version transitions
  • No unexplained DuplicateTypeError firings on soak repos
  • Forward-only revert documented (state precedent: deleting _extension_catalog.db recovers if needed)

The cost of an escaped bug is "every user on next pull", so the bar is "would I let this go to all users tomorrow?"

Push-back encouraged

If the design doesn't fit the ground, surface it before implementation. Specific watch list:

  • The unit-of-work ordering assumes filesystem mutations are idempotent enough that the disk-walk fallback can recover from a partial state. Verify this assumption against the actual pull.ts filesystem operations before locking in the ordering.
  • findStaleFiles is being repurposed as a fallback safety net (it was previously the primary mechanism). Make sure the existing call paths still behave correctly under fallback semantics.
  • The W3 boundary: anything that smells like "reconcile policy" belongs in W3, not W2. Fault-inject scenarios where W2 might be tempted to add reconcile logic and confirm the disk-walk fallback covers them.

References

  • Predecessors: #211 (W1 tracking), #223 (W1b implementation, shipped as #1295)
  • Related bug: #201 (the user-visible bug this closes)
  • Design doc: design/extension-rearchitecture.md
Bog Flow

Shipped

5/5/2026, 6:28:56 PM


Sludge Pulse
stack72 assigned stack72 · 5/4/2026, 9:13:02 PM

stack72 commented 5/4/2026, 10:32:03 PM

Phase A audit complete — go/no-go: SPLIT

W2's Phase A pre-implementation audit per the approved plan v4. Numbers:

(a) pull.ts FS idempotency: PASS. Deno.mkdir({recursive: true}), copyDir (overwrites on retry), extractTarGz (fresh tmpDir per call, copy-from-tmp post-extraction), atomicWriteTextFile (tmp-rename), and pruneOrphanFiles (NotFound-tolerant) are all idempotent. Disk-walk recovery story holds.

(b) Lockfile callsite enumeration: ~160 LOC for LockfileRepository + ~18-19 callsites touched (16 reader files + 2-3 write callsites). Per the pre-committed mechanical threshold (>7 callsites → SPLIT), this lands as a refactor-prequel PR. Filed as swamp-club#233.

(c) CLI parsing entanglement: PASS. pull.ts and rm.ts have a clean orchestration/CLI split today; cli/commands/extension_*.ts owns parsing. Wholesale move feasible without teasing-apart work.

(d) Cross-platform lockfile concurrency: No W2 worsening. acquireLock advisory file lock with retry-backoff stays as-is; lifecycle services do not hold a SQLite transaction during lockfile write. Existing race window is W3 territory.

Status

W2 lifecycle is in implementing phase, paused after step 1. Step 2 onwards blocked on #233 landing + ~2-day diversity-matrix soak. ADV-13 test-seam audit stays with W2 step 10 (this prequel doesn't introduce the SQLite-commit-vs-lockfile-write boundary).

Resuming W2 step 2 once #233 ships and soaks clean.

stack72 commented 5/4/2026, 11:41:35 PM

Prequel shipped — W2 step 2 unblocked

LockfileRepository (swamp-club#233 / PR #1298) merged and shipped. Follow-up tightening (UserError contract, removeEntry mkdir symmetry, deep-copy getAllEntries) shipping in PR #1299.

W2's plan v4 step 2 (LockfileRepository extraction) is now satisfied externally — implementation can proceed directly to step 3 (InstallExtensionService) using the LockfileRepository surface that just landed on main.

Note: the v4 plan's Phase A audit produced go/no-go=SPLIT against pre-committed thresholds; the prequel exercised that path. Resuming W2 from implementing phase.
