Tags: systeminit/swamp
fix(models): review follow-ups for direct type execution (swamp-club#302) (#1353)

## Summary

Addresses third round of review feedback on the direct type execution feature (#1352).

- Reset `GIT_SHA` to `""` (was hardcoded from compile during verification)
- Fix key-order-dependent global args comparison: use entry-by-entry instead of `JSON.stringify` to avoid false "global args differ" warnings when `--input` order changes
- Move direct execution guard before variable assignment: eliminate dead `?? methodName` fallback
- Use `"event"` key for `auto_created` JSON output instead of `"warning"`; creation is success, not a warning
- Add 4 unit tests for `YamlDefinitionRepository` secondary search path: `findByNameGlobal` fallback, primary-over-secondary priority, `findAllGlobal` exclusion, `findById` fallback

## Test plan

- [ ] 19 repository tests pass (15 existing + 4 new)
- [ ] All existing unit tests unaffected
- [ ] No functionality changes; correctness fixes and test coverage only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
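The key-order fix described in #1353 can be illustrated with a minimal sketch. The helper name `globalArgsEqual` and its signature are assumptions made for illustration, not swamp's actual code; it only shows why an entry-by-entry comparison avoids the false "global args differ" result that a whole-object `JSON.stringify` comparison produces.

```typescript
// Hypothetical helper: name and signature are illustrative only.
type GlobalArgs = Record<string, unknown>;

function globalArgsEqual(a: GlobalArgs, b: GlobalArgs): boolean {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  // Compare entry by entry so top-level key insertion order does not matter.
  // Nested values are still compared via JSON.stringify in this sketch, so
  // nested objects remain order-sensitive.
  return aKeys.every((key) =>
    Object.prototype.hasOwnProperty.call(b, key) &&
    JSON.stringify(a[key]) === JSON.stringify(b[key])
  );
}

const first = { greeting: "Hello", language: "en" };
const second = { language: "en", greeting: "Hello" };

// Whole-object serialization depends on key order.
console.log(JSON.stringify(first) === JSON.stringify(second)); // false
console.log(globalArgsEqual(first, second)); // true
```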
feat(models): direct type execution - collapse model create + method run (swamp-club#302) (#1352)

## Summary

Adds direct type execution for both CLI and workflows, enabling users to skip the explicit `model create` step when all values come from `--input` at runtime.

- **CLI**: `swamp model @type method run <method> <name> --input k=v`
- **Workflow**: `modelType` + `modelName` as a mutually exclusive alternative to `modelIdOrName`
- Auto-created definitions stored in `.swamp/auto-definitions/` (not `models/`)
- Excluded from `model search`, findable by `model get`, synced via datastores
- Input routing splits `--input` values between globalArguments and method arguments using the type's Zod schemas

Closes swamp-club#302

## Architecture

```
CLI:
  swamp model @type method run <method> <name> --input k=v
    → arg rewriter (moves @type after run for Cliffy)
    → model method run with 3 positionals
    → modelMethodRun generator
    → direct_execution.ts → auto-create + run

Workflow YAML:
  task:
    type: model_method
    modelType: "@test/greeter"   # mutually exclusive with modelIdOrName
    modelName: my-greeter
    methodName: greet
    inputs: { greeting: Hello, name: World }
    → execution_service → direct_execution.ts → auto-create + run
```

### Key Design Decisions

1. **Two definition paths, clean separation**: `model create` → `models/` (git-tracked, deliberate). Direct type execution → `.swamp/auto-definitions/` (runtime state, not in git).
2. **Mutually exclusive workflow schema**: `modelIdOrName` OR `modelType` + `modelName`, never both. Schema validation enforces this with clear error messages.
3. **Input routing by Zod schemas**: Method args take precedence on ambiguous keys. Unknown keys are rejected with an error listing valid inputs.
4. **Datastore sync**: `.swamp/auto-definitions/` is in `DEFAULT_DATASTORE_SUBDIRS` so team members share auto-created definitions.
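The routing rule in design decision 3 can be sketched roughly as follows. This is not the `routeInputsBySchema` implementation: the function name `routeInputs` is hypothetical, and plain key sets stand in for the keys that would come from the type's Zod schemas. It only illustrates the stated behavior: method arguments win on ambiguous keys, and unknown keys are rejected with the list of valid inputs.

```typescript
// Illustrative only: key sets stand in for the keys of the type's Zod schemas.
interface RoutedInputs {
  globalArguments: Record<string, unknown>;
  methodArguments: Record<string, unknown>;
}

function routeInputs(
  inputs: Record<string, unknown>,
  globalKeys: ReadonlySet<string>,
  methodKeys: ReadonlySet<string>,
): RoutedInputs {
  const routed: RoutedInputs = { globalArguments: {}, methodArguments: {} };
  const unknownKeys: string[] = [];
  for (const key of Object.keys(inputs)) {
    if (methodKeys.has(key)) {
      // Method arguments take precedence when a key appears in both schemas.
      routed.methodArguments[key] = inputs[key];
    } else if (globalKeys.has(key)) {
      routed.globalArguments[key] = inputs[key];
    } else {
      unknownKeys.push(key);
    }
  }
  if (unknownKeys.length > 0) {
    // List method keys first, then global keys that are not already listed.
    const valid = Array.from(methodKeys);
    globalKeys.forEach((key) => {
      if (valid.indexOf(key) === -1) valid.push(key);
    });
    throw new Error(
      `Unknown input(s): ${unknownKeys.join(", ")}. Valid inputs are: ${valid.join(", ")}`,
    );
  }
  return routed;
}

const routedExample = routeInputs(
  { greeting: "Hello", language: "en", name: "World", loud: false },
  new Set(["greeting", "language"]),
  new Set(["name", "loud"]),
);
console.log(routedExample);
```

With these sets, `greeting` and `language` land in `globalArguments`, while `name` and `loud` land in `methodArguments`.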
## Files Changed (23)

### New Files (6)

- `src/cli/arg_rewriter.ts`: rewrites `model @type method run` → `model method run @type`
- `src/cli/arg_rewriter_test.ts`: 10 tests
- `src/libswamp/models/direct_execution.ts`: shared `routeInputsBySchema` + `resolveOrCreateDefinition`
- `src/libswamp/models/direct_execution_test.ts`: 9 tests
- `.claude/skills/swamp-model/references/direct-execution.md`
- `.claude/skills/swamp-workflow/references/direct-execution.md`

### Modified Files (17)

- `src/cli/mod.ts`: wire arg rewriter into `runCli()`
- `src/cli/commands/model_method_run.ts`: 2/3 positional args, wire direct execution deps
- `src/libswamp/models/run.ts`: direct execution path in `modelMethodRun` generator, new events
- `src/presentation/renderers/model_method_run.ts`: handle `auto_creating` + `definition_created` events
- `src/infrastructure/persistence/paths.ts`: add `autoDefinitions` to `SWAMP_SUBDIRS`
- `src/infrastructure/persistence/yaml_definition_repository.ts`: automatic secondary search path
- `src/domain/workflows/step_task.ts`: `modelType` + `modelName` fields, mutual exclusivity
- `src/domain/workflows/execution_service.ts`: direct type execution in workflow steps
- `src/domain/workflows/validation_service.ts`: handle optional `modelIdOrName`
- `src/domain/workflows/model_reference_extractor.ts`: handle `modelName` references
- `src/domain/datastore/datastore_config.ts`: add `auto-definitions` to `DEFAULT_DATASTORE_SUBDIRS`
- `src/domain/summary/summary_service.ts`: handle optional `modelIdOrName`
- `design/models.md`, `design/inputs.md`, `design/workflow.md`: document new features
- `.claude/skills/swamp-model/SKILL.md`, `.claude/skills/swamp-workflow/SKILL.md`: document new syntax

## End-to-End Verification

All verification performed with the compiled binary against scratch repos.
### CLI Direct Type Execution

| Test | Result |
|---|---|
| `swamp model @command/shell method run execute my-shell --input run="echo hello"` | ✅ Auto-created in `.swamp/auto-definitions/`, method executed |
| Second run of same command | ✅ Reused existing definition (no duplicate), data version incremented |
| `swamp model search` after auto-create | ✅ Auto-created definition NOT listed |
| `swamp model get direct-echo` | ✅ Found in `.swamp/auto-definitions/` |
| Log messages on first vs subsequent run | ✅ "Auto-creating" + "Definition created" on first; "Found model" only on second |

### Input Routing (custom `test/greeter` extension with both globalArgs and methodArgs)

| Test | Result |
|---|---|
| `--input greeting=Hello --input language=en --input name=World --input loud:json=false` | ✅ greeting/language → globalArgs, name/loud → methodArgs |
| Definition stores routed globalArgs | ✅ `model get` shows `globalArguments: { greeting: "Hello", language: "en" }` |
| Data output combines both correctly | ✅ `"message": "Hello, World! [en]"` |
| Unknown input key (`--input bogus=value`) | ✅ Rejected: "Unknown input(s): bogus. Valid inputs are: name, loud, greeting, language" |

### Workflow with modelIdOrName (existing behavior)

| Test | Result |
|---|---|
| Workflow with `modelIdOrName: traditional-echo` | ✅ Succeeded |
| `--last-evaluated` flag | ✅ Succeeded |

### Workflow with modelType + modelName (direct type execution)

| Test | Result |
|---|---|
| Workflow with `modelType: "@command/shell"` + `modelName: wf-direct-echo` | ✅ Auto-created and executed |
| Second workflow run (no duplicate definitions) | ✅ Same definition reused, data version incremented |
| Workflow with `modelType: "@test/greeter"` + input routing | ✅ `Hola, Mundo! [es]` (routing correct) |
| CLI `--input` → workflow `${{ inputs.* }}` → modelType step | ✅ `Bonjour, Claude! [fr]` (full chain works) |
| `--last-evaluated` flag with modelType workflow | ✅ Succeeded |

### Workflow Validation

| Test | Result |
|---|---|
| `workflow validate` with `modelIdOrName` | ✅ 8/8 passed |
| `workflow validate` with `modelType` + `modelName` | ✅ 8/8 passed |
| Both `modelIdOrName` AND `modelType` (invalid) | ✅ Rejected: "Use either modelIdOrName or modelType + modelName, not both" |
| `modelType` without `modelName` (invalid) | ✅ Rejected: "modelType requires modelName" |
| Neither field present (invalid) | ✅ Rejected: "requires either modelIdOrName or modelType + modelName" |

### Automated Test Suites

| Suite | Result |
|---|---|
| `deno check` (full type check) | ✅ Zero errors |
| `deno lint` | ✅ Zero errors |
| `deno fmt` | ✅ Clean |
| `deno run test` (5717 tests) | ✅ All pass (1 pre-existing integration test failure unrelated) |
| `deno run compile` | ✅ Binary built |
| UAT CLI (414 tests) | ✅ All pass |
| UAT Adversarial (109 tests) | ✅ 104 pass, 5 pre-existing failures (workflow_concurrency_test.ts) |
| tessl skill review (swamp-model) | ✅ 93% |
| tessl skill review (swamp-workflow) | ✅ 93% |

## Follow-up Issues Filed

- **swamp-club #306**: Docs: Update model-definitions.md and workflows.md for direct type execution
- **swamp-uat #202**: UAT: Direct type execution for CLI and workflow steps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(extensions): address #264 review follow-ups (#1351)

## Summary

- Add `binaries` to the belt-and-suspenders path traversal check; every other manifest field had both Zod and fallback validation, `binaries` was missed
- Add cross-field dedup between `additionalFiles` and `binaries`: reject manifests listing the same path in both since both land in `files/`
- Narrow pull-side chmod catch to only swallow `NotFound` and log other errors at debug level

Addresses review feedback from #1350.

## Test plan

- [x] `deno check`: passes
- [x] `deno lint`: passes
- [x] `deno fmt`: passes
- [x] `deno run test`: 5731 pass, 0 failures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
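The two manifest checks from #1351 can be sketched as below, under loud assumptions: the `ManifestFiles` shape and the function `findManifestPathProblems` are hypothetical, the traversal rule shown (no absolute paths, no `..` segments) is a common simplification rather than swamp's exact rule, and the real code also validates through Zod.

```typescript
// Hypothetical shape: only the two fields discussed above.
interface ManifestFiles {
  additionalFiles: string[];
  binaries: string[];
}

function findManifestPathProblems(manifest: ManifestFiles): string[] {
  const problems: string[] = [];
  const allPaths = manifest.additionalFiles.concat(manifest.binaries);

  // Fallback traversal check, now covering `binaries` as well.
  for (const path of allPaths) {
    const segments = path.split(/[\\/]/);
    if (path.startsWith("/") || segments.indexOf("..") !== -1) {
      problems.push(`path escapes the extension directory: ${path}`);
    }
  }

  // Cross-field dedup: both lists land in the same directory, so a path
  // listed in both would collide.
  const additional = new Set(manifest.additionalFiles);
  for (const path of manifest.binaries) {
    if (additional.has(path)) {
      problems.push(`listed in both additionalFiles and binaries: ${path}`);
    }
  }
  return problems;
}

console.log(
  findManifestPathProblems({
    additionalFiles: ["README.md", "bin/helper"],
    binaries: ["bin/helper", "../outside.sh"],
  }),
);
```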
feat(extensions): add binaries manifest field for executable host helpers (swamp-club#264) (#1350)

## Summary

- Add a `binaries` field to the extension manifest for declaring executable host helpers (shell scripts, extensionless binaries, python wrappers)
- Declared binaries are exempt from the file-extension safety allowlist but still subject to all other safety checks (hidden files, symlinks, size limits, file count)
- Preserve executable mode bits through the publish/pull cycle: publisher-side `chmod` after `copyFile` in staging, pull-side defensive `chmod +x` for declared binaries
- Surface a pull-time warning listing binary file paths and advising inspection before use
- Send the binaries list as push metadata to swamp-club for display on extension pages (requires swamp-club/swamp-club#519)

## Cross-platform note

Linux and macOS preserve the executable bit through the publish/pull cycle via tar mode bits and `Deno.chmod`. Windows: declared binary files are extracted and writable; executable invocation depends on file extension and Windows shell behavior. Not a merge gate per W-series precedent.
## Depends on

- swamp-club PR [swamp-club#519](systeminit/swamp-club#519): accept and display `binaries` field on the registry side

## Test plan

- [x] `deno check`: type checking passes
- [x] `deno lint`: linting passes
- [x] `deno fmt`: formatting passes
- [x] `deno run test`: 5705 tests pass, 0 failures
- [x] `deno run compile`: binary compiles
- [x] New manifest parsing tests: `binaries` field parsed correctly, defaults to `[]`
- [x] New safety analyzer tests: exempt files bypass extension check, extensionless files pass when exempt, `.sh` files pass when exempt, exempt files still rejected for hidden/symlink/size violations, non-exempt files still fail
- [x] Backwards-compat: extensions without `binaries` field work identically

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(telemetry): per-method telemetry events for workflow runs (swamp-club#301) (#1349)

## Summary

Closes [swamp-club#301](https://swamp-club.com/lab/issues/301).

Workflow runs now emit one `TelemetryEntry` per workflow YAML step that resolves to a model method, alongside the parent CLI invocation entry. Children use the existing `cli_invocation` event shape (same redactions as a direct `swamp model method run`) and link to the parent via a new optional `parentInvocationId` field. A new optional `workflowContext` block carries `workflowName` / `runId` / `jobName` / `stepName` / `modelType` / `driver` so per-driver and per-model-type analytics are first-class without joining through the parent.

The design choice was deliberate: the issue originally proposed a new `workflow_method_invocation` event type. We pushed back during planning and chose additive optional fields on `cli_invocation` instead: the swamp-club ingest side declares `properties: Record<string, unknown>` so additive fields ride across with no consumer-side coordination. Analytics queries that aggregate by `command`/`subcommand`/`duration` immediately see workflow-internal method invocations alongside direct ones.

### What's new on the wire

```jsonc
{
  "event": "cli_invocation",
  "properties": {
    "id": "<child-uuid>",
    "invocation": {
      "command": "model",
      "subcommand": "method",
      "args": ["run", "<REDACTED>", "<methodName>"],
      "optionKeys": [],
      "globalOptions": []
    },
    "result": { "status": "success", "exitCode": 0 },
    "parentInvocationId": "<parent-cli-invocation-uuid>",
    "workflowContext": {
      "workflowName": "deploy",
      "runId": "<workflow-run-uuid>",
      "jobName": "build",
      "stepName": "validate",
      "modelType": "command/shell",
      "driver": "local"
    }
    // ... existing fields (startedAt, completedAt, durationMs, swampVersion,
    //     denoVersion, platform, invocationContext) unchanged
  }
}
```

Older entries continue to decode without `parentInvocationId` / `workflowContext` (forward-compat regression test added).
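A consumer reading the opaque `properties` bag might pick out the two optional fields as in the sketch below. The field names come from the wire example above; the interfaces and the helper `readChildLinkFields` are assumptions made for illustration and are not part of swamp or swamp-club. The point is that entries written before these fields existed still decode, yielding an empty result.

```typescript
// Field names follow the wire example above; the helper itself is hypothetical.
interface WorkflowContext {
  workflowName: string;
  runId: string;
  jobName: string;
  stepName: string;
  modelType: string;
  driver: string;
}

interface ChildLinkFields {
  parentInvocationId?: string;
  workflowContext?: WorkflowContext;
}

// Reads the optional fields from an opaque properties bag. Entries without
// them simply produce an object with no keys.
function readChildLinkFields(properties: Record<string, unknown>): ChildLinkFields {
  const fields: ChildLinkFields = {};
  const parentId = properties["parentInvocationId"];
  if (typeof parentId === "string") {
    fields.parentInvocationId = parentId;
  }
  const context = properties["workflowContext"];
  if (typeof context === "object" && context !== null) {
    // Trusted as-is for brevity; a real consumer would validate each field.
    fields.workflowContext = context as WorkflowContext;
  }
  return fields;
}

const legacyEntry = { id: "legacy-entry" };
const childEntry = {
  id: "child-entry",
  parentInvocationId: "parent-entry",
  workflowContext: {
    workflowName: "deploy",
    runId: "run-1",
    jobName: "build",
    stepName: "validate",
    modelType: "command/shell",
    driver: "local",
  },
};

console.log(readChildLinkFields(legacyEntry));
console.log(readChildLinkFields(childEntry).workflowContext?.driver);
```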
### Architecture

- **Bridge** (`src/libswamp/workflows/telemetry_bridge.ts`): tracks in-flight method invocations by `${jobId}:${stepId}`, maps the existing `method_executing` → `step_completed`/`step_failed` event pairs into success/error child entries, synthesizes `durationMs = 0` entries for pre-method-executing failures (model lookup, vault expression resolution, vary-key validation, env-var validation), and finalizes any unfinished invocations on stream termination so cancellation/timeout paths don't silently drop telemetry.
- **Sink** (`WorkflowTelemetrySink` in `src/libswamp/workflows/run.ts`): narrow callback shape on `WorkflowRunDeps`. CLI binds it to `TelemetryService.recordChildInvocation`; non-CLI consumers pass `undefined` and the bridge becomes a no-op. Keeps libswamp free of direct domain.telemetry imports beyond plain DTOs.
- **Pre-allocated parent id**: `TelemetryService` exposes a stable `invocationId` (constructor pre-allocates a `TelemetryId`) so children can reference it as `parentInvocationId` before the parent entry itself is recorded at the end of the CLI lifecycle. Module-scoped accessor (`getActiveTelemetryService` in `src/cli/telemetry_integration.ts`) is set in `runCli` before parse and cleared in the surrounding `try/finally`.

### Domain event extensions

- `step_failed` gains optional `modelName` / `methodName` / `driver`, populated only at the model-method failure site (line ~1820 in `runStep`'s catch block). Structural failures (max-nesting-depth, cycle detection, nested-workflow throw/failed) leave them undefined so the bridge can distinguish method failures from structural failures.
- `method_executing` gains optional `driver`, captured from the resolved `DriverPlan`. The yield is reordered to fire after DriverPlan resolution; vary-key validation failures (which happen between event start and method_executing) become pre-method-executing failures by design, a more accurate categorization since the method was never invoked.
### Failure semantics

| Step outcome | Child entry |
|---|---|
| Success | `status: success`, real duration |
| Failure after `method_executing` | `status: error`, real duration |
| Failure before `method_executing` (model lookup, vault, vary, env var) | `status: error`, `durationMs = 0` (synthesized) |
| `allowFailure: true` step | `status: error` on the child (method outcome); parent records workflow `success` |
| Workflow-task / nested workflow / cycle / depth | No child entry (no method was ever invoked at this step) |
| Cancellation / timeout / mid-stream throw | In-flight invocations drained as `error` via the bridge's `finalize()` |

### V1 limitations (documented in `design/workflow.md`)

- Workflow-step granularity only. Sub-method follow-up calls inside `DefaultMethodExecutionService.execute` are not captured separately.
- Failures before workflow validation (workflow not found, input schema validation) produce no child entry; no method was ever resolved.

## Test Plan

- [x] **Unit tests**: `TelemetryEntry` round-trip with/without new fields (back-compat regression locked in); `TelemetryService.recordChildInvocation` success and error paths with `UserError` classification; `WorkflowTelemetryBridge` for all five branches (success, post-method failure, pre-method failure, structural skip, finalize drain) plus idempotency, sequential workflows, forEach, allowFailure semantics (23 new test cases).
- [x] **libswamp error-terminal test**: mid-stream `throw` with an in-flight method invocation: bridge's `try/finally` drains it as an error child, parent stream's `error` event still propagates cleanly.
- [x] **Integration test** (`integration/telemetry_workflow_method_invocations_test.ts`): end-to-end CLI invocation runs a workflow with success step + forEach iterations, asserts one parent + correct number of children with `parentInvocationId` linkage and full `workflowContext` (including `driver`, `modelType`).
- [x] **Wire-shape tests**: `HttpTelemetrySender` includes new fields at `properties.parentInvocationId` / `properties.workflowContext.*`; omitted entirely when absent (no `undefined` serialization).
- [x] **Repository round-trip**: `JsonTelemetryRepository` saves and reads new fields; legacy entries without them decode cleanly.
- [x] **Verification gates**: `deno check`, `deno lint`, `deno fmt --check`, `deno run test` (5723 passed, 0 failed), `deno run compile`.
- [x] **Manual end-to-end**: ran a throwaway workflow in `~/git/swamp-media` and inspected `~/git/swamp-media/.swamp/telemetry/`. Got one parent + three children (ok-step, fanout-a, fanout-b) with all `workflowContext` fields populated and consistent `parentInvocationId` / `runId`. Children share the redacted-args shape with direct `model method run` invocations. forEach iterations have distinct `stepName`s.

## Consumer side

Verified against swamp-club: `services/telemetry/lib/schema.ts` declares `properties: Record<string, unknown>` so the additive fields ride across the wire with zero coordination. Existing rollup metrics in `consumers/metrics.ts` already follow the "read what you need from the opaque bag" pattern. A follow-up workflowContext rollup metric (per-driver / per-model-type / per-step counts) is a separate swamp-club issue, not blocking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(datastore): break parent-child lock deadlock in workflow shell steps (swamp-club#296) (#1347)

## Summary

- Fixes swamp-club#296: `swamp extension push` (and any structural command) deadlocked when invoked as a subprocess from inside a `swamp workflow run` step
- `acquireModelLocks` now sets `SWAMP_LOCK_HOLDER_PID=<pid>` in the process environment; `waitForPerModelLocks` skips locks whose PID matches, so child processes inherit the env var and avoid polling on their parent's locks
- Propagates the env var through explicit shell env in `command/shell` when user-defined env vars or vault secrets replace the inherited environment
- Documents parent-process lock awareness in `design/datastores.md`

## Test plan

- [x] 3 unit tests: parent-PID skip, non-matching PID waits, no-env-var preserves behavior
- [x] Integration test: workflow shell step runs nested swamp command without deadlock
- [x] Manual verification: reproduction scenario completes in ~300ms (previously hung indefinitely)
- [x] Full test suite: 5697 passed, 0 failed
- [x] `deno check` / `deno lint` / `deno fmt --check`: clean
- [x] `deno run compile`: binary compiled and used for verification
- [x] UAT issue filed: systeminit/swamp-uat#201

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
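The skip rule from #1347 can be sketched as a pure function. Only the env var name `SWAMP_LOCK_HOLDER_PID` comes from the description above; the `ModelLock` shape and the function `locksToWaitFor` are hypothetical, and the real `waitForPerModelLocks` presumably polls rather than filters. The three cases mirror the unit tests listed in the test plan.

```typescript
// Hypothetical lock record: only the holder PID matters for this sketch.
interface ModelLock {
  modelName: string;
  holderPid: number;
}

// `lockHolderPidEnv` is the value of SWAMP_LOCK_HOLDER_PID as inherited from
// the parent process, or undefined when the variable is not set.
function locksToWaitFor(
  locks: ModelLock[],
  lockHolderPidEnv: string | undefined,
): ModelLock[] {
  if (lockHolderPidEnv === undefined || lockHolderPidEnv === "") {
    return locks; // no env var: behavior unchanged
  }
  const parentPid = Number(lockHolderPidEnv);
  if (!Number.isInteger(parentPid)) {
    return locks; // unparseable value: be conservative and wait on everything
  }
  // Locks held by the parent will not be released until this child exits,
  // so waiting on them would deadlock. Skip them.
  return locks.filter((lock) => lock.holderPid !== parentPid);
}

const heldLocks: ModelLock[] = [
  { modelName: "deploy", holderPid: 4100 },
  { modelName: "build", holderPid: 5200 },
];

console.log(locksToWaitFor(heldLocks, "4100").map((lock) => lock.modelName)); // ["build"]
```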
feat(extensions): carry filesystem mtime through Source entity (swamp-club#271) (#1348)

## Summary

- Add `sourceMtime: string` to the `Source` domain entity and thread it through the entire lifecycle: from `Deno.stat` in reconcile/install services, through domain transitions (`observeFreshSource`, `withState`, `withFingerprintAndState`), to catalog persistence via `sourceToRow()`.
- `withFingerprintAndState` takes `sourceMtime` as a **required** parameter (not optional) so the compiler catches every call site that forgets to supply mtime, preventing the class of bugs this issue was filed to fix.
- `withState` preserves existing mtime (correct asymmetry: state-only transitions like tombstoning don't re-observe the source).
- `Deno.stat` in reconcile happens **before** the try block so failed builds still record the real filesystem mtime (the file is on disk and readable; only bundling failed).

Closes swamp-club#271

## Test Plan

- `deno check`: 0 type errors (compiler catches all missing `sourceMtime` call sites)
- `deno lint`: clean
- `deno fmt --check`: clean
- `deno run test`: 5694 passed, 0 failed
- `deno run compile`: binary compiles successfully
- Unit tests verify `sourceMtime` round-trip through `makeSource`, `withState` (preserves), `withFingerprintAndState` (updates), and `observeFreshSource` (both new and existing source paths)
- `ExtensionRepository` tests verify mtime survives catalog write → read cycle

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(rubric): collapse platform factors into single 2-point factor (#1346)

## Summary

- Mirrors swamp-club#509: collapse `platforms-one` and `platforms-two` into a single `platforms` factor worth 2 points
- A single-platform manifest now earns full credit; platform breadth is not a quality proxy when the upstream constrains it
- Factor count drops from 10 to 9; max stays at 12
- CLI and registry now produce identical scoring

## Test plan

- [x] All 66 unit tests pass (`deno test src/domain/extensions/extension_rubric_scorer_test.ts`)
- [x] Parity tests updated to match swamp-club server scorer
- [x] Integration test updated (single-platform manifest no longer fails `platforms-two`)
- [ ] Run `integration/extension_quality_test.ts` in a full dev environment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(extensions): scope extractTypeFromSource regex and fix validateSecondaryExport binding (#1345)

## Summary

Two W4 follow-up fixes:

- **Scope `extractTypeFromSource` regex** in vault/driver/datastore/report adapters to match within the `export const <kind> = {` block, not the first `type:` (or `name:`) occurrence in the file. Previously, a helper object like `const cfg = { type: "wrong/type" }` above the export would pollute the catalog entry, causing unnecessary rebundles on every cold start.
- **Fix `validateSecondaryExport` this-binding loss** in `ExtensionLoader`: the method is now called inline instead of being extracted to a local variable, preserving the `this` binding for future adapter implementations.

## Test plan

- [x] `deno check` passes
- [x] `deno lint` passes
- [x] `deno fmt --check` passes
- [x] 5,697 tests pass, 0 failures
- [x] `deno run compile` succeeds
- [x] Added 4 regression tests (one per non-model adapter) verifying `extractTypeFromSource` ignores `type:`/`name:` in helper objects above the export

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
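The scoping idea from #1345 can be sketched as follows. This is an illustration under assumptions, not the adapters' actual regex: the function name `extractTypeSketch` is hypothetical, `kind` is assumed to be a plain identifier (it is interpolated into the pattern unescaped), and the sketch only anchors the search at the start of the export rather than bounding the end of the block.

```typescript
// Hypothetical sketch: search for `type:` / `name:` only after the start of
// `export const <kind> = {`, so helper objects above the export are ignored.
function extractTypeSketch(source: string, kind: string): string | undefined {
  const exportPattern = new RegExp(`export\\s+const\\s+${kind}\\s*=\\s*\\{`);
  const exportStart = source.search(exportPattern);
  if (exportStart === -1) return undefined;

  const afterExport = source.slice(exportStart);
  const match = afterExport.match(/\b(?:type|name)\s*:\s*["']([^"']+)["']/);
  return match ? match[1] : undefined;
}

const sampleSource = [
  'const cfg = { type: "wrong/type" };',
  "export const vault = {",
  '  type: "acme/vault",',
  "};",
].join("\n");

console.log(extractTypeSketch(sampleSource, "vault")); // "acme/vault"
```

An unscoped search for the first `type:` in `sampleSource` would have returned `"wrong/type"`, which is the catalog pollution described above.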
fix(workflows): resolve self.* in modelIdOrName during runtime forEach execution (swamp-club#294) (#1344)

## Summary

- Resolve `self.*` expressions in `task.modelIdOrName` and `task.methodName` in `DefaultStepExecutor.executeModelMethod()` before the model lookup, closing the gap between the evaluate display path (which already resolved these) and the runtime execution path
- Remove the `!options.lastEvaluated` guard from `ForEachExpansionService.expand()` in `runJob()` so forEach expansion runs in all modes; it's a structural transformation, not expression evaluation
- Add unit test verifying the executor resolves `self.*` in modelIdOrName via the expression context before model lookup

## Test plan

- [x] Unit test: `DefaultStepExecutor resolves self.* expressions in modelIdOrName before model lookup`
- [x] Full test suite: 5689 passed, 0 failed
- [x] E2E: `swamp workflow run` with forEach + `self.*` in modelIdOrName succeeds
- [x] E2E: `swamp workflow run --last-evaluated --input` with forEach works
- [x] E2E: `swamp workflow evaluate` + `--last-evaluated` with inputs works
- [x] Verified `--last-evaluated` without inputs correctly errors ("No such key"); inputs are required for forEach.in evaluation

Closes swamp-club#294

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Sean Escriva <webframp@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>