fix(opencode): enforce disciplined RGR guardrails #212

Merged

jwilger merged 19 commits from bdd-tdd-guardrails into main

2026-05-16 12:28:01 -07:00

jwilger commented

2026-05-16 11:57:36 -07:00

Owner

Summary

Tightens opencode's BDD/TDD guardrails so behavior work proceeds through one focused RED, reviewed RED approval, one implementation edit, focused rerun, GREEN review, and committed checkpoints. Adds mechanical RGR plugin gates for multi-failure RED rejection, clean worktree starts, reviewed RED state, single-edit tokens, apply_patch path extraction, and broad implementer-prompt blocking, with release-tooling contract coverage.
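For readers unfamiliar with the apply_patch gate, here is a minimal sketch of what "path extraction" followed by gating every changed path could look like. It assumes the patch envelope uses `*** Add File:`, `*** Update File:`, `*** Delete File:`, and `*** Move to:` header lines. The function names `extractChangedPaths` and `assertAllPathsAllowed` are hypothetical and are not taken from the plugin source.

```typescript
// Illustrative sketch only: names and the header format are assumptions,
// not the plugin's actual implementation.
const FILE_HEADER = /^\*\*\* (?:Add File|Update File|Delete File|Move to): (.+?)\s*$/;

// Collect every distinct path named by a file header line, in first-seen order.
function extractChangedPaths(patchText: string): string[] {
  const paths = new Set<string>();
  for (const line of patchText.split(/\r?\n/)) {
    const match = FILE_HEADER.exec(line);
    if (match) {
      paths.add(match[1]);
    }
  }
  return Array.from(paths);
}

// Gate every changed path: one disallowed path rejects the whole patch.
function assertAllPathsAllowed(
  paths: string[],
  isAllowed: (path: string) => boolean,
): void {
  const blocked = paths.filter((path) => !isAllowed(path));
  if (blocked.length > 0) {
    throw new Error(`RGR gate: edit not permitted for ${blocked.join(", ")}`);
  }
}
```

Checking the full path list rather than only the first header is the point of the "gate every apply_patch changed path" commit: a patch that touches one permitted file and one unpermitted file should be rejected as a whole.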

Type of change

- [ ] feat — new capability
- [x] fix — bug fix
- [x] docs — documentation only
- [ ] refactor — internal cleanup, no behaviour change
- [x] test — test-only addition
- [x] chore — tooling, deps, CI

Verification

- [ ] cargo test --workspace --all-targets
- [ ] cargo clippy --workspace --all-targets -- -D warnings
- [ ] cargo fmt --all -- --check
- [ ] cargo deny check (when bumping dependencies)
- [ ] Manual smoke test against a dev gateway (when changing the review pipeline or webhook surface)
- [x] bash tests/release_tooling_test.sh
- [x] nix flake check

Pre-merge checklist

- [x] Commit titles follow conventional commits; the release PR generates changelog notes from conventional commits
- [ ] Public surface changes have rustdoc on the new items
- [ ] If the change touches a documented threat (T#) in docs/THREAT-MODEL.md, the corresponding red-team test in crates/ar-review/tests/red_team_*.rs has been updated or extended
- [ ] If the change touches a metric, the rules pack (deploy/prometheus/auto_review.rules.yaml) and dashboard (deploy/grafana/auto_review.dashboard.json) still pass their contract tests

Related

No issue linked.

jwilger added 19 commits

2026-05-16 11:57:36 -07:00

test(opencode): reject multi-failure RED evidence 2306b06e6a

fix(opencode): validate RED evidence before recording b590f3c4bb

docs(opencode): route GREEN behavior gaps to RED 9e450fec61

fix(opencode): surface RGR commit checkpoints d3c5d0849d

fix(opencode): require clean worktree for RGR start 0caa7d306e

fix(opencode): guide dirty RGR start recovery 7f643f72f9

fix(opencode): require reviewed RED before edits 963c1cb143

fix(opencode): require RED review in record message 1df6714991

fix(opencode): reset RED approval on new evidence eadf5bf07c

fix(opencode): require rerun after each RGR edit 9d74576a13

fix(opencode): extract apply_patch changed paths 583f009c1f

fix(opencode): gate every apply_patch changed path 9491274d53

fix(opencode): block broad RGR implementation prompts 5bb8f600b9

test(opencode): run RGR guardrail suites 9ef678dcd2

docs(opencode): codify disciplined RGR handoffs ef28ef0291

test(opencode): align RED review gate contract e74a1f0a23

test(opencode): align single-edit gate contract 6de3ed198b

fix(nix): include discipline plugin in checks df8ccefc37

fix(nix): include RGR reviewer agent in checks ef6c4574b3

CI / Verify PR with nix flake check (pull_request) Successful in 3m38s

CI / Build PR artifacts (no token) (pull_request) Successful in 2s

CI / Publish PR artifact packages (pull_request) Successful in 2s

CI / Request auto_review semantic review (pull_request) Successful in 1s

auto_review auto_review: no findings

auto-review approved these changes

2026-05-16 12:27:20 -07:00

auto-review left a comment

This PR introduces stricter guardrails for the RGR cycle, enforcing single-failure RED tests and requiring RED approval before production edits. The changes appear well-structured and should enhance discipline in the development process.

Walkthrough

- .opencode/agents/build.md: Updated to include `rgr_approve_red` in the workflow, ensuring RED approval before production edits.
- .opencode/agents/rgr-diagnostic-implementer.md: Added requirement to name the diagnostic and allowed immediate change.
- .opencode/agents/rgr-implementation-reviewer.md: Introduced routing behavior gaps back to the orchestrator for new RED tests.
- .opencode/agents/rgr-test-author.md: Enforced single-failure RED tests.
- .opencode/agents/rgr-test-reviewer.md: Added veto condition for multiple failing tests or diagnostics.
- .opencode/commands/bugfix-rgr.md: Required RED evidence recording and approval before production edits.
- .opencode/commands/outside-in-rgr.md: Prohibited acceptance of multi-failure RED output.
- .opencode/commands/tdd-implement.md: Required RED and RED approval recording before editing production behavior.
- .opencode/plugins/auto-review-discipline.ts: Added `assertCleanWorktree` and `validateRgrRedEvidence` functions.
- .opencode/plugins/lib/shared.ts: Introduced `validateRgrRedEvidence` function.
- .opencode/rules/tdd-discipline.md: Enforced single-failure RED tests.
- .opencode/skills/outside-in-rgr-microcycle/SKILL.md: Required RED to expose exactly one current failing test or diagnostic.
- .opencode/skills/outside-in-tdd/SKILL.md: Enforced single-failure RED tests.
- AGENTS.md: Declared multi-failure RED output as invalid.
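As a rough illustration of the single-failure RED rule described in the walkthrough, the sketch below accepts test output only when exactly one test is reported as failing. It assumes the evidence is `cargo test` output, where failures appear as `test <name> ... FAILED` lines. The name `checkSingleFailureRed` and the `RedVerdict` type are hypothetical; the real `validateRgrRedEvidence` signature is not shown in this PR page and may differ.

```typescript
// Illustrative sketch only: not the plugin's validateRgrRedEvidence.
type RedVerdict =
  | { ok: true; failingTest: string }
  | { ok: false; reason: string };

// Matches cargo's per-test result lines, e.g. "test parser::rejects_empty ... FAILED".
const FAILED_LINE = /^test (.+?) \.\.\. FAILED\s*$/;

function checkSingleFailureRed(testOutput: string): RedVerdict {
  const failing = new Set<string>();
  for (const line of testOutput.split(/\r?\n/)) {
    const match = FAILED_LINE.exec(line);
    if (match) {
      failing.add(match[1]);
    }
  }
  const names = Array.from(failing);
  if (names.length === 0) {
    return { ok: false, reason: "no failing test found; RED needs exactly one failure" };
  }
  if (names.length > 1) {
    return { ok: false, reason: `multi-failure RED rejected: ${names.join(", ")}` };
  }
  return { ok: true, failingTest: names[0] };
}
```

Returning a verdict object instead of throwing keeps the rejection reason available to whatever records the RED evidence, so it can tell the author which failures to split into separate cycles.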

jwilger merged commit d137107c43 into main

2026-05-16 12:28:01 -07:00

jwilger referenced this pull request from a commit

2026-05-16 12:28:02 -07:00

fix(opencode): enforce disciplined RGR guardrails (#212)