fix(opencode): enforce disciplined RGR guardrails #212

Merged
jwilger merged 19 commits from bdd-tdd-guardrails into main 2026-05-16 12:28:01 -07:00
Owner

Summary

Tightens opencode's BDD/TDD guardrails so behavior work proceeds through one focused RED, reviewed RED approval, one implementation edit, focused rerun, GREEN review, and committed checkpoints. Adds mechanical RGR plugin gates for multi-failure RED rejection, clean worktree starts, reviewed RED state, single-edit tokens, apply_patch path extraction, and broad implementer-prompt blocking, with release-tooling contract coverage.
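The gate sequence described above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the plugin's actual code: the names `Phase`, `RgrEvent`, and `step` are hypothetical, the clean-worktree check is assumed to apply at the start of a cycle, and only the happy path is modeled (a failing rerun is not handled). The single-edit token is represented by the `red-approved` phase, which permits exactly one `edit` event.

```typescript
// Hypothetical model of the RGR gates; none of these names come from the PR.
type Phase = "idle" | "red" | "red-approved" | "edited" | "green" | "green-reviewed";

type RgrEvent =
  | { kind: "record-red"; failingCount: number; worktreeClean: boolean }
  | { kind: "approve-red" }
  | { kind: "edit" }
  | { kind: "rerun-green" }
  | { kind: "approve-green" }
  | { kind: "commit" };

// Throws unless the cycle is in the phase that the action requires.
function requirePhase(actual: Phase, expected: Phase, action: string): void {
  if (actual !== expected) {
    throw new Error(`${action} is not allowed in phase "${actual}" (expected "${expected}")`);
  }
}

// Returns the next phase, or throws when a gate rejects the event.
function step(phase: Phase, event: RgrEvent): Phase {
  switch (event.kind) {
    case "record-red":
      requirePhase(phase, "idle", "recording RED");
      if (!event.worktreeClean) {
        throw new Error("worktree must be clean before a cycle starts");
      }
      if (event.failingCount !== 1) {
        throw new Error(`RED must expose exactly one failure, got ${event.failingCount}`);
      }
      return "red";
    case "approve-red":
      requirePhase(phase, "red", "approving RED");
      return "red-approved"; // the single-edit token is now available
    case "edit":
      requirePhase(phase, "red-approved", "editing production code");
      return "edited"; // the token is consumed; a second edit is rejected
    case "rerun-green":
      requirePhase(phase, "edited", "the focused rerun");
      return "green";
    case "approve-green":
      requirePhase(phase, "green", "approving GREEN");
      return "green-reviewed";
    case "commit":
      requirePhase(phase, "green-reviewed", "committing the checkpoint");
      return "idle";
  }
}

// Usage: one full cycle returns to "idle".
let phase: Phase = "idle";
phase = step(phase, { kind: "record-red", failingCount: 1, worktreeClean: true });
phase = step(phase, { kind: "approve-red" });
phase = step(phase, { kind: "edit" });
phase = step(phase, { kind: "rerun-green" });
phase = step(phase, { kind: "approve-green" });
phase = step(phase, { kind: "commit" });
```

Encoding the token as a phase rather than a counter keeps the "one implementation edit" rule structural: there is no state from which two consecutive edits are legal.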

Type of change

  • [ ] feat — new capability
  • [x] fix — bug fix
  • [x] docs — documentation only
  • [ ] refactor — internal cleanup, no behaviour change
  • [x] test — test-only addition
  • [x] chore — tooling, deps, CI

Verification

  • [ ] cargo test --workspace --all-targets
  • [ ] cargo clippy --workspace --all-targets -- -D warnings
  • [ ] cargo fmt --all -- --check
  • [ ] cargo deny check (when bumping dependencies)
  • [ ] Manual smoke test against a dev gateway (when changing the review pipeline or webhook surface)
  • [x] bash tests/release_tooling_test.sh
  • [x] nix flake check

Pre-merge checklist

  • [x] Commit titles follow conventional commits; the release PR generates changelog notes from conventional commits
  • [ ] Public surface changes have rustdoc on the new items
  • [ ] If the change touches a documented threat (T#) in docs/THREAT-MODEL.md, the corresponding red-team test in crates/ar-review/tests/red_team_*.rs has been updated or extended
  • [ ] If the change touches a metric, the rules pack (deploy/prometheus/auto_review.rules.yaml) and dashboard (deploy/grafana/auto_review.dashboard.json) still pass their contract tests

Related

No issue linked.
fix(nix): include RGR reviewer agent in checks
All checks were successful
CI / Verify PR with nix flake check (pull_request) Successful in 3m38s
CI / Build PR artifacts (no token) (pull_request) Successful in 2s
CI / Publish PR artifact packages (pull_request) Successful in 2s
CI / Request auto_review semantic review (pull_request) Successful in 1s
auto_review auto_review: no findings
ef6c4574b3
auto-review left a comment

This PR introduces stricter guardrails for the RGR cycle, enforcing single-failure RED tests and requiring RED approval before production edits. The changes appear well-structured and should enhance discipline in the development process.

Walkthrough

  • .opencode/agents/build.md: Updated to include rgr_approve_red in the workflow, ensuring RED approval before production edits.
  • .opencode/agents/rgr-diagnostic-implementer.md: Added a requirement to name the diagnostic and the allowed immediate change.
  • .opencode/agents/rgr-implementation-reviewer.md: Now routes behavior gaps back to the orchestrator for new RED tests.
  • .opencode/agents/rgr-test-author.md: Enforced single-failure RED tests.
  • .opencode/agents/rgr-test-reviewer.md: Added veto condition for multiple failing tests or diagnostics.
  • .opencode/commands/bugfix-rgr.md: Required RED evidence recording and approval before production edits.
  • .opencode/commands/outside-in-rgr.md: Prohibited acceptance of multi-failure RED output.
  • .opencode/commands/tdd-implement.md: Required RED and RED approval recording before editing production behavior.
  • .opencode/plugins/auto-review-discipline.ts: Added assertCleanWorktree and validateRgrRedEvidence functions.
  • .opencode/plugins/lib/shared.ts: Introduced validateRgrRedEvidence function.
  • .opencode/rules/tdd-discipline.md: Enforced single-failure RED tests.
  • .opencode/skills/outside-in-rgr-microcycle/SKILL.md: Required RED to expose exactly one current failing test or diagnostic.
  • .opencode/skills/outside-in-tdd/SKILL.md: Enforced single-failure RED tests.
  • AGENTS.md: Declared multi-failure RED output as invalid.
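Two of the checks named in the walkthrough, a clean worktree and single-failure RED evidence, can be illustrated with pure functions. This is a sketch under stated assumptions and does not reproduce the PR's `assertCleanWorktree` or `validateRgrRedEvidence`: the helper names below are hypothetical, RED evidence is assumed to be raw `cargo test` output in which failing tests appear as `test <name> ... FAILED`, and worktree state is assumed to come from the text printed by `git status --porcelain`, which is empty when nothing is modified or untracked.

```typescript
// Hypothetical helpers; the real plugin functions may use different inputs and rules.

// Collects the distinct test names that cargo's test harness reports as FAILED.
function failingTests(output: string): string[] {
  const names = new Set<string>();
  for (const line of output.split("\n")) {
    const match = /^test (.+) \.\.\. FAILED$/.exec(line.trim());
    if (match) {
      names.add(match[1]);
    }
  }
  return [...names];
}

type RedVerdict = { ok: true; test: string } | { ok: false; reason: string };

// Accepts RED evidence only when exactly one test is failing.
function checkSingleFailureRed(output: string): RedVerdict {
  const failing = failingTests(output);
  if (failing.length === 0) {
    return { ok: false, reason: "no failing test found; this is not RED" };
  }
  if (failing.length > 1) {
    return { ok: false, reason: `multi-failure RED: ${failing.join(", ")}` };
  }
  return { ok: true, test: failing[0] };
}

// True when `git status --porcelain` printed nothing, meaning a clean worktree.
function isWorktreeClean(porcelainOutput: string): boolean {
  return porcelainOutput.trim().length === 0;
}
```

Keeping the checks free of I/O means a plugin could run the commands elsewhere and pass only the captured text, which also makes the rules straightforward to cover with contract tests.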
Reference
Slipstream/auto_review!212