feat(config): support structured custom review agents #278
Reference
Slipstream/auto_review#278
Problem
Repo-level guidelines currently provide one shared block of free-form review guidance. That is useful for broad conventions, but it does not let a repo define multiple focused review perspectives with distinct checklists.
Repos should be able to ask Auto Review to apply custom, named review agents such as a security-focused checklist, API-design checklist, migration checklist, or domain-specific reviewer, without turning the config into an unconstrained prompt-injection surface.
Proposed enhancement
Add repo-config support for custom named review agents with structured checklist rules.
The config should avoid completely free-form prompt text. Instead, each custom agent should have a stable name and a list of specific checks to perform.
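To make "a stable name and a list of specific checks" concrete, here is a minimal validation sketch in Python. Everything in it is an assumption for illustration: the key names `name` and `checks`, the name pattern, and the size caps are hypothetical, since the schema is explicitly TBD. It only shows how a structured shape can reject free-form prompt text.

```python
import re
from dataclasses import dataclass

# Hypothetical limits and key names; the issue leaves the real schema TBD.
NAME_RE = re.compile(r"[a-z][a-z0-9_-]{0,39}")
ALLOWED_KEYS = {"name", "checks"}
MAX_CHECKS = 20
MAX_CHECK_LENGTH = 200


@dataclass(frozen=True)
class ReviewAgent:
    name: str
    checks: tuple[str, ...]


def parse_review_agents(raw: object) -> list[ReviewAgent]:
    """Validate an already-decoded config value (for example, loaded from YAML)."""
    if not isinstance(raw, list):
        raise ValueError("review_agents must be a list")
    agents: list[ReviewAgent] = []
    seen: set[str] = set()
    for i, entry in enumerate(raw):
        if not isinstance(entry, dict):
            raise ValueError(f"review_agents[{i}] must be a mapping")
        # Rejecting unknown keys keeps free-form fields such as "prompt" out.
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"review_agents[{i}] has unknown keys: {sorted(map(str, unknown))}")
        name = entry.get("name")
        if not isinstance(name, str) or not NAME_RE.fullmatch(name):
            raise ValueError(f"review_agents[{i}].name is not a valid identifier")
        if name in seen:
            raise ValueError(f"duplicate review agent name: {name}")
        checks = entry.get("checks")
        if not isinstance(checks, list) or not 0 < len(checks) <= MAX_CHECKS:
            raise ValueError(f"{name}: checks must be a non-empty list of at most {MAX_CHECKS} items")
        for check in checks:
            # Each check is one short, single-line item rather than a prompt body.
            if not isinstance(check, str) or not check.strip():
                raise ValueError(f"{name}: each check must be a non-empty string")
            if len(check) > MAX_CHECK_LENGTH or "\n" in check:
                raise ValueError(f"{name}: each check must be a single short line")
        seen.add(name)
        agents.append(ReviewAgent(name, tuple(c.strip() for c in checks)))
    return agents
```

Bounding each check to one short line is one possible way to narrow the injection surface; it does not remove it, and the real constraints may differ.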
Example shape:
Exact key names/schema are TBD, but the important constraints are:
Design considerations
guidelines while making this the preferred shape for reusable checklists.
Acceptance criteria
.auto_review.yaml can define named custom review agents with checklist items.
--strict.
.auto_review.example.yaml describe the feature and show examples.
Design-plan update: evaluate inline vs focused custom review agents
Initial implementation plan was to add review_agents as structured repo config and render all named agents/checklists into one structured section of the default review prompt. That plan prioritized minimal architecture change, lower cost, lower latency, and reuse of the existing single review/verifier path.
John pushed back that this assumes the model will attend well to several review perspectives in one pass. Production review runs on GPT-4o / GPT-4o-mini tiers, not a stronger high-reasoning model, so combining security, architecture, migration, API-design, and domain-specific checklists into one prompt may dilute the intended focus. The core reason to name custom agents may be to run focused review passes, not just organize prompt text.
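The two approaches contrasted above (one combined prompt versus one focused pass per agent) can be sketched as follows. This is an illustrative sketch only: `call_model` is a stand-in for a single LLM review call, and the finding shape, the prompt layout, and the dedupe key are assumptions, not Auto Review's actual internals.

```python
from typing import Callable

Agents = dict[str, list[str]]              # agent name -> checklist items
Finding = tuple[str, int, str]             # (file, line, message); hypothetical shape
ReviewFn = Callable[[str], list[Finding]]  # stand-in for one LLM review call


def render_agent_section(agents: Agents) -> str:
    """Render named agents and their checklists as one structured prompt section."""
    lines = ["Custom review agents"]
    for name, checks in agents.items():
        lines.append(f"[{name}]")
        lines.extend(f"- {check}" for check in checks)
    return "\n".join(lines)


def review_inline(base_prompt: str, agents: Agents, call_model: ReviewFn) -> list[Finding]:
    """One pass: every agent's checklist shares a single prompt."""
    return call_model(base_prompt + "\n\n" + render_agent_section(agents))


def review_separate(base_prompt: str, agents: Agents, call_model: ReviewFn) -> list[Finding]:
    """One focused pass per agent, then merge and drop duplicate findings."""
    merged: list[Finding] = []
    seen: set[tuple[str, int, str]] = set()
    for name, checks in agents.items():
        prompt = base_prompt + "\n\n" + render_agent_section({name: checks})
        for path, line, message in call_model(prompt):
            # Assumed dedupe key: same location and same normalized message.
            key = (path, line, message.strip().lower())
            if key not in seen:
                seen.add(key)
                merged.append((path, line, message))
    return merged
```

The separate variant makes one model call per agent, so cost and latency grow with the number of agents; whether the added focus is worth that is an open question here.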
Revised plan: do not choose the production strategy by intuition. Add an evaluation path first and compare:
inline: one normal review pass with a structured "Custom review agents" prompt section.
separate: one focused LLM pass per named review agent, then merge/dedupe findings before the existing verifier.
Evaluation should extend the existing auto-review bench run fixture harness rather than creating a separate tool.
Metrics to capture:
Corpus requirements:
agent_id/check_id
Predeclared decision rule:
Choose separate focused passes only if, on GPT-4o-mini:
Choose inline if:
Implementation order:
.auto_review.yaml review_agents production behavior using the winning strategy.
We will stop after the eval report and consult before implementing production review behavior beyond what the evaluation harness needs.