fix(review,chat): exempt release PRs and gate override approvals #290
No reviewers
Labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
Slipstream/auto_review!290
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "issue-287-release-metadata-and-nl-corrections"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Scope of this PR
Closes #287. Fixes both recurring defects from the issue (reproduced on eventcore PR #396) and adds the controls discussed while implementing them.
Problem A — release PRs wrongly rejected by the metadata check
is_release_pr_titlerecognizes the conventional release shapes regardless of body —chore(release): vX.Y.Z(the scoped form that regressed),chore: release vX.Y.Z,release: X.Y.Z, with an optional leadingv— requiring a release-typed prefix and a semver token so a genericchore(deps): bump x to 1.2.3is not mistaken for a release.has_clearly_acceptable_pr_metadataso the metadata-quality check can never driveREQUEST_CHANGESon a release PR, and taught the LLM metadata prompt that the scoped form + a title-mirroring release body are acceptable.Problem B — @auto-review now responds to the intent of a comment
parse_review_correction; any quote-reply with prose (and any plain directed mention) is now classified by a cheap-tier LLM intoapproval_request | re_review | questionand branched on (previously the classifier result was discarded and every correction unconditionally approved). A natural "no, that's fine for a release PR — please approve" is understood without special syntax.Override controls (added during review)
REQUEST_CHANGES) requires the commenter to be listed in the new per-repooverride_approversallow-list. Opt-in — empty/absent list authorizes nobody; the bot declines and points to the config.[override-approved]title prefix and an## Approval overridebody section (with the reason) onto the PR, so it rides into the squash/merge commit (auto-review still never merges). A later clean automated approval strips the marker automatically. PReditedevents don't trigger reviews, so there is no re-review loop.Plumbing
update_pull_request(edit title/body) andget_file_content(read.auto_review.yamlwithout a clone);PullReviewSummarygainsuser.ar-review::override_marker(pure, idempotent, round-trip tested) to avoid an ar-chat↔ar-review cycle; the pipeline uses them for the strip.ChatContext(webhook + poller)..auto_review.example.yamland USER-GUIDE documentoverride_approversand the override flow.Why
It should not take multiple tries to get a release PR approved. Release PRs are now accepted on the first pass, and a single natural-language comment from an authorized user (with a reason) is honored — with the override made obvious and auditable in the merge commit, and automatically cleaned up if the underlying issue is actually fixed.
Verification
cargo nextest run --workspace— 777 passed.cargo clippy --workspace --all-targets -- -D warnings— clean.cargo fmt --all -- --check— clean.🤖 Generated with Claude Code
Two new client capabilities needed by the override flow: - `update_pull_request` (PATCH pulls/{n}) edits the PR title and/or body, sending only the provided fields. This lets auto-review stamp — and later strip — the human-override marker on the PR so it rides into the squash commit a maintainer creates from the PR title/body. auto-review still never merges. - `get_file_content` (GET raw/{path}?ref=) reads a file's raw contents at a ref, returning None on 404. The chat path uses it to read `.auto_review.yaml` (the override_approvers allow-list) without cloning the repo. Refs #287 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>This PR introduces significant changes to how release PRs and override approvals are handled, including new LLM-based intent classification and override authorization mechanisms. The changes appear well-tested, but the reliance on LLMs for intent classification and the new override logic should be carefully monitored for edge cases. Overall, the PR seems safe to merge with these considerations in mind.
Walkthrough
Key Changes
is_release_pr_titleto correctly identify release PRs and exempt them from metadata checks..auto_review.example.yamlto includeoverride_approversfor managing who can force approvals.Testing and Verification
LLM usage and cost
Estimated total USD: $0.134317 via https://api.openai.com and https://api.openai.com
Pre-merge checks
fix(review,chat): exempt release PRs and handle override approvals naturallyto fix(review,chat): exempt release PRs and gate override approvals@auto-review wrote in #290 (comment):
Good catch — and a fitting test of this very PR. The title was 75 characters; I've shortened it to "fix(review,chat): exempt release PRs and gate override approvals" (63 chars), which is within the limit. No code changes were needed.
@auto-review re-review
This pull request introduces several critical changes across multiple files:
Configuration File (.auto_review.example.yaml):
override_approverskey allowing specific users to override auto-review findings by commenting. For example, line 48 shows "override_approvers: - 'jwilger'"..gitignore:
/.claude/settings.local.json.Cargo Files:
Cargo.lockandCargo.tomlfiles have been modified to include the newar-reviewdependency (lines 82, 18).Chat and Handler Logic:
crates/ar-chat/src/command.rs).handle_review_correctionis modified into a more general methodhandle_directed, allowing natural language processing for feedback (lines 246-261).crates/ar-chat/src/handler.rs).Tests:
crates/ar-chat/tests/review_correction.rs).Overall, the changes enhance user interaction with the auto-review bot by allowing greater flexibility in how approvals are managed and reinforcing the review process with structured reasoning.
@auto-review re-review
Queued a forced review at
22ad41741f. This intentionally bypasses CI gating; watch the commit-status badge for progress.The changes appear well-structured and include necessary tests, but the reliance on LLMs for intent classification and override reasoning introduces potential risks that should be carefully monitored.
Walkthrough
override_approversto.auto_review.example.yamlto allow specific users to override bot findings.parse_review_correctionandhandle_directedfunctions.LLM usage and cost
Estimated total USD: $0.594825 via https://api.openai.com and https://api.openai.com
@ -113,6 +113,12 @@ fn parse_review_correction(if !after.trim_start().starts_with("wrote in ") {🟡 Warning: Lines 113–130: The change to use LLM for intent classification in
parse_review_correctionremoves the previous keyword-based checks. Ensure that the LLM model is adequately trained to handle diverse comment inputs to avoid misclassification.🟡 Warning: Lines 243–647: The
handle_directedfunction now routes comments based on intent. Ensure that the LLM model is capable of accurately classifying intents to prevent incorrect approvals or re-reviews.🟡 Warning: Lines 243–647: The
judge_override_reasonfunction uses an LLM to determine if a comment contains a substantive reason for an override. Ensure that the LLM is capable of accurately assessing the content of comments to prevent unjustified overrides.Thanks — re-review is green (approved). Addressing the three advisory warnings about LLM classification/judging reliability (comments 10127, 10128, 10129). They're valid cautions, but the design deliberately treats the LLM as advisory, never as the sole authority, so a misclassification fails safe rather than causing an incorrect approval:
classify_intent): on a missing cheap tier or any unparsable/failed response it defaults toQuestion— i.e. it answers, it does not approve or re-review. The worst case of a misclassification is "the bot replies instead of acting," which the user can correct with another comment.approval_request, an override of outstanding findings only happens when the commenter is on the repo'soverride_approversallow-list. The LLM cannot grant authority it isn't given.judge_override_reasonis a gate, not an approver. A "yes, has reason" verdict never approves on its own — it only unblocks an already-authorized user. On cheap-tier failure it falls back to a conservative word-count heuristic. The failure mode is "asks the user to restate why," not "approves without reason."is_release_pr_title) — independent of any LLM — so Problem A's fix doesn't depend on model behavior at all.So the LLM only ever decides which advisory path to take; the consequential decisions (who may override, whether findings exist) are deterministic. I don't think a code change is warranted here, but these are good things to keep an eye on once deployed. Test coverage exercises the authorized / unauthorized / no-reason / nothing-outstanding paths explicitly.
(Note: the earlier freeform reply rather than a re-review on my quoted comment is the exact brittleness this PR fixes — the currently-deployed bot is the pre-#290 build; these behaviors go live after this lands and the server is updated.)