Automatic code reviews for Forgejo that work with local models and/or BYOK cloud models.
  • Rust 98.2%
  • TypeScript 0.9%
  • Nix 0.6%
  • Gherkin 0.1%
  • Dockerfile 0.1%
Find a file
2026-05-02 13:15:01 -07:00
.forgejo fix(pre-merge): scan every marker occurrence in contains_todo_marker (#4) 2026-05-01 14:27:44 -07:00
.kilo chore(kilo): relax agent step limits (#37) 2026-05-02 12:32:12 -07:00
bench feat(bench): --baseline FILE comparison + --fail-on-regression 2026-05-01 03:00:26 -07:00
crates fix(review): show linter run summaries in reviews (#40) 2026-05-02 13:15:01 -07:00
deploy fix(embed): size embedding pass for local Ollama (#20) (#21) 2026-05-01 20:36:18 -07:00
docs fix(review): request changes for failed pre-merge checks (#38) 2026-05-02 12:41:50 -07:00
tests/features docs(e2e): runbook + Gherkin scenarios for real-Forgejo verification 2026-05-01 07:47:59 -07:00
.auto_review.example.yaml docs(config): example yaml + contract test for RepoConfig drift 2026-05-01 04:18:50 -07:00
.dockerignore chore: project-tooling polish (.dockerignore, PR template, Renovate) 2026-05-01 03:35:11 -07:00
.gitignore chore(kilo): add project workflow configuration (#32) 2026-05-02 11:40:52 -07:00
AGENTS.md chore(kilo): add project workflow configuration (#32) 2026-05-02 11:40:52 -07:00
bacon.toml fix(dev): recover bacon run after transient errors (#34) 2026-05-02 12:06:52 -07:00
Cargo.lock feat: persist runtime state across restarts (closes #19) (#25) 2026-05-02 06:47:29 -07:00
Cargo.toml chore(toolchain): switch to rust nightly via flake-pinned snapshot 2026-05-01 09:51:55 -07:00
CHANGELOG.md fix(review): show linter run summaries in reviews (#40) 2026-05-02 13:15:01 -07:00
CLA.md chore(ci): pin toolchain via flake.nix; CI runs nix flake check 2026-05-01 09:43:56 -07:00
CONTRIBUTING.md docs: refresh dev-setup, status, runbook for the nightly+flake reality 2026-05-01 10:02:44 -07:00
deny.toml chore(ci): pin toolchain via flake.nix; CI runs nix flake check 2026-05-01 09:43:56 -07:00
flake.lock chore(ci): pin toolchain via flake.nix; CI runs nix flake check 2026-05-01 09:43:56 -07:00
flake.nix fix(nix): align cargo-fmt rustfmt with dev shell (#39) 2026-05-02 12:49:18 -07:00
kilo.json chore(kilo): add project workflow configuration (#32) 2026-05-02 11:40:52 -07:00
LICENSE chore: add LICENSE (AGPL-3.0-or-later) with rationale 2026-04-30 21:22:10 -07:00
QUICKSTART.md fix(embed): size embedding pass for local Ollama (#20) (#21) 2026-05-01 20:36:18 -07:00
README.md docs: refresh dev-setup, status, runbook for the nightly+flake reality 2026-05-01 10:02:44 -07:00
renovate.json chore: project-tooling polish (.dockerignore, PR template, Renovate) 2026-05-01 03:35:11 -07:00
rust-toolchain.toml chore(toolchain): switch to rust nightly via flake-pinned snapshot 2026-05-01 09:51:55 -07:00
SECURITY.md docs: SECURITY.md vulnerability disclosure policy 2026-05-01 02:34:43 -07:00

auto_review

A self-hosted, AI-driven pull-request reviewer for Forgejo.

auto_review aims for functional parity with closed-source AI reviewers (CodeRabbit, Greptile, Cursor BugBot) while running entirely on infrastructure you control, with optional support for fully local LLMs.

Status

Alpha. End-to-end review pipeline works: webhook intake → LLM triage (skip lockfile-only PRs, route trivial files away from the reasoning model) → shallow-clone → 45 bundled linters fanned out in parallel inside an optional sandbox → tree-sitter + embedding RAG context + persistent learnings memory → reasoning-tier LLM with strict-JSON-schema output and self-heal validation → cheap- tier verifier drops unfounded findings → post inline review comments + commit status. The @auto_review chat handler accepts help, remember <text>, forget <id>, re-review, autofix, docstring, tests, and free-form questions answered by the cheap-tier model. The bench CLI subcommand replays PR fixtures through the LLM-review path for regression tracking and model comparison. CLI helpers mint the bot's PAT and register the webhook on a repo.

Build, dev, and CI all run through one flake.nix so local work and CI exercise identical derivations bit-for-bit (see CONTRIBUTING.md for the dev setup, or nix flake check for the same gates CI runs).

To deploy: see QUICKSTART.md. To run on an ongoing basis (rotation, upgrades, alerts, repo config), see docs/OPERATIONS.md. If you're a PR author whose changes are reviewed by an auto_review deployment and you want to know what the bot does and how to talk to it, see docs/USER-GUIDE.md. If you've found a security issue, see SECURITY.md for the disclosure process. For background, the feasibility study lays out the broader plan; ADR-0001 captures the architecture decision; the threat model enumerates attacker profiles, trust boundaries, and per-class mitigations (read this before exposing the bot to drive-by PRs). ADR-0002 documents why every linter spawn is sandboxed; ADR-0003 documents the metrics / readiness / runtime-introspection design; ADR-0004 explains why embeddings persist via SQLite today rather than LanceDB.

What's still on the roadmap: real-world verification on a production Forgejo instance with real PR traffic; a larger labelled-corpus benchmark (5 fixtures ship today across SQLi / command injection / hardcoded secrets / path traversal / XSS, but a production-quality precision-recall sweep needs more); a LanceDB-backed vector store as a drop-in for the SQLite path (documented in ADR-0004) when a deployment outgrows brute-force cosine. The languagetool prose linter ships behind an opt-in LANGUAGETOOL_URL (HTTP API, no JVM dep); a youki-based pure-Rust sandbox is documented as future-work in ADR-0002 — not blocking today since podman OR docker apply the same hardening flag set.

Production sandbox

For internet-facing deploys, set AR_SANDBOX_IMAGE to point at the hardened linter image (deploy/Dockerfile.sandbox). Linter spawns go through podman run --network=none --read-only --cap-drop=ALL --security-opt=no-new-privileges --memory=… --cpus=… --pids-limit=… --user 65534:65534 -v <repo>:/work:ro. Without this set, the gateway still works but logs a sandbox: direct (NO ISOLATION) warning — fine for a local LAN trial, not safe for any internet-reachable deploy. (Background: an unjailed linter is the exact path the Kudelski writeup used to reach RCE on CodeRabbit.)

Architecture (one-paragraph)

A Forgejo webhook lands at the gateway, which enqueues a job for the orchestrator. The orchestrator runs a per-PR review pipeline: clone → triage → static-analysis fan-out → context curation (tree-sitter symbols + in-memory cosine-similarity over the learnings store) → review generation → verification (drop unfounded findings) → severity-floor filter → post review. All untrusted execution (linters, LLM-issued workspace tools) runs in a Podman sandbox. LLM calls go through a pluggable provider abstraction that today ships an OpenAI-compatible client (works against hosted OpenAI, Ollama, vLLM, OpenRouter, Together, Groq, etc.).

Crates

Crate Purpose
ar-gateway HTTP webhook intake; HMAC verification; job enqueue
ar-orchestrator Per-PR state machine; activity dispatch
ar-forgejo Forgejo REST client
ar-llm LLM provider trait + implementations
ar-index Tree-sitter parsers + embeddings + co-change graph + learnings store
ar-tools Static-analysis runners + result normalization (45 linters)
ar-sandbox Podman / docker linter sandbox
ar-prompts Prompt templates and JSON schemas
ar-review Review pipeline activities
ar-chat Agentic @auto_review chat handler
ar-cli Operator CLI (16 subcommands; see crates/ar-cli/README.md)

License

AGPL-3.0-or-later. The intent is to keep this codebase open: anyone can self-host, modify, or fork, but a hosted-service operator must publish their modifications. See LICENSE.

Acknowledgements

Architectural lineage from public CodeRabbit engineering writing and from Qodo PR-Agent (Apache-2.0). Specific prompt patterns and the __new hunk__ / __old hunk__ diff format are adapted from PR-Agent under attribution.