feat(review): attribute per-review LLM costs #261

Merged
jwilger merged 19 commits from issue-28-cost-per-review into main 2026-05-18 22:19:33 -07:00
Owner

Why:

  • Review history needs per-review LLM cost attribution so operators can understand and tune review spend.
  • See issue #28.

What:

  • Add LLM pricing defaults, provider-qualified price overrides, routed usage capture, and embedding usage capture.
  • Append an LLM usage/cost footer to posted reviews when usage and pricing are available.
  • Expose estimated review cost on ReviewOutcome and persist it to SQLite review history.
  • Support AR_PRICE_TABLE_PATH for operator pricing overrides and AR_REVIEW_COST_FOOTER=false to suppress the public footer while preserving persisted cost attribution.
  • Document review cost attribution settings and systemd/deployment examples.
  • Move configured subagents from gpt-5.3-codex-spark to standard gpt-5.3-codex per branch request.

Validation:

  • cargo test -p ar-orchestrator --no-run
  • cargo clippy -p ar-review --tests -- -D warnings
  • cargo test -p ar-review review_pull_request_posts_review_with_llm_usage_cost_footer -- --nocapture
  • cargo nextest run -p ar-orchestrator run_review_job_records_review_outcome_cost_in_sqlite_history
  • cargo test -p ar-review review_pull_request_omits_llm_usage_cost_footer_when_disabled_by_env
  • cargo test -p ar-review review_pull_request_cost_footer_uses_price_table_override_from_env_path
  • python -m json.tool opencode.json
  • just opencode-test
  • just fmt
  • just clippy
  • just test
  • just ci

Closes #28

Why:
- Operators need stable default rates and overrideable model pricing before per-review cost attribution can calculate estimates.

What:
- Add an ar-llm price table with pinned OpenAI defaults and JSON override loading.

Validation:
- cargo nextest run -p ar-llm openai_price_table_has_defaults_and_operator_overrides
Why:
- Per-review cost attribution needs a scoped way to observe token usage from routed LLM calls.

What:
- Add an optional Router usage collector and record completion and embedding calls by tier.

Validation:
- cargo nextest run -p ar-llm router_usage_collector_records_complete_and_embedding_calls
Why:
- Per-review cost reporting needs model pricing to convert token usage into an estimated USD value.

What:
- Add provider/model-aware price estimation with provider-qualified override precedence.

Validation:
- cargo nextest run -p ar-llm price_table_estimates_usage_by_provider_and_model estimate_usage_uses_provider_qualified_override_before_model_fallback openai_price_table_has_defaults_and_operator_overrides
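
The override precedence described in this commit can be sketched as follows. This is a minimal illustration, not the ar-llm implementation: `PriceTable`, `Rate`, the `"provider/model"` key format, and the per-million-token unit are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// USD rates per one million tokens (hypothetical shape, not the ar-llm type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rate {
    pub input_per_mtok: f64,
    pub output_per_mtok: f64,
}

/// Hypothetical price table. Keys are either "provider/model" or a bare "model".
#[derive(Default)]
pub struct PriceTable {
    rates: HashMap<String, Rate>,
}

impl PriceTable {
    pub fn insert(&mut self, key: &str, rate: Rate) {
        self.rates.insert(key.to_string(), rate);
    }

    /// Try the provider-qualified key first, then fall back to the model alone.
    pub fn lookup(&self, provider: &str, model: &str) -> Option<Rate> {
        let qualified = format!("{provider}/{model}");
        self.rates
            .get(&qualified)
            .or_else(|| self.rates.get(model))
            .copied()
    }

    /// Estimated USD cost, or None when no rate is known for the model.
    pub fn estimate_usd(
        &self,
        provider: &str,
        model: &str,
        input_tokens: u64,
        output_tokens: u64,
    ) -> Option<f64> {
        let rate = self.lookup(provider, model)?;
        let micro_usd = input_tokens as f64 * rate.input_per_mtok
            + output_tokens as f64 * rate.output_per_mtok;
        Some(micro_usd / 1_000_000.0)
    }
}
```

Returning `None` for an unknown model keeps "no price available" distinct from "zero cost", which matches the PR's wording that the footer is appended only when usage and pricing are available.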
Why:
- Per-review cost attribution must price usage by provider and model instead of placeholder tier labels.

What:
- Add LlmProvider metadata hooks and have Router pass provider base URL and model names to usage collectors.

Validation:
- cargo nextest run -p ar-llm router_usage_collector_records_provider_and_model_names
Why:
- Per-review cost attribution needs embedding token counts from provider responses, not zeroed placeholder usage.

What:
- Add provider embed_with_usage plumbing and parse OpenAI-compatible embedding usage into routed usage collection.

Validation:
- cargo test -p ar-llm --test usage_capture router_usage_collector_records_embedding_prompt_tokens_from_openai_response
Why:
- Review authors and operators need per-review visibility into token usage and estimated cost where the bot posts its findings.

What:
- Collect routed LLM usage during review generation and append an estimated usage/cost footer before posting the review.
- Chain usage collectors so pipeline-local accounting preserves existing observers.

Validation:
- cargo nextest run -p ar-review review_pull_request_posts_review_with_llm_usage_cost_footer
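
One way to chain collectors so that pipeline-local accounting does not displace an existing observer is sketched below. The trait and type names (`UsageCollector`, `UsageEvent`, `RecordingCollector`, `ChainedCollector`) are hypothetical and only illustrate the idea; the real Router types may differ.

```rust
use std::sync::{Arc, Mutex};

/// One routed LLM call's token usage (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
pub struct UsageEvent {
    pub model: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Hypothetical observer interface for routed usage.
pub trait UsageCollector: Send + Sync {
    fn record(&self, event: &UsageEvent);
}

/// Accumulates events in memory, e.g. for a single review.
#[derive(Default)]
pub struct RecordingCollector {
    events: Mutex<Vec<UsageEvent>>,
}

impl RecordingCollector {
    /// Snapshot of everything recorded so far.
    pub fn events(&self) -> Vec<UsageEvent> {
        self.events.lock().unwrap().clone()
    }
}

impl UsageCollector for RecordingCollector {
    fn record(&self, event: &UsageEvent) {
        self.events.lock().unwrap().push(event.clone());
    }
}

/// Forwards each event to the local collector, then to the pre-existing one.
pub struct ChainedCollector {
    pub local: Arc<dyn UsageCollector>,
    pub upstream: Option<Arc<dyn UsageCollector>>,
}

impl UsageCollector for ChainedCollector {
    fn record(&self, event: &UsageEvent) {
        self.local.record(event);
        if let Some(upstream) = &self.upstream {
            upstream.record(event);
        }
    }
}
```

Under this sketch the review pipeline would install a `ChainedCollector` for the duration of one review and read totals from its local collector afterwards, while any previously configured observer keeps receiving every event.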
Why:
- Keep the issue 28 LLM cost and usage helpers aligned with workspace formatting before continuing behavior work.

What:
- Apply formatting-only cleanup to the previously approved ar-llm pricing and usage changes.

Validation:
- cargo nextest run -p ar-review review_pull_request_posts_review_with_llm_usage_cost_footer
Why:
- Review-history storage needs a place to persist per-review cost aggregates for issue 28.

What:
- Add a per-review cost column to the SQLite review_history table with focused coverage.

Validation:
- cargo test -p ar-orchestrator persist_sha_with_per_review_cost_aggregate -- --nocapture
Why:
- Review history needs to store the actual per-review cost estimate, not only a placeholder column.

What:
- Add a SQLite history helper to record a concrete per-review cost with the reviewed SHA.

Validation:
- cargo test -p ar-orchestrator caller_can_record_explicit_per_review_cost_and_read_it_back -- --nocapture
Why:
- Orchestrator history needs the actual per-review LLM cost computed by the review pipeline.

What:
- Return the cost footer total on ReviewOutcome and update compile fallout.
- Clean up lint-only usage capture types and unused test bindings.

Validation:
- cargo test -p ar-orchestrator --no-run
- cargo clippy -p ar-review --tests -- -D warnings
- cargo test -p ar-review review_pull_request_posts_review_with_llm_usage_cost_footer -- --nocapture
Why:
- Review history should attribute the actual cost computed for each successful review instead of relying on the SQLite default.

What:
- Add a cost-aware ReviewHistory recording method with a default fallback.
- Wire successful review jobs to persist ReviewOutcome estimated cost through SQLite history.

Validation:
- cargo nextest run -p ar-orchestrator run_review_job_records_review_outcome_cost_in_sqlite_history
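
A cost-aware recording method with a default fallback could look roughly like the sketch below. Only the names `ReviewHistory` and `record_with_cost` appear in this PR; the signatures, the `record` method, and both implementing types are assumptions made for illustration.

```rust
use std::sync::Mutex;

/// Hypothetical shape of the history trait; real signatures may differ.
pub trait ReviewHistory {
    /// Record that a SHA was reviewed, with no cost information.
    fn record(&self, sha: &str);

    /// Default fallback: backends that do not track cost simply drop it.
    fn record_with_cost(&self, sha: &str, _cost_usd: f64) {
        self.record(sha);
    }
}

/// A backend that only remembers SHAs and relies on the default method.
#[derive(Default)]
pub struct ShaOnlyHistory {
    shas: Mutex<Vec<String>>,
}

impl ShaOnlyHistory {
    pub fn shas(&self) -> Vec<String> {
        self.shas.lock().unwrap().clone()
    }
}

impl ReviewHistory for ShaOnlyHistory {
    fn record(&self, sha: &str) {
        self.shas.lock().unwrap().push(sha.to_string());
    }
}

/// A backend that stores the cost, standing in for the SQLite history.
#[derive(Default)]
pub struct CostHistory {
    rows: Mutex<Vec<(String, f64)>>,
}

impl CostHistory {
    pub fn rows(&self) -> Vec<(String, f64)> {
        self.rows.lock().unwrap().clone()
    }
}

impl ReviewHistory for CostHistory {
    fn record(&self, sha: &str) {
        // Without an explicit cost, store zero (an assumed default).
        self.record_with_cost(sha, 0.0);
    }

    fn record_with_cost(&self, sha: &str, cost_usd: f64) {
        self.rows.lock().unwrap().push((sha.to_string(), cost_usd));
    }
}
```

The default method lets existing implementations compile unchanged, while a cost-aware backend overrides it to persist the value.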
Why:
- The issue branch should run specialist subagents on gpt-5.3-codex with standard reasoning instead of the spark variant.

What:
- Add a standard gpt-5.3-codex model alias and point affected subagents at it.
- Mirror the model choice in subagent frontmatter.

Validation:
- python -m json.tool opencode.json
- just opencode-test
Why:
- Operators may want per-review cost attribution persisted without adding a public usage footer to every review comment.

What:
- Honor AR_REVIEW_COST_FOOTER=false by skipping the LLM usage/cost footer while keeping the default enabled.

Validation:
- cargo test -p ar-review review_pull_request_omits_llm_usage_cost_footer_when_disabled_by_env
Why:
- The disabled footer contract should cover both posted review text and the returned cost outcome.

What:
- Assert AR_REVIEW_COST_FOOTER=false yields a zero estimated review cost.

Validation:
- cargo test -p ar-review review_pull_request_omits_llm_usage_cost_footer_when_disabled_by_env -- --nocapture
Why:
- Operators need per-review cost attribution to reflect their configured model pricing instead of built-in defaults.

What:
- Honor AR_PRICE_TABLE_PATH when estimating review footer and outcome costs, falling back to defaults if absent or invalid.
- Cover the override path with a focused review pipeline test.

Validation:
- cargo nextest run -p ar-review review_pull_request_cost_footer_uses_price_table_override_from_env_path --no-fail-fast
- cargo test -p ar-review review_pull_request_omits_llm_usage_cost_footer_when_disabled_by_env -- --nocapture
- cargo clippy -p ar-review --tests -- -D warnings
Why:
- Per-review cost attribution should remain available for history even when operators suppress the public review footer.

What:
- Compute the review cost with the same pricing path while leaving the posted review body unchanged when AR_REVIEW_COST_FOOTER=false.

Validation:
- cargo test -p ar-review review_pull_request_omits_llm_usage_cost_footer_when_disabled_by_env
- cargo test -p ar-review review_pull_request_cost_footer_uses_price_table_override_from_env_path
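
The final footer behavior (cost always computed, footer text optional) can be illustrated with the sketch below. The function and struct names are hypothetical, and treating only a literal `false` (case-insensitive, trimmed) as "disabled" is an assumption; this PR confirms just the `AR_REVIEW_COST_FOOTER=false` spelling and that the footer is enabled by default.

```rust
/// Footer is on unless the variable is explicitly "false".
/// Accepting other spellings is left out on purpose (assumed behavior).
pub fn footer_enabled(raw: Option<&str>) -> bool {
    !matches!(raw.map(str::trim), Some(v) if v.eq_ignore_ascii_case("false"))
}

/// Hypothetical stand-in for the posted body plus the outcome's cost field.
pub struct RenderedReview {
    pub body: String,
    pub estimated_cost_usd: f64,
}

/// The cost is carried on the result either way; only the body text changes.
pub fn render_review(body: &str, estimated_cost_usd: f64, footer: bool) -> RenderedReview {
    let body = if footer {
        format!("{body}\n\nEstimated total USD: ${estimated_cost_usd:.6}")
    } else {
        body.to_string()
    };
    RenderedReview { body, estimated_cost_usd }
}
```

In a real process the flag would be read once, for example with `footer_enabled(std::env::var("AR_REVIEW_COST_FOOTER").ok().as_deref())`, and passed down so tests can exercise both paths without mutating the environment.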
Why:
- Operators need to know how per-review cost attribution is persisted and how to tune pricing/footer visibility.

What:
- Document AR_PRICE_TABLE_PATH, AR_REVIEW_COST_FOOTER, and per_review_cost_usd behavior.
- Add deployment and systemd env examples for cost attribution settings.

Validation:
- docs-operator-reviewer approved the operator docs diff
Why:
- The issue branch should satisfy the workspace formatting gate before broader verification.

What:
- Apply cargo fmt output for the review cost attribution tests.

Validation:
- cargo fmt --all
test(llm): satisfy clippy in cost tests
All checks were successful
CI / Classify changed paths (pull_request) Successful in 3s
CI / Clippy (pull_request) Has been skipped
CI / Format check (pull_request) Has been skipped
CI / Test (pull_request) Has been skipped
CI / Dependency policy (pull_request) Has been skipped
CI / Build (pull_request) Has been skipped
CI / Request auto_review semantic review (pull_request) Successful in 1s
CI / opencode plugin tests (pull_request) Successful in 8s
CI / Build PR artifacts (no token) (pull_request) Has been skipped
auto_review auto_review: no findings
919313eaed
Why:
- The issue branch should pass the workspace clippy gate with warnings denied.

What:
- Remove redundant PathBuf and vec conversions in LLM cost usage tests.

Validation:
- just clippy
auto-review left a comment

This PR introduces per-review LLM cost attribution, allowing operators to understand and manage review expenses. It includes changes to capture and estimate LLM usage costs, append cost footers to reviews, and persist cost data in SQLite. The changes appear well-tested and safe to merge.

Walkthrough

  • LLM Cost Attribution:
    • Added functionality to estimate and record LLM usage costs per review.
    • Introduced environment variables AR_PRICE_TABLE_PATH and AR_REVIEW_COST_FOOTER for cost management.
    • Updated documentation to reflect new configuration options.
  • Code Changes:
    • Modified Router to include usage collection.
    • Added record_with_cost method to ReviewHistory and its implementations.
    • Updated tests to cover new cost attribution features.
  • Documentation:
    • Updated deployment and operations guides to include new cost attribution settings.
    • Provided examples for setting up cost attribution in systemd environments.

LLM usage and cost

  • Reasoning (gpt-4o) in=21391 out=361 cost=$0.112370
  • Cheap (gpt-4o-mini) in=609 out=51 cost=$0.000122

Estimated total USD: $0.112492 via https://api.openai.com and https://api.openai.com
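
The footer figures can be reproduced with simple per-million-token arithmetic. The rates in this sketch (USD 5.00 / 15.00 per million input / output tokens for gpt-4o, and 0.15 / 0.60 for gpt-4o-mini) are assumptions chosen because they match the posted numbers; the PR text does not state the pinned defaults.

```rust
/// USD cost for a token count at per-million-token rates.
pub fn cost_usd(input_tokens: u64, output_tokens: u64, in_rate: f64, out_rate: f64) -> f64 {
    (input_tokens as f64 * in_rate + output_tokens as f64 * out_rate) / 1_000_000.0
}

/// Returns (reasoning, cheap, total) for the usage shown in the footer.
/// The rates are assumed, not taken from the ar-llm price table.
pub fn footer_costs() -> (f64, f64, f64) {
    // Reasoning tier: (21391 * 5.00 + 361 * 15.00) / 1e6
    let reasoning = cost_usd(21_391, 361, 5.00, 15.00);
    // Cheap tier: (609 * 0.15 + 51 * 0.60) / 1e6
    let cheap = cost_usd(609, 51, 0.15, 0.60);
    (reasoning, cheap, reasoning + cheap)
}
```

Formatted to six decimal places, the three values come out as 0.112370, 0.000122, and 0.112492, which agrees with the footer.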
Reference
Slipstream/auto_review!261