fix(nixos): harden gateway module for production #271

Merged
jwilger merged 16 commits from fix/nixos-production-ready into main 2026-05-19 11:28:41 -07:00
Owner

Why:

  • The NixOS module should be production-ready for the 1.0 direct-host deployment path instead of only installing a minimal root-running service.
  • Durable embedded OCI persistence remains separate follow-up work in issue #270.

What:

  • Run the generated gateway service as the dedicated auto_review user/group and provision that account.
  • Bind the gateway to 127.0.0.1:8080 by default for reverse-proxy/TLS deployments.
  • Add direct-host systemd hardening, restart/stop behavior, runtime/state directory modes, write paths, resource limits, and journald identity.
  • Extend the NixOS module contract check to pin the production service shape.
  • Update deployment docs for the hardened NixOS bare/systemd path.

Validation:

  • nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
  • just fmt
  • nix flake check
Why:
- The NixOS gateway module is intended to be production usable, but the service previously ran as root by default.

What:
- Set the generated gateway service User and Group to auto_review.
- Pin the NixOS module contract for the non-root service identity.

Validation:
- nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
Why:
- The NixOS gateway service now runs as auto_review, so the module must also declare that service account for reproducible deployments.

What:
- Create the auto_review system group and user when the gateway service is enabled.
- Extend the NixOS module contract to pin account provisioning.

Validation:
- nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
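A minimal sketch of what the two commits above describe, assuming the gateway unit is called `auto-review-gateway` (a hypothetical name; the PR does not state it). In the real module these attributes sit inside the `lib.mkIf gatewayCfg.enable { ... }` block shown in the diff hunk further down.

```nix
# Sketch only: a NixOS module fragment meant to be merged with the rest of
# the gateway module. The unit name "auto-review-gateway" is an assumption.
{ ... }:
{
  # Declare the service account so NixOS creates it during system activation.
  users.groups.auto_review = { };
  users.users.auto_review = {
    isSystemUser = true;
    group = "auto_review";
  };

  # Run the unit as that account instead of root.
  systemd.services.auto-review-gateway.serviceConfig = {
    User = "auto_review";
    Group = "auto_review";
  };
}
```

Declaring the account in the same module that sets `User`/`Group` keeps the deployment reproducible: a host that enables the gateway cannot end up with a unit that references a missing user.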
Why:
- Production NixOS deployments should expect a reverse proxy or TLS boundary instead of exposing the gateway on all interfaces by default.

What:
- Set AR_GATEWAY_BIND to 127.0.0.1:8080 for the generated gateway service.
- Extend the NixOS module contract to pin the loopback default.

Validation:
- nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
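The loopback default could look roughly like the fragment below. The unit name `auto-review-gateway`, the use of the unit's `environment` attribute, and the nginx virtual host `review.example.com` are illustrative assumptions, not taken from the PR; only `AR_GATEWAY_BIND` and `127.0.0.1:8080` come from the commit message.

```nix
# Sketch only: loopback bind plus a hypothetical reverse proxy in front of it.
{ lib, ... }:
{
  # mkDefault lets an operator override the bind address without mkForce.
  systemd.services.auto-review-gateway.environment.AR_GATEWAY_BIND =
    lib.mkDefault "127.0.0.1:8080";

  # Hypothetical proxy; TLS settings (certificates, forceSSL) are omitted here.
  services.nginx = {
    enable = true;
    virtualHosts."review.example.com".locations."/".proxyPass =
      "http://127.0.0.1:8080";
  };
}
```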
Why:
- The NixOS gateway module should provide the same direct-host defense-in-depth baseline as the production systemd unit.

What:
- Add the baseline systemd hardening directives to the generated gateway service.
- Extend the NixOS module contract to prevent hardening drift.

Validation:
- nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
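The commit does not list the directives, so the following is only a sketch. The review walkthrough later on this page names `NoNewPrivileges`, `ProtectSystem`, and `RestrictAddressFamilies`; the values shown, the extra directives, and the unit name `auto-review-gateway` are assumptions.

```nix
# Sketch only: a plausible hardening baseline, not the module's actual list.
{ ... }:
{
  systemd.services.auto-review-gateway.serviceConfig = {
    # Directives named in the review walkthrough; the values are assumed.
    NoNewPrivileges = true;
    ProtectSystem = "strict";
    RestrictAddressFamilies = [ "AF_INET" "AF_INET6" "AF_UNIX" ];

    # Not named anywhere in the PR; common companions to the above.
    PrivateTmp = true;
    ProtectHome = true;
  };
}
```

Running `systemd-analyze security` against the deployed unit is one way to see which exposure remains after directives like these are applied.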
Why:
- Production NixOS deployments need the same restart, state, logging, and resource-control behavior as the direct-host systemd unit.

What:
- Add runtime/state directory modes, restart/stop behavior, write paths, resource limits, and journald identity to the generated gateway service.
- Extend the NixOS module contract for these operational controls.

Validation:
- nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
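As a rough illustration of the operational controls listed above, the fragment below uses real systemd directives with assumed values. The directory names, limits, timeouts, and the unit name `auto-review-gateway` are all hypothetical; the PR text gives none of them.

```nix
# Sketch only: every value here is an assumption chosen for illustration.
{ ... }:
{
  systemd.services.auto-review-gateway.serviceConfig = {
    # Restart/stop behavior.
    Restart = "on-failure";
    RestartSec = "5s";
    TimeoutStopSec = "30s";

    # Runtime and state directories (/run/auto_review, /var/lib/auto_review).
    RuntimeDirectory = "auto_review";
    RuntimeDirectoryMode = "0750";
    StateDirectory = "auto_review";
    StateDirectoryMode = "0750";

    # Write paths that stay writable under a read-only system view.
    ReadWritePaths = [ "/var/lib/auto_review" ];

    # Resource limits.
    LimitNOFILE = 65536;
    MemoryMax = "1G";

    # Journald identity used to tag log lines.
    SyslogIdentifier = "auto-review-gateway";
  };
}
```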
docs: describe hardened NixOS gateway service
All checks were successful
CI / Test (pull_request) Successful in 54s
CI / Classify changed paths (pull_request) Successful in 3s
CI / Format check (pull_request) Successful in 6s
CI / Clippy (pull_request) Successful in 40s
CI / opencode plugin tests (pull_request) Has been skipped
CI / Dependency policy (pull_request) Successful in 12s
CI / Request auto_review semantic review (pull_request) Successful in 1s
auto_review auto_review: 1 warning
CI / Build (pull_request) Successful in 33s
CI / Build PR artifacts (no token) (pull_request) Has been skipped
a7011774af
Why:
- Operators need deployment docs to match the now-hardened NixOS service behavior and its bare/systemd posture.

What:
- Document the NixOS module's service account, loopback bind, persistent state, service controls, and hardening baseline.
- Clarify that durable embedded OCI persistence remains separate work.

Validation:
- nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
- just fmt
auto-review approved these changes 2026-05-19 08:55:14 -07:00
Dismissed
auto-review left a comment

This PR enhances the NixOS module for production readiness by running the gateway service under a dedicated user, binding it to localhost, and applying systemd hardening measures. The changes appear well-structured and safe to merge.

Walkthrough

  • docs/DEPLOYMENT.md:
    • Updated to reflect the new service configuration, including user creation and binding details.
  • flake.nix:
    • Introduced a dedicated auto_review user and group for running the service.
    • Configured systemd service with hardening options such as NoNewPrivileges, ProtectSystem, and RestrictAddressFamilies.
    • Set default binding to 127.0.0.1:8080 for reverse proxy setups.
    • Added assertions to ensure the service configuration meets production standards.

LLM usage and cost

  • Reasoning (gpt-4o): in=6234 out=335 cost=$0.036195
  • Cheap (gpt-4o-mini): in=2782 out=72 cost=$0.000461

Estimated total USD: $0.036656 via https://api.openai.com and https://api.openai.com
@@ -66,15 +66,65 @@
})
(lib.mkIf gatewayCfg.enable {
users.groups.auto_review = { };
Owner

🟡 Warning: Ensure that the auto_review user and group are created before the service starts to avoid runtime errors.

Author
Owner

@auto-review Addressed. The module already declares users.groups.auto_review and users.users.auto_review when the gateway is enabled, so NixOS provisions the account as part of system activation before systemd starts the service. I also strengthened the module contract to assert the declared account is a system user in the auto_review group.

Author
Owner

@auto-review Follow-up before pushing: I am updating the project guardrails now so future PR feedback handling must reply on the existing inline review thread via the review comments reply endpoint, rather than creating a top-level PR comment or a new inline thread. This context is posted before the push so the next auto-review run can see it.

jwilger marked this conversation as resolved
test(nixos): assert gateway account shape
All checks were successful
CI / Classify changed paths (pull_request) Successful in 3s
CI / Format check (pull_request) Successful in 6s
CI / Clippy (pull_request) Successful in 47s
CI / opencode plugin tests (pull_request) Has been skipped
CI / Request auto_review semantic review (pull_request) Successful in 3s
auto_review auto_review: no findings
CI / Dependency policy (pull_request) Successful in 13s
CI / Test (pull_request) Successful in 1m3s
CI / Build (pull_request) Successful in 42s
CI / Build PR artifacts (no token) (pull_request) Has been skipped
870de2fcff
Why:
- PR feedback asked to ensure the auto_review user and group exist before the service starts.
- The module already declares them, but the contract should make the service-account shape explicit.

What:
- Strengthen the NixOS module contract to assert auto_review is a system user in the auto_review group.

Validation:
- nix build .#checks.x86_64-linux.auto-review-nixos-module-contract
- just fmt
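A contract assertion of this kind might be written as below. This is a sketch under assumptions: `nixosConfig` stands for an evaluated NixOS configuration with the gateway enabled, and the function shape is hypothetical; the actual check in `flake.nix` is not shown on this page.

```nix
# Sketch only: `pkgs` and `nixosConfig` are supplied by the caller (assumed).
{ pkgs, nixosConfig }:
let
  cfg = nixosConfig.config;
  account = cfg.users.users.auto_review;
in
# Evaluation fails at the first assertion that does not hold.
assert cfg.users.groups ? auto_review;
assert account.isSystemUser;
assert account.group == "auto_review";
pkgs.runCommand "auto-review-nixos-module-contract" { } ''
  touch $out
''
```

Because the assertions run at evaluation time, a drift in the account shape would fail `nix build` on the check before any derivation is built.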
jwilger dismissed auto-review's review 2026-05-19 09:15:23 -07:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

auto-review approved these changes 2026-05-19 09:16:49 -07:00
Dismissed
auto-review left a comment

The PR enhances the NixOS module by adding a dedicated auto_review user/group and hardening the gateway service for production. The changes appear safe to merge, with no major issues identified.

Walkthrough

Δ since a701177:

  • flake.nix:
    • Added gatewayServiceAccount to verify the presence of a dedicated auto_review user/group.
    • Strengthened the module contract to ensure the auto_review user is a system user and belongs to the auto_review group.

LLM usage and cost

  • Reasoning (gpt-4o): in=4667 out=184 cost=$0.026095
  • Cheap (gpt-4o-mini): in=461 out=44 cost=$0.000096

Estimated total USD: $0.026191 via https://api.openai.com and https://api.openai.com
fix(opencode): enforce inline feedback replies
All checks were successful
CI / Clippy (pull_request) Successful in 46s
CI / Classify changed paths (pull_request) Successful in 3s
CI / Format check (pull_request) Successful in 6s
CI / opencode plugin tests (pull_request) Successful in 14s
CI / Test (pull_request) Successful in 1m3s
CI / Dependency policy (pull_request) Successful in 12s
CI / Request auto_review semantic review (pull_request) Successful in 1s
auto_review auto_review: no findings
CI / Build (pull_request) Successful in 38s
CI / Build PR artifacts (no token) (pull_request) Successful in 2s
0aa7504242
Why:
- PR feedback responses must answer the existing Forgejo inline review thread instead of creating a top-level comment or a new inline thread.

What:
- Tighten the Forgejo feedback rule, skill, agent, and command instructions around the review comments reply endpoint.
- Block shell attempts to use top-level issue comment endpoints for inline feedback replies while allowing existing review-thread replies.

Validation:
- just opencode-test
jwilger dismissed auto-review's review 2026-05-19 09:19:21 -07:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

auto-review approved these changes 2026-05-19 09:20:47 -07:00
Dismissed
auto-review left a comment

The PR enhances the NixOS module for production readiness by running the gateway service under a dedicated user, binding it to localhost, and adding systemd hardening measures. It also updates the feedback processing rules to ensure replies are made on existing threads, preventing new top-level comments.

Walkthrough

  • NixOS Module Changes: The gateway service now runs as a dedicated auto_review user, with systemd hardening and resource limits applied.
  • Feedback Processing: Updated rules and tests to ensure that replies to feedback are made on existing threads, preventing the creation of new top-level comments or threads.
  • Testing: Added tests to verify that top-level comments are blocked and inline replies are allowed.

LLM usage and cost

  • Reasoning (gpt-4o): in=6511 out=278 cost=$0.036725
  • Cheap (gpt-4o-mini): in=461 out=50 cost=$0.000099

Estimated total USD: $0.036824 via https://api.openai.com and https://api.openai.com
Why:
- The auto_review gateway URL is deployment configuration, not credential material, while the action token remains secret.
- The repository moved under the Slipstream owner, so workflows and operator docs should target the production owner.

What:
- Read AUTO_REVIEW_GATEWAY_URL from Forgejo Actions variables instead of secrets.
- Update workflow release metadata and maintainer/operator documentation from jwilger/auto_review to Slipstream/auto_review.

Validation:
- just opencode-test
- just fmt
- git diff --check
test(actions): expect Slipstream release API path
Some checks failed
CI / Classify changed paths (pull_request) Failing after 3s
CI / Clippy (pull_request) Has been skipped
CI / Format check (pull_request) Has been skipped
CI / opencode plugin tests (pull_request) Has been skipped
CI / Test (pull_request) Has been skipped
CI / Build (pull_request) Has been skipped
CI / Dependency policy (pull_request) Has been skipped
CI / Request auto_review semantic review (pull_request) Has been skipped
CI / Build PR artifacts (no token) (pull_request) Has been skipped
30fc9c36f3
Why:
- The upstream repository owner changed to Slipstream, so the workflow contract should pin the new Forgejo API path.

What:
- Update the release-prepare workflow contract assertion from jwilger/auto_review to Slipstream/auto_review.

Validation:
- cargo nextest run -p ar-gateway release_prepare_uses_forgejo_api_json_for_open_pr_lookup
- just fmt
jwilger dismissed auto-review's review 2026-05-19 10:53:53 -07:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

fix(actions): fetch base ref for PR path classification
Some checks failed
CI / Classify changed paths (pull_request) Failing after 3s
CI / Format check (pull_request) Has been skipped
CI / Clippy (pull_request) Has been skipped
CI / opencode plugin tests (pull_request) Has been skipped
CI / Test (pull_request) Has been skipped
CI / Build (pull_request) Has been skipped
CI / Dependency policy (pull_request) Has been skipped
CI / Request auto_review semantic review (pull_request) Has been skipped
CI / Build PR artifacts (no token) (pull_request) Has been skipped
fa75c971db
Why:
- After the repository owner transfer, the classifier job failed while fetching raw base/head SHAs under the runner checkout context.
- Fetching the base branch ref is more portable across Forgejo owner changes and still lets the classifier diff against the PR head.

What:
- Fetch the PR base ref in path classification and use the event base SHA when available, with a fallback to origin/<base-ref>.
- Extend the workflow contract to avoid the brittle raw-SHA fetch pattern.

Validation:
- cargo nextest run -p ar-gateway pr_ci_classifier_treats_root_opencode_json_as_opencode_and_fails_closed_to_app_ci_on_uncertain_diff
- just fmt
- git diff --check
fix(actions): grant PR read permission
Some checks failed
CI / Classify changed paths (pull_request) Failing after 3s
CI / Format check (pull_request) Has been skipped
CI / Clippy (pull_request) Has been skipped
CI / opencode plugin tests (pull_request) Has been skipped
CI / Test (pull_request) Has been skipped
CI / Build (pull_request) Has been skipped
CI / Dependency policy (pull_request) Has been skipped
CI / Request auto_review semantic review (pull_request) Has been skipped
CI / Build PR artifacts (no token) (pull_request) Has been skipped
38b2f2e0c3
Why:
- After the repository moved under Slipstream, the PR classifier appears to fail during authenticated checkout or PR metadata access.
- The workflow should explicitly grant the job token both repository contents and pull request read access.

What:
- Add pull-requests: read to CI workflow permissions.
- Pin the permission requirement in the CI workflow contract.

Validation:
- cargo nextest run -p ar-gateway pr_ci_exposes_separate_just_based_deterministic_jobs pr_ci_classifier_treats_root_opencode_json_as_opencode_and_fails_closed_to_app_ci_on_uncertain_diff
- just fmt
- git diff --check
fix(actions): avoid auth in path classifier
All checks were successful
CI / Classify changed paths (pull_request) Successful in 1s
CI / Format check (pull_request) Successful in 6s
CI / Clippy (pull_request) Successful in 41s
CI / opencode plugin tests (pull_request) Successful in 13s
CI / Test (pull_request) Successful in 53s
CI / Dependency policy (pull_request) Successful in 9s
CI / Request auto_review semantic review (pull_request) Successful in 2s
CI / Build (pull_request) Successful in 31s
CI / Build PR artifacts (no token) (pull_request) Successful in 2s
6a47faf239
Why:
- After the repository owner transfer, the first PR job still fails before downstream gates can run, consistent with checkout/auth failing in path classification.
- The classifier should emit safe outputs without depending on checkout or fetch.

What:
- Make path classification fail closed to both app and opencode CI without checkout/fetch.
- Update the workflow contract and remove the now-unused shell case helper.

Validation:
- cargo nextest run -p ar-gateway pr_ci_classifier_treats_root_opencode_json_as_opencode_and_fails_closed_to_app_ci_on_uncertain_diff
- just fmt
fix(actions): classify paths through Forgejo API
All checks were successful
CI / Classify changed paths (pull_request) Successful in 1s
CI / Format check (pull_request) Successful in 6s
CI / Clippy (pull_request) Successful in 43s
CI / opencode plugin tests (pull_request) Successful in 13s
CI / Test (pull_request) Successful in 56s
CI / Dependency policy (pull_request) Successful in 9s
CI / Request auto_review semantic review (pull_request) Successful in 2s
CI / Build (pull_request) Successful in 33s
CI / Build PR artifacts (no token) (pull_request) Successful in 2s
cc4afbd0b6
Why:
- The PR path classifier should avoid git checkout/fetch auth issues after the repository owner transfer without losing path-based routing.

What:
- Fetch changed PR files from the Forgejo pull files API using the workflow token.
- Keep a fail-closed fallback to run all gates when the API lookup fails.
- Restore contract coverage for opencode path classification without git fetch.

Validation:
- cargo nextest run -p ar-gateway pr_ci_classifier_treats_root_opencode_json_as_opencode_and_fails_closed_to_app_ci_on_uncertain_diff
- just fmt
- git diff --check
fix(actions): clarify semantic review auth failures
Some checks failed
CI / Classify changed paths (pull_request) Successful in 1s
CI / Format check (pull_request) Successful in 6s
CI / Request auto_review semantic review (pull_request) Successful in 1s
CI / Build (pull_request) Has been cancelled
CI / Clippy (pull_request) Successful in 43s
CI / opencode plugin tests (pull_request) Successful in 14s
CI / Test (pull_request) Successful in 57s
CI / Dependency policy (pull_request) Successful in 10s
CI / Build PR artifacts (no token) (pull_request) Has been cancelled
fc40f6190c
Why:
- After the repository owner transfer, the semantic-review job can reach production but fail with HTTP 401 when the Slipstream token secret does not match the gateway token.

What:
- Report 401/403 responses with an explicit instruction to verify AUTO_REVIEW_ACTION_TOKEN against AR_CI_REVIEW_TOKEN.
- Document that the gateway URL is an Actions variable and the token is a Slipstream/auto_review secret.

Validation:
- cargo nextest run -p ar-gateway pr_ci_exposes_separate_just_based_deterministic_jobs
- just fmt
- git diff --check
fix(actions): use AR_CI_REVIEW_TOKEN secret
All checks were successful
CI / Classify changed paths (pull_request) Successful in 1s
CI / Clippy (pull_request) Successful in 40s
auto_review auto_review: 3 warnings
CI / Build (pull_request) Successful in 31s
CI / Build PR artifacts (no token) (pull_request) Successful in 2s
CI / Format check (pull_request) Successful in 6s
CI / opencode plugin tests (pull_request) Successful in 12s
CI / Test (pull_request) Successful in 54s
CI / Dependency policy (pull_request) Successful in 9s
CI / Request auto_review semantic review (pull_request) Successful in 1s
0a443d9875
Why:
- The repository workflow secret now matches the gateway variable name, AR_CI_REVIEW_TOKEN.

What:
- Read semantic-review bearer credentials from secrets.AR_CI_REVIEW_TOKEN.
- Update operator docs and examples to use the same secret name.

Validation:
- cargo nextest run -p ar-gateway pr_ci_exposes_separate_just_based_deterministic_jobs
- just fmt
- git diff --check
auto-review left a comment

The PR enhances the NixOS module for production readiness by running the gateway service under a dedicated user, binding it to localhost, and applying systemd hardening. It also updates CI workflows to use Forgejo API for path classification and adjusts repository references. The changes appear safe to merge, with some recommendations for ensuring correct configuration and testing.

Walkthrough

  • CI Workflow Changes:
    • Updated to use Forgejo API for path classification, reducing reliance on git fetch.
    • Added pull-requests: read permission to access PR metadata.
    • Adjusted token usage to ensure proper authorization.
  • NixOS Module Enhancements:
    • Runs the gateway service as a dedicated auto_review user/group.
    • Binds the service to 127.0.0.1:8080 for security.
    • Applies systemd hardening measures for production readiness.
  • Documentation and Repository Updates:
    • Updated repository references from jwilger to Slipstream.
    • Adjusted documentation to reflect changes in deployment and operational procedures.

LLM usage and cost

  • Reasoning (gpt-4o): in=13073 out=667 cost=$0.075370
  • Cheap (gpt-4o-mini): in=10138 out=228 cost=$0.001657

Estimated total USD: $0.077028 via https://api.openai.com and https://api.openai.com
@@ -329,25 +343,37 @@ jobs:
needs: [path-classification]
Owner

🟡 Warning: Ensure that the AR_CI_REVIEW_TOKEN secret is correctly configured and matches the production token to avoid authorization failures.

jwilger marked this conversation as resolved
Owner

🟡 Warning: Lines 176–182: Ensure that the Forgejo PR files API is correctly used with the workflow token to avoid issues with path classification.

jwilger marked this conversation as resolved
Owner

🟡 Warning: Lines 201–218: Ensure that the path-classification logic does not rely on git fetch to avoid unnecessary operations and potential errors.

jwilger marked this conversation as resolved
jwilger deleted branch fix/nixos-production-ready 2026-05-19 11:28:41 -07:00
Reference
Slipstream/auto_review!271