feat: verify data-flow source bits #24

Merged
jwilger merged 1 commit from feat/verify-data-flow-source-bits into main 2026-06-05 22:51:32 -07:00
Owner

Summary

  • Adds Lean4 and Quint obligations that verify modeled data-flow source hops preserve bit encodings when a data-flow source resolves to another modeled data-flow target.
  • Wires the new root invariant into generated quint.json, emc verify, and MCP verify process plans.
  • Extends canonical artifact checks so emc check reports drift when the obligation is missing from generated artifacts.

Rationale
Information completeness needs data-flow chains to preserve bit-level semantics across modeled transformations and projections. The formal artifacts already require bit-complete rows and target-level bit matching for displayed and external payload fields; this change adds a root-level proof obligation for source-hop consistency between modeled data-flow rows without adding Rust or JavaScript semantic validation.

Verification

  • cargo nextest run init_creates_deterministic_project_layout --no-tests=fail
  • cargo fmt --check && cargo nextest run -E 'binary(init_project) | binary(check_project) | binary(verify_project) | binary(mcp_verify)' --no-tests=fail && cargo build --quiet && git diff --check
  • nix develop -c sh -lc 'cd <generated-smoke-project> && lake env lean model/lean/RepairDesk.lean && quint typecheck model/quint/RepairDesk.qnt && quint verify --invariant modelDataFlowSourceBitEncodingsMatchModeledSources,modelDataFlowsAreBitComplete model/quint/RepairDesk.qnt'
  • Negative smoke: corrupting a second data-flow hop from UTF-8 string to UTF-16 string caused Lean to fail at modelDataFlowSourceBitEncodingsMatchModeledSources and Quint to report a counterexample for the same invariant.
Summary - Adds Lean4 and Quint obligations that verify modeled data-flow source hops preserve bit encodings when a data-flow source resolves to another modeled data-flow target. - Wires the new root invariant into generated `quint.json`, `emc verify`, and MCP verify process plans. - Extends canonical artifact checks so `emc check` reports drift when the obligation is missing from generated artifacts. Rationale Information completeness needs data-flow chains to preserve bit-level semantics across modeled transformations and projections. The formal artifacts already require bit-complete rows and target-level bit matching for displayed and external payload fields; this change adds a root-level proof obligation for source-hop consistency between modeled data-flow rows without adding Rust or JavaScript semantic validation. Verification - `cargo nextest run init_creates_deterministic_project_layout --no-tests=fail` - `cargo fmt --check && cargo nextest run -E 'binary(init_project) | binary(check_project) | binary(verify_project) | binary(mcp_verify)' --no-tests=fail && cargo build --quiet && git diff --check` - `nix develop -c sh -lc 'cd <generated-smoke-project> && lake env lean model/lean/RepairDesk.lean && quint typecheck model/quint/RepairDesk.qnt && quint verify --invariant modelDataFlowSourceBitEncodingsMatchModeledSources,modelDataFlowsAreBitComplete model/quint/RepairDesk.qnt'` - Negative smoke: corrupting a second data-flow hop from `UTF-8 string` to `UTF-16 string` caused Lean to fail at `modelDataFlowSourceBitEncodingsMatchModeledSources` and Quint to report a counterexample for the same invariant.
feat: verify data-flow source bits
All checks were successful
CI / Nix flake check (pull_request) Successful in 2m51s
CI / Request auto_review semantic review (pull_request) Successful in 2s
auto_review auto_review: 1 warning
CI / Rust CI (pull_request) Successful in 6m54s
dfbc8710c2
auto-review left a comment

The PR introduces a new verification obligation to ensure data-flow source hops preserve bit encodings, enhancing data integrity checks. The changes appear well-integrated into the existing verification processes, but careful review of the logic and edge cases is advised.

Walkthrough

  • src/core/layout.rs:
    • Added new functions modelDataFlowSourceBitEncodingMatchesModeledSource and modelDataFlowSourceBitEncodingsMatchModeledSources to verify bit encoding consistency.
    • These functions are critical for ensuring data integrity across data-flow transformations.
    • The logic involves checking if any source flow matches the workflow, slice, datum, and source with the same bit encoding.
  • src/core/project.rs:
    • Updated quint.json to include the new invariant modelDataFlowSourceBitEncodingsMatchModeledSources.
  • src/core/verify.rs:
    • Integrated the new invariant into the verification process.
  • tests/init_project.rs, tests/mcp_verify.rs, tests/verify_project.rs:
    • Added tests to ensure the new invariant is correctly applied and verified in both Lean and Quint models.

LLM usage and cost

The PR introduces a new verification obligation to ensure data-flow source hops preserve bit encodings, enhancing data integrity checks. The changes appear well-integrated into the existing verification processes, but careful review of the logic and edge cases is advised. ## Walkthrough - **src/core/layout.rs**: - Added new functions `modelDataFlowSourceBitEncodingMatchesModeledSource` and `modelDataFlowSourceBitEncodingsMatchModeledSources` to verify bit encoding consistency. - These functions are critical for ensuring data integrity across data-flow transformations. - The logic involves checking if any source flow matches the workflow, slice, datum, and source with the same bit encoding. - **src/core/project.rs**: - Updated `quint.json` to include the new invariant `modelDataFlowSourceBitEncodingsMatchModeledSources`. - **src/core/verify.rs**: - Integrated the new invariant into the verification process. - **tests/init_project.rs, tests/mcp_verify.rs, tests/verify_project.rs**: - Added tests to ensure the new invariant is correctly applied and verified in both Lean and Quint models. ## LLM usage and cost - Reasoning (gpt-4o) in=18427 out=510 cost=$0.099785 - Cheap (gpt-4o-mini) in=10248 out=88 cost=$0.001590 Estimated total USD: $0.101375 via https://api.openai.com and https://api.openai.com
Owner

🟡 Warning: Lines 6–14: Verify that the modelDataFlowSourceBitEncodingMatchesModeledSource function correctly handles cases where no source flow matches the criteria, as this could lead to false positives or negatives in verification.

🟡 **Warning:** **Lines 6–14:** Verify that the `modelDataFlowSourceBitEncodingMatchesModeledSource` function correctly handles cases where no source flow matches the criteria, as this could lead to false positives or negatives in verification.
Author
Owner

Addressed auto_review warning about edge cases:

  • Verified the new invariant with a generated model where the second data-flow row sources from the first row's modeled target (raw_ticket_title). Lean and Quint both accepted the valid UTF-8/UTF-8 chain.
  • Verified the negative case by corrupting the second hop to UTF-16 string; Lean rejected modelDataFlowSourceBitEncodingsMatchModeledSources and Quint produced a counterexample for the same invariant.
  • Current scope is exact modeled-target source references. Field-path source strings such as TicketCaptured.ticket_title still need a later schema-level representation change because Quint does not support constructing/comparing derived string coordinates in a way that would make that robust.
Addressed auto_review warning about edge cases: - Verified the new invariant with a generated model where the second data-flow row sources from the first row's modeled target (`raw_ticket_title`). Lean and Quint both accepted the valid UTF-8/UTF-8 chain. - Verified the negative case by corrupting the second hop to `UTF-16 string`; Lean rejected `modelDataFlowSourceBitEncodingsMatchModeledSources` and Quint produced a counterexample for the same invariant. - Current scope is exact modeled-target source references. Field-path source strings such as `TicketCaptured.ticket_title` still need a later schema-level representation change because Quint does not support constructing/comparing derived string coordinates in a way that would make that robust.
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
Slipstream/emc!24
No description provided.