perf: serialize events once in the append path (#361) #408

Merged
jwilger merged 1 commit from perf/361-single-serialization into main 2026-06-13 07:37:32 -07:00
Owner

Summary

Events were serialized twice on the way to storage: once to a serde_json::Value in StreamWrites::append, then again by each backend when binding that Value for SQL (or embedding it on disk).

This change makes StreamWriteEntry::event_data a pre-serialized Box<serde_json::value::RawValue>, produced once via to_raw_value in append. Backends now consume the already-serialized JSON directly:

  • postgres: binds the raw JSON verbatim into JSONB (no re-encode)
  • sqlite: binds the raw JSON string as TEXT (no to_string)
  • memory: stores the raw JSON string; reads via from_str
  • fs: parses the raw JSON back into a serde_json::Value for its on-disk envelope

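This eliminates the intermediate Value allocation and one serialize pass per event for the postgres, sqlite, and memory backends. The fs backend still materializes a Value, because it parses the raw JSON back to keep its on-disk envelope unchanged (see "fs format safety").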

fs format safety

The fs backend keeps writing a serde_json::Value into its on-disk envelope, so the envelope and its integrity anchor (ADR-0046) stay byte-identical to the historical representation. This is exact, not approximate: from_str::<Value>(to_raw_value(&e).get()) yields the same sorted-key Value as the previous to_value(&e), because serde_json::Map is backed by a BTreeMap unless the preserve_order feature is enabled. The serialized envelope bytes, and thus the anchor, are therefore unchanged. The fs read-time fsck contract tests confirm that anchors still validate.
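
The sorted-key argument can be illustrated with a dependency-free model. This is only an analogy using `std::collections::BTreeMap`, not the fs backend's code: `render_sorted` is a hypothetical helper, and it handles plain string fields without JSON escaping rules. It shows that once fields land in a sorted map, the rendered bytes no longer depend on the order in which the fields arrived.

```rust
use std::collections::BTreeMap;

/// Collect fields into a sorted map, then render them as compact
/// JSON-like text. Input order is irrelevant because `BTreeMap`
/// iterates in key order.
fn render_sorted(fields: &[(&str, &str)]) -> String {
    let map: BTreeMap<&str, &str> = fields.iter().copied().collect();
    let body: Vec<String> = map
        .iter()
        // `{:?}` adds the surrounding quotes for these simple strings.
        .map(|(k, v)| format!("{:?}:{:?}", k, v))
        .collect();
    format!("{{{}}}", body.join(","))
}
```

For example, `render_sorted(&[("type", "Deposited"), ("amount", "42")])` and `render_sorted(&[("amount", "42"), ("type", "Deposited")])` produce the same string, which is the property the anchor relies on.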

Tests

Behavior is unchanged and verified by the existing backend contract suites, which exercise append + read for all four backends (memory, sqlite, postgres, fs). Full workspace suite passes (242 tests).

Workflow note

Phases 1-3 are not applicable: this is a behavior-preserving internal optimization with no public behavioral change. The cross-backend contract suites are the regression safety net (Phase 4 TDD).

Closes #361

perf: serialize events once in the append path (#361)
All checks were successful
CI / Detect Changes (pull_request) Successful in 4s
CI / Format (pull_request) Successful in 15s
CI / Request auto_review semantic review (pull_request) Successful in 2s
auto_review auto_review: no findings
CI / Clippy (pull_request) Successful in 1m55s
CI / Security Audit (pull_request) Successful in 26s
CI / Test (pull_request) Successful in 2m54s
CI / Mutation (pull_request) Has been skipped
CI / CI Gate (pull_request) Successful in 2s
b85be9b6ca
Events were serialized twice on the way to storage: once to a
`serde_json::Value` in `StreamWrites::append`, then again by each
backend when binding that `Value` for SQL (or embedding it on disk).

Change `StreamWriteEntry::event_data` to a pre-serialized
`Box<serde_json::value::RawValue>` produced once via `to_raw_value` in
`append`. Backends now consume the already-serialized JSON directly:

- postgres: binds the raw JSON verbatim into JSONB (no re-encode)
- sqlite: binds the raw JSON string as TEXT (no `to_string`)
- memory: stores the raw JSON string; reads via `from_str`
- fs: parses the raw JSON back into a `serde_json::Value` for its
  on-disk envelope. This keeps the envelope (and its integrity anchor,
  ADR-0046) byte-identical to the historical representation, because
  `from_str::<Value>(to_raw_value(&e).get())` yields exactly the same
  sorted-key `Value` as the previous `to_value(&e)`.

This eliminates the intermediate `Value` allocation and one serialize
pass per event across all backends.

Behavior is unchanged and verified by the existing backend contract
suites, which exercise append + read for all four backends (memory,
sqlite, postgres, fs); the full workspace test suite passes.

Workflow note: Phases 1-3 are not applicable — behavior-preserving
internal optimization, no public behavioral change. The cross-backend
contract suites are the regression safety net (Phase 4 TDD).

Closes #361
auto-review left a comment

This PR optimizes event serialization by pre-serializing events to JSON once in the append path, reducing redundant serialization across different backends. The change appears safe to merge as it maintains existing behavior and passes all tests.

Walkthrough

  • eventcore-fs/src/lib.rs: Converts pre-serialized JSON back to serde_json::Value to maintain historical on-disk format.
  • eventcore-memory/src/lib.rs: Stores pre-serialized JSON strings directly, avoiding re-serialization.
  • eventcore-postgres/src/lib.rs: Uses Box<RawValue> for event data, binding it directly to SQL.
  • eventcore-sqlite/src/lib.rs: Binds pre-serialized JSON strings directly, eliminating unnecessary serialization.
  • eventcore-types/src/store.rs: Changes StreamWriteEntry::event_data to Box<RawValue>, serialized once at append time.

LLM usage and cost

  • Reasoning (gpt-4o): in=22296 out=16928 cost=$0.365400
  • Cheap (gpt-4o-mini): in=689 out=48 cost=$0.000132

Estimated total USD: $0.365532 via https://api.openai.com
jwilger deleted branch perf/361-single-serialization 2026-06-13 07:37:32 -07:00