PostgreSQL Advisory Lock Coordination with Contract Tests (TDD) #239

Closed
opened 2025-12-28 19:57:31 -08:00 by jwilger · 0 comments
jwilger commented 2025-12-28 19:57:31 -08:00 (Migrated from github.com)

Overview

Implement PostgreSQL coordination using advisory locks on dedicated connections per ARCHITECTURE.md. Follows the EventStore/Commanded pattern: subscriptions table tracks checkpoints, advisory locks on dedicated connections provide coordination.

Parent Epic: #237

Core Pattern (per ARCHITECTURE.md)

Subscriptions Table - Checkpoint tracking:

  • Schema: (subscription_name TEXT PRIMARY KEY, last_position BIGINT, updated_at TIMESTAMPTZ)
  • Purpose: WHERE did we process up to?
  • Updated transactionally with projection writes

Advisory Locks on Dedicated Connections - Coordination:

  • Each projector owns a dedicated connection (NOT from pool)
  • Advisory lock acquired via pg_try_advisory_lock(hash(subscription_name)) (non-blocking per ADR-028)
  • Lock held for session duration (connection open)
  • Released automatically when connection closes
  • NO heartbeat table needed
  • NO validity checking needed

Implementation Components

  • Dedicated connection creation (not from pool)
  • Advisory lock acquisition/release (try-acquire, non-blocking)
  • Integration with ProjectionRunner

Key Points (per ARCHITECTURE.md + ADR-028)

  • NO heartbeat table - connection lifetime is the liveness signal
  • NO coordinator trait abstraction - direct PostgreSQL advisory lock calls
  • NO is_valid() checks in projection loop - connection lifecycle handles it
  • NO heartbeat() calls - connection alive = leadership held
  • NO blocking wait - try-acquire returns error if lock held (ADR-028)
  • NO library-level retry - caller/orchestrator handles retry (ADR-028)
  • Connection lifecycle = lock lifecycle = process lifecycle
  • Simpler: One table, one lock, automatic release

Acceptance Criteria

  • First instance acquires leadership via advisory lock on dedicated connection
  • Second instance returns LockNotAcquired error when leadership unavailable (non-blocking per ADR-028)
  • Crash/disconnect releases leadership automatically (session-scoped)
  • Different projectors have independent coordination (different lock keys)
  • Contract tests verify coordination behavior

API Shape (per ADR-028)

pub async fn acquire_leadership(
    conn: &mut PgConnection,
    subscription_name: &str,
) -> Result<LeadershipGuard, CoordinationError>;

pub enum CoordinationError {
    LockNotAcquired { subscription_name: String },
    DatabaseError(sqlx::Error),
}

TDD Approach

  1. Red: Write contract tests for coordination behavior
  2. Green: Implement in eventcore-postgres to pass tests
  3. Refactor: Clean up implementation

References

  • ADR-026: Subscription Table Coordination
  • ADR-028: Advisory Lock Acquisition Behavior (non-blocking try-acquire)

Updated to incorporate ADR-028 decision: non-blocking try-acquire

## Overview Implement PostgreSQL coordination using advisory locks on dedicated connections per ARCHITECTURE.md. Follows the EventStore/Commanded pattern: subscriptions table tracks checkpoints, advisory locks on dedicated connections provide coordination. **Parent Epic**: #237 ## Core Pattern (per ARCHITECTURE.md) **Subscriptions Table** - Checkpoint tracking: - Schema: `(subscription_name TEXT PRIMARY KEY, last_position BIGINT, updated_at TIMESTAMPTZ)` - Purpose: WHERE did we process up to? - Updated transactionally with projection writes **Advisory Locks on Dedicated Connections** - Coordination: - Each projector owns a dedicated connection (NOT from pool) - Advisory lock acquired via `pg_try_advisory_lock(hash(subscription_name))` (non-blocking per ADR-028) - Lock held for session duration (connection open) - Released automatically when connection closes - NO heartbeat table needed - NO validity checking needed ## Implementation Components - Dedicated connection creation (not from pool) - Advisory lock acquisition/release (try-acquire, non-blocking) - Integration with `ProjectionRunner` ## Key Points (per ARCHITECTURE.md + ADR-028) - ✅ NO heartbeat table - connection lifetime is the liveness signal - ✅ NO coordinator trait abstraction - direct PostgreSQL advisory lock calls - ✅ NO `is_valid()` checks in projection loop - connection lifecycle handles it - ✅ NO `heartbeat()` calls - connection alive = leadership held - ✅ NO blocking wait - try-acquire returns error if lock held (ADR-028) - ✅ NO library-level retry - caller/orchestrator handles retry (ADR-028) - ✅ Connection lifecycle = lock lifecycle = process lifecycle - ✅ Simpler: One table, one lock, automatic release ## Acceptance Criteria - [ ] First instance acquires leadership via advisory lock on dedicated connection - [ ] Second instance returns `LockNotAcquired` error when leadership unavailable (non-blocking per ADR-028) - [ ] Crash/disconnect releases leadership automatically (session-scoped) - [ ] Different projectors have independent coordination (different lock keys) - [ ] Contract tests verify coordination behavior ## API Shape (per ADR-028) ```rust pub async fn acquire_leadership( conn: &mut PgConnection, subscription_name: &str, ) -> Result<LeadershipGuard, CoordinationError>; pub enum CoordinationError { LockNotAcquired { subscription_name: String }, DatabaseError(sqlx::Error), } ``` ## TDD Approach 1. **Red**: Write contract tests for coordination behavior 2. **Green**: Implement in eventcore-postgres to pass tests 3. **Refactor**: Clean up implementation ## References - ADR-026: Subscription Table Coordination - ADR-028: Advisory Lock Acquisition Behavior (non-blocking try-acquire) --- *Updated to incorporate ADR-028 decision: non-blocking try-acquire*
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
jwilger/eventcore#239
No description provided.