Cost Tracker & Replay Lag Control

The cost tracker uses attester attestations to measure how far behind the cluster is in replay. When the median replay position falls too far behind the current slot, we tighten the replay compute budget so the cluster can catch up. Once replay is healthy again, we restore the full budget.

Reference
SIMD-0322: Add Serial Execution Replay Constraint

SIMD-0322 defines a serial execution cost tracker built on a deterministic transaction assignment algorithm (DTAA), introduces a max serial execution CU limit, and requires deterministic, post-execution accounting. We build on that framework to gate replay throughput based on real-time lag.

SIMD-0322 Foundations

SIMD-0322 proposes a deterministic transaction assignment algorithm (DTAA) that computes the serial execution makespan by tracking virtual execution tracks. The proposal adds a max serial execution CU limit (25M CUs in the PR) to bound worst-case replay time, independent of total block CUs.

  • Cost tracking uses actual CUs consumed post-execution, not requested limits.
  • All validators must compute identical makespan results to avoid equivocation.
  • Vote transactions are tracked but excluded from the serial execution limit.

Attester Attestation Extension

Each attester attestation includes the bank hash and slot of the most recent bank the attester has replayed. We treat this pair as a signed replay watermark that can be aggregated across the attester set.

replayed_bank_hash: Hash
replayed_bank_hash_slot: u64

A bank hash is considered valid when it matches the local view of the finalized fork at the given slot.
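
The validity rule above can be sketched as follows. This is a minimal illustration, not the actual attestation format: the `Hash` alias, the `ReplayWatermark` struct, and the `FinalizedView` lookup table are hypothetical names introduced here, and only the two field names come from the text.

```rust
use std::collections::HashMap;

/// Stand-in for the real bank hash type (assumption: 32 raw bytes).
pub type Hash = [u8; 32];

/// Replay watermark carried by an attestation (field names from the text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReplayWatermark {
    pub replayed_bank_hash: Hash,
    pub replayed_bank_hash_slot: u64,
}

/// Hypothetical local view of the finalized fork: slot -> bank hash.
pub struct FinalizedView {
    bank_hashes: HashMap<u64, Hash>,
}

impl FinalizedView {
    pub fn new() -> Self {
        Self { bank_hashes: HashMap::new() }
    }

    /// Record the bank hash of a slot once it is finalized locally.
    pub fn record(&mut self, slot: u64, hash: Hash) {
        self.bank_hashes.insert(slot, hash);
    }

    /// A watermark is valid only if the slot is finalized locally and the
    /// attested hash matches the local bank hash for that slot.
    pub fn is_valid(&self, w: &ReplayWatermark) -> bool {
        self.bank_hashes.get(&w.replayed_bank_hash_slot) == Some(&w.replayed_bank_hash)
    }
}
```

In this sketch, a watermark for a slot the local node has not finalized is rejected as well as one with a mismatched hash. Whether such watermarks should instead be deferred is one of the open questions below.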

Median Replay Lag Signal

We compute the median of the valid replayed bank hash slots to estimate how far behind the cluster is. The median is robust to outliers and provides a stable signal even when some attesters are slow or faulty.

median_replay_slot = median(valid_attestation.replayed_bank_hash_slot)
replay_lag_slots = current_slot - median_replay_slot
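
A minimal sketch of the two formulas above, under a few assumptions that the text does not specify: the median is unweighted, an even-sized set uses the lower of the two middle values so that every validator picks the same slot, and the lag saturates at zero if the median is ahead of `current_slot`. The function names mirror the formulas but are otherwise illustrative.

```rust
/// Lower median of the valid replayed slots, or None if there are none.
/// Assumption: for an even count we take the lower middle element so the
/// result is always a slot that some attester actually reported.
pub fn median_replay_slot(valid_slots: &[u64]) -> Option<u64> {
    if valid_slots.is_empty() {
        return None;
    }
    let mut sorted = valid_slots.to_vec();
    sorted.sort_unstable();
    Some(sorted[(sorted.len() - 1) / 2])
}

/// replay_lag_slots = current_slot - median_replay_slot, saturating at 0.
/// Returns None when there are no valid attestations to aggregate.
pub fn replay_lag_slots(current_slot: u64, valid_slots: &[u64]) -> Option<u64> {
    median_replay_slot(valid_slots).map(|median| current_slot.saturating_sub(median))
}
```

For example, with valid slots 95, 98, 100, 40, and 99 and a current slot of 102, the median is 98 and the lag is 4 slots. The single attester stuck at slot 40 does not move the signal.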

Replay Budget Control Loop

  1. Aggregate attester attestations and filter to valid bank hashes.
  2. Compute the median replay slot and derive replay lag in slots.
  3. If replay lag exceeds a high-water threshold, reduce the serial execution CU limit for replay.
  4. If replay lag stays below a low-water threshold for a sustained window, restore full limits.
  5. Apply hysteresis to avoid oscillating between thresholds.

The control loop directly tunes the serial execution limit introduced by SIMD-0322 (or an equivalent replay budget) to keep replay within the slot time budget on reference hardware.
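
The loop above can be sketched as a small two-state controller. This is an illustration under stated assumptions, not a specified design: the type and field names are hypothetical, the controller switches step-wise between a full and a reduced limit (linear scaling is left as an open question), and the "sustained window" is counted in consecutive healthy observations.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetMode {
    Full,
    Reduced,
}

/// Hypothetical hysteresis controller for the serial execution CU limit.
pub struct ReplayBudgetController {
    high_water_slots: u64,  // lag above this tightens the budget
    low_water_slots: u64,   // lag below this counts as healthy
    recovery_window: u32,   // consecutive healthy observations needed to restore
    full_limit_cus: u64,
    reduced_limit_cus: u64,
    mode: BudgetMode,
    healthy_streak: u32,
}

impl ReplayBudgetController {
    pub fn new(high: u64, low: u64, window: u32, full: u64, reduced: u64) -> Self {
        Self {
            high_water_slots: high,
            low_water_slots: low,
            recovery_window: window,
            full_limit_cus: full,
            reduced_limit_cus: reduced,
            mode: BudgetMode::Full,
            healthy_streak: 0,
        }
    }

    /// Feed one replay lag observation and return the CU limit to apply.
    pub fn observe(&mut self, replay_lag_slots: u64) -> u64 {
        match self.mode {
            BudgetMode::Full => {
                if replay_lag_slots > self.high_water_slots {
                    self.mode = BudgetMode::Reduced;
                    self.healthy_streak = 0;
                }
            }
            BudgetMode::Reduced => {
                if replay_lag_slots < self.low_water_slots {
                    self.healthy_streak += 1;
                    if self.healthy_streak >= self.recovery_window {
                        self.mode = BudgetMode::Full;
                        self.healthy_streak = 0;
                    }
                } else {
                    // Lag inside the band or above it: restart the window.
                    self.healthy_streak = 0;
                }
            }
        }
        self.current_limit()
    }

    pub fn current_limit(&self) -> u64 {
        match self.mode {
            BudgetMode::Full => self.full_limit_cus,
            BudgetMode::Reduced => self.reduced_limit_cus,
        }
    }
}
```

With placeholder thresholds such as `ReplayBudgetController::new(8, 2, 3, 25_000_000, 12_500_000)`, a lag of 9 slots switches to the reduced limit, and the full limit returns only after three consecutive observations below 2 slots. Only the 25M figure comes from SIMD-0322; the thresholds, window, and reduced limit are made-up values for illustration.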

Implementation Scope

  • Add replay watermark fields to attester attestation signing and verification.
  • Define validation rules for replayed bank hashes against the local finalized fork.
  • Compute median replay slot in the leader/validator aggregation path.
  • Expose replay lag metrics for monitoring and operator visibility.
  • Implement a hysteresis-based CU budget controller tied to SIMD-0322 limits.
  • Simulate adversarial or skewed attester sets to confirm median robustness.
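
As a starting point for the last item, the robustness property can be checked with a deterministic helper rather than a full simulation. The sketch below assumes an unweighted lower median and uses hypothetical function names. Under that assumption, whenever strictly fewer than half of the reports are adversarial, the median stays between the lowest and highest honest report, regardless of what the adversaries claim.

```rust
/// Lower median of a non-empty set (assumed tie-break for even counts).
fn lower_median(values: &[u64]) -> Option<u64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    Some(sorted[(sorted.len() - 1) / 2])
}

/// Returns true if the median over honest plus adversarial reports stays
/// inside the range spanned by the honest reports alone.
pub fn median_within_honest_range(honest: &[u64], adversarial: &[u64]) -> bool {
    let lo = match honest.iter().min() {
        Some(v) => *v,
        None => return false,
    };
    let hi = match honest.iter().max() {
        Some(v) => *v,
        None => return false,
    };
    let mut all = honest.to_vec();
    all.extend_from_slice(adversarial);
    match lower_median(&all) {
        Some(m) => lo <= m && m <= hi,
        None => false,
    }
}
```

Six honest reports between slots 96 and 101 keep the median in range against four adversarial reports of either 0 or `u64::MAX`. Once adversaries form a majority, the check fails. A fuller simulation would also need to model stake weighting if the median is later made stake-weighted.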

Open Questions

  • What slot lag thresholds best match real-world replay headroom?
  • Should the controller scale the serial execution limit linearly or step-wise?
  • How do we treat attesters that attest to non-finalized forks or stale bank hashes?