Invariants for Payment Systems
Most payment-system bugs I have reviewed correspond to a violated invariant that can be named precisely. Duplicate charges are I2 (at-most-once). Ledger drift is I1 (conservation of money). "We marked it captured but the processor didn't" is I4 (processor is the source of truth) plus a missing reconciliation. A webhook handler firing its side effects twice is I8 layer 4. A cross-tenant data leak is I14. An anonymous state transition is I16.
Naming the invariant is the part that lets you pick a structural fix instead of a workaround. An hourly script that detects duplicate charges is not a fix for I2; a unique constraint is. The discipline is to enumerate the invariants, then derive the implementation from them, rather than the other way around.
This article walks one invariant in depth, summarizes the other eighteen, and links to the Claude Code skill I packaged the methodology as. The set started at twelve and grew to nineteen as I reviewed more systems; that growth is the honest version of the methodology, not a sign of trouble. Naming what was missing is the same move as naming any other invariant.
Why Invariants Before Components
Most practitioner writeups treat payment systems as a collection of components: an authorization service, a capture service, a ledger, a reconciliation job, a webhook handler. This decomposition is useful for discussing deployment boundaries but is a poor starting point for design. Components composed without reference to system-wide invariants produce local correctness within each component and global incorrectness across them.
A capture service that retries safely in isolation can still produce duplicate charges if its idempotency boundary is not aligned with the client's retry boundary. A ledger that enforces referential integrity can still violate conservation of money if an upstream service mutates a balance column directly. A webhook handler that deduplicates correctly can still emit confirmation emails twice if the dedup boundary is the handler's database row instead of the logical event.
In each case, every component is doing its job. The system is wrong anyway.
The fix is to enumerate invariants first. An invariant, in this article, is a property of system state that must hold across arbitrary sequences of operations, failures, retries, and partial responses from external systems. They are framework-independent: a Django payment service and a Go payment service are subject to the same nineteen. They are also enforcement-mode-independent: an invariant can be enforced structurally (a unique constraint, an append-only table) or operationally (a reconciliation job, a runbook). What matters is that something enforces it, and that the team can name what.
This is not a new approach. Helland's writing on idempotence and distributed transactions is structured this way. Kleppmann's framing of consistency and isolation is essentially a vocabulary of invariants. The contribution of the methodology I'm describing is the specialization to payment systems, where the structural fact that distinguishes the domain (external processor mediation of settlement) imposes invariants that purely internal distributed systems do not face.
Safety and Liveness
Each invariant is tagged [Safety] or [Liveness]; a few span both. The distinction is operational, not academic.
Safety invariants must hold at every committed state. A violation is a correctness bug, data loss, or money loss. They are unit-testable and either pass or fail at the database after a transaction commits. I1 (conservation), I2 (at-most-once), I9 (no floats), I13 (HMAC verification), I14 (tenant isolation) are all Safety.
Liveness invariants must hold eventually within a stated bound. A violation is degraded service, recoverable. They are tested with chaos drills and monitored as SLOs. I4 (processor source of truth via reconciliation) and I17 (bounded retries) are both Liveness, and both depend on the bound being part of the invariant. "Eventual" without a bound is a hope, not a guarantee.
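A bound is checkable only once it is written down. As a sketch of what "bounded" might look like as a monitor, using the example thresholds this article quotes for I4 (detect within 60 minutes, page after 4 hours or on more than 1% drift); the function and constant names are assumptions for illustration, not any real monitoring API:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds, taken from the I4 summary in this article.
DETECT_WITHIN = timedelta(minutes=60)
PAGE_AFTER = timedelta(hours=4)
PAGE_DRIFT_PCT = 1.0

def reconciliation_status(last_converged_at, now, drift_pct):
    """Classify a liveness invariant by its stated bound, not by 'eventually'."""
    lag = now - last_converged_at
    if drift_pct > PAGE_DRIFT_PCT or lag > PAGE_AFTER:
        return "page"    # the bound is blown: this is an incident
    if lag > DETECT_WITHIN:
        return "alert"   # detection window exceeded: investigate
    return "ok"

now = datetime.now(timezone.utc)
assert reconciliation_status(now - timedelta(minutes=10), now, 0.0) == "ok"
assert reconciliation_status(now - timedelta(hours=2), now, 0.2) == "alert"
assert reconciliation_status(now - timedelta(hours=5), now, 0.2) == "page"
```

The point of the sketch is that each branch corresponds to a clause of the invariant: if you cannot write this function, the bound was never actually stated.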
This taxonomy is the most useful thing the framework gives you in a code review. A reviewer who can say "this PR's tenant filter is a Safety property and needs a structural enforcement, not a runtime check" has compressed half a paragraph of distributed-systems vocabulary into one word.
One Invariant in Depth: At-Most-Once Terminal Effect
I2 [Safety]: Given a request identity (actor_id, idempotency_key), arbitrarily many retries must produce at most one successful terminal effect.
This is the defining correctness property of charge operations. A duplicate charge is worse than a failed charge. A failed charge can be retried; a duplicate charge requires a refund, a customer-support ticket, and an apology.
The most common implementation, and the most common bug, is the check-then-act pattern:
```python
if not Payment.objects.filter(idempotency_key=key).exists():
    Payment.objects.create(idempotency_key=key, ...)
```
Two concurrent requests both observe no existing row, both proceed to create, and two rows are inserted. The transactional boundary does not save you: at standard isolation levels, two separate transactions will both see the absence of the row and both insert. The bug surfaces under load, looks like a flaky test in development, and produces real duplicate charges in production.
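The losing schedule is easy to reproduce deterministically, with no database or threads required. A toy simulation, with every name invented for illustration:

```python
# Stand-ins for the ORM calls: a bare list with no uniqueness enforcement.
rows = []

def exists(key):
    return key in rows

def create(key):
    rows.append(key)

# Requests A and B interleave: both run the existence check
# before either runs the insert.
a_saw_row = exists("idem-key-1")   # False
b_saw_row = exists("idem-key-1")   # False
if not a_saw_row:
    create("idem-key-1")
if not b_saw_row:
    create("idem-key-1")

assert rows == ["idem-key-1", "idem-key-1"]  # two charges from one logical request
```

Any enforcement that runs check and act as separate steps admits this schedule; only an atomic check-and-act at the storage layer closes it.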
The structural fix is a database unique constraint plus a caught IntegrityError:
```python
from django.db import transaction, IntegrityError

def create_or_get_intent(merchant_id, key, amount_cents, currency, token):
    try:
        with transaction.atomic():
            return PaymentIntent.objects.create(
                merchant_id=merchant_id,
                idempotency_key=key,
                amount_cents=amount_cents,
                currency=currency,
                payment_method_token=token,
            )
    except IntegrityError:
        # The unique constraint on (merchant_id, idempotency_key) fired:
        # a concurrent or retried request already created this intent.
        # Replay the existing row instead of creating a duplicate.
        return PaymentIntent.objects.get(
            merchant_id=merchant_id,
            idempotency_key=key,
        )
```
The unique constraint is the part that races safely; application-level checks, no matter how careful, cannot. At standard isolation levels this is the race-safe pattern, and it is the first half of the answer to the interview question "how do you guarantee a payment is processed exactly once."
The honest answer to that interview question is that you don't. Exactly-once delivery is provably unattainable in an asynchronous system with failures. What you ship is effectively-once semantics, built from at-least-once delivery composed with idempotent consumers. The unique constraint is one layer of that defense.
I2 actually requires four layers of independent enforcement, not one:
- Client. Generates a UUID idempotency key per logical request and sends it on every retry.
- Server database. Enforces uniqueness via a constraint on `(merchant_id, idempotency_key)` and replays the original response on duplicate.
- Processor call. Passes the same idempotency key through to the upstream processor, so retries from a worker are also safe at that boundary.
- Webhook handler. Deduplicates inbound events on `(provider, event_id)` with its own unique constraint.
Each layer is independent. A failure in one does not compromise the others. This is the layered defense Pat Helland advocates in his treatment of idempotence. It is also why "we have an idempotency key" is not a sufficient answer when reviewing a payment system: which layer? How is it enforced?
Idempotency is not a single mechanism. It is a property enforced at every layer of the request path. A payment system with an idempotency key but no database-level uniqueness is one bad merge away from charging the card twice.
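The database layer can be demonstrated end to end without Django. A self-contained sketch of the same insert-or-replay pattern against sqlite, where the UNIQUE constraint is the load-bearing part; every table and column name here is assumed for illustration, not taken from the article's codebase:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payment_intent (
        id INTEGER PRIMARY KEY,
        merchant_id TEXT NOT NULL,
        idempotency_key TEXT NOT NULL,
        amount_cents INTEGER NOT NULL,
        UNIQUE (merchant_id, idempotency_key)
    )
""")

def create_or_get_intent(merchant_id, key, amount_cents):
    """Insert a new intent, or replay the existing row on a duplicate key."""
    try:
        with conn:  # transaction: commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO payment_intent (merchant_id, idempotency_key, amount_cents) "
                "VALUES (?, ?, ?)",
                (merchant_id, key, amount_cents),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # A concurrent or retried request won the insert; return its row.
        row = conn.execute(
            "SELECT id FROM payment_intent "
            "WHERE merchant_id = ? AND idempotency_key = ?",
            (merchant_id, key),
        ).fetchone()
        return row[0]

first = create_or_get_intent("m_1", "key_abc", 1999)
retry = create_or_get_intent("m_1", "key_abc", 1999)
assert first == retry  # the retry resolves to the same row: at most one charge
```

The shape is identical to the Django version: the application never decides whether the row exists; the constraint does, atomically, at commit time.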
The Other Eighteen, in One Line Each
The full statements live in the skill repo's invariant reference. The one-line summaries:
- I1 Conservation of money. [Safety] Every journal posting is balanced; materialized balances are derived from the append-only journal.
- I3 Monotonic, append-only audit. [Safety] State transitions are new rows in an immutable log. Hybrid (status as cache, events as truth on disagreement) is the practical default.
- I4 Processor is source of truth; mirror converges via bounded reconciliation. [Liveness] "Eventual" needs a stated bound: detect within ≤60 min, page after 4 hours, page on >1% drift.
- I5 Single-writer per aggregate; eventual consistency across aggregates. [Safety + Liveness] Per-aggregate via `SELECT FOR UPDATE` (serializability via 2PL, not linearizability).
- I6 Effects dispatched via a durable queue, not from inside a DB transaction. [Safety] Strict outbox or pragmatic on-commit + reconciliation backstop.
- I7 Intent persisted before effect; recovery completes pending intents. [Safety + Liveness] Commit a `pending` row before any external call; bound the recovery window (e.g. 5 min).
- I8 Idempotency at every layer. [Safety] Independent enforcement at client, server DB, processor call, webhook handler. Domain uniqueness is a separate invariant.
- I9 Money is never floating point. [Safety] Integer minor units or fixed-precision decimal. Currency code travels with every amount.
- I10 Every state transition is reversible or explicitly terminal; multi-step workflows are explicit sagas. [Safety] No quiet one-way doors.
- I11 Time is untrusted across systems. [Safety] Cross-system ordering uses sequence numbers and processor refs, not wall-clock timestamps.
- I12 PCI scope declared and enforced at the boundary. [Safety] PANs and CVVs never enter application code. Know whether you're filing SAQ A or SAQ A-EP.
- I13 Inbound webhooks cryptographically authenticated before persist. [Safety] HMAC + timestamp window + replay protection. Without this, I4 has a hole.
- I14 Tenant isolation. [Safety] Every read and write scoped to a tenant. Cross-tenant existence returns 404, not 403.
- I15 OFAC / sanctions screening before money moves. [Safety] Counterparty cleared as a precondition, not a post-hoc check.
- I16 Every state transition has an authenticated, recorded principal. [Safety] `actor_type` + `actor_id` + role-at-time-of-action on every mutating record.
- I17 Bounded retries with explicit budgets. [Liveness] Max attempts, backoff schedule, dead-letter destination. "Retry forever" turns minutes into hours.
- I18 Velocity and per-period limits. [Safety] Per-account, per-day, per-counterparty checks at the application boundary.
- I19 Cross-aggregate causality. [Safety] Refund references a settled payment, return references a sent ACH, chargeback references a captured charge. FK + state precondition.
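I9 is the easiest of these to demonstrate inline: binary floating point cannot represent most decimal fractions exactly, and the error compounds across operations, while integer minor units and fixed-precision decimals stay exact.

```python
from decimal import Decimal

# The anti-pattern: float money drifts.
assert 0.1 + 0.2 != 0.3          # the classic surprise
assert sum([0.1] * 10) != 1.0    # and the drift compounds

# The fix: integer minor units, or fixed-precision Decimal. Both are exact.
assert 10 + 20 == 30             # cents
assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
```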
Each of these has a structural fix and at least one canonical anti-pattern. The anti-pattern reference catalogs the worst offenders with code.
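As one example of a structural fix, I6's is the transactional outbox: the state change and the pending effect commit in the same transaction, and a separate dispatcher delivers the effect afterward. A minimal sketch with sqlite standing in for the application database; every table, column, and function name is invented for illustration:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        dispatched INTEGER NOT NULL DEFAULT 0
    );
""")

def capture_payment(payment_id):
    # State change and effect record commit in ONE transaction: either
    # both are durable or neither is. Nothing is sent from inside here.
    with conn:
        conn.execute(
            "INSERT INTO payment (id, status) VALUES (?, 'captured')",
            (payment_id,),
        )
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "payment.captured", "payment_id": payment_id}),),
        )

def dispatch_pending(send):
    # A separate worker drains the outbox. `send` may fire more than once
    # across crashes, which is why downstream consumers must be idempotent (I8).
    pending = conn.execute(
        "SELECT id, payload FROM outbox WHERE dispatched = 0"
    ).fetchall()
    for row_id, payload in pending:
        send(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET dispatched = 1 WHERE id = ?", (row_id,))

sent = []
capture_payment(1)
dispatch_pending(sent.append)
assert sent == [{"event": "payment.captured", "payment_id": 1}]
```

If the process crashes after the commit but before dispatch, the effect is not lost; it sits in the outbox until the next dispatcher run. That is the whole point of the pattern.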
I Packaged This as a Claude Code Skill
The methodology is structured enough to encode as automation. I shipped it as payment-invariants, a Claude Code skill that triggers on payment-adjacent code review.
Install:
```shell
git clone https://github.com/xtilyn/payment-invariants.git ~/.claude/skills/payment-invariants
```
The skill auto-fires on diffs that touch payment, billing, charge, refund, settlement, reconciliation, or webhook code. It runs in three modes depending on what you're doing:
- Design review. Enumerate which of the nineteen invariants apply before discussing components. Each must have a concrete enforcement mechanism.
- PR review. Walk a diff and flag invariant violations by number, citing the matching anti-pattern, recommending the structural fix.
- Incident debugging. Identify which invariant a failure mode violates. The bug usually telegraphs the invariant: duplicate charges → I2; ledger drift → I1; "we marked it captured but the processor didn't" → I4 plus reconciliation.
A worked example: drop the check-then-act if not exists / create pattern into a Django model and ask Claude to review the diff. With the skill installed, the response cites I2 by number, names the race condition, and produces the unique-constraint fix. Without the skill, you get generic feedback about "potential race conditions" that may or may not surface the specific invariant.
The skill body and references are the working set I review payment systems against. Putting it under version control means the methodology can evolve as I find new failure modes, and other engineers can fork and adapt it for their stack's specifics.
Why This Approach Works
Invariant-driven design produces systems that are easier to reason about and more robust under load, but the bigger payoff is communication. Invariants are framework-independent. A team that says "this PR violates I7 because the intent row isn't durably committed before the Stripe call" has compressed a paragraph of distributed-systems vocabulary into a number. Two engineers who have agreed on the nineteen can review code together without re-litigating the underlying argument every time.
The methodology has prescriptive use, for new system design: enumerate invariants, derive implementation, validate that each invariant has at least one structural or operational enforcement mechanism. It also has diagnostic use, for reviewing existing systems: for each design choice you observe, ask which invariant it enforces and whether the enforcement is structural or merely disciplinary. The Saleor payment module is a useful worked example here. Its modern design encodes most of the invariants structurally, with the rest backstopped by reconciliation. Naming the gaps lets you reason about the specific failure modes each gap admits, rather than just trusting the system or just distrusting it.
The limits are worth naming. Invariant enumeration does not produce an implementation by itself; mechanisms still have to be chosen, and the choice depends on operational context. The methodology does not address fraud detection, regulatory compliance, tax handling, or the organizational dimensions of running a payments team. These are essential to a real platform; they are out of scope here. What this gets you is the correctness skeleton. The rest of the system hangs off that.
The best payment systems I have reviewed are not the most cleverly designed.
They are the ones where every invariant has a name.
References
- Idempotence Is Not a Medical Condition · Pat Helland, ACM Queue 2012
- Life Beyond Distributed Transactions: An Apostate's Opinion · Pat Helland, CIDR 2007
- Designing Data-Intensive Applications · Martin Kleppmann, O'Reilly 2017
- Pattern: Transactional Outbox · Chris Richardson
- Sagas · Garcia-Molina and Salem, SIGMOD 1987
- payment-invariants (Claude Code skill) · Companion repo