Idempotency Is the Most Important Concept in Distributed Systems

10 minute read | Published January 2026

Every distributed system retries. Networks drop packets. Services restart. Load balancers reroute. Queues redeliver. If any operation in your system cannot handle being executed twice, you will eventually corrupt data, charge someone twice, send duplicate notifications, or process the same order multiple times.

Idempotency - the property that an operation produces the same result whether executed once or many times - is not a feature. It is a correctness requirement for every system that operates in a world where retries are inevitable.

This is true for payment systems. It is also true for workflow engines, event consumers, data pipelines, API endpoints, and AI agents. Idempotency is the most broadly applicable concept in distributed systems engineering, and most systems get it wrong.

Idempotency Beyond Payments

The payment use case is well understood: use idempotency keys to prevent duplicate charges. But idempotency applies everywhere that retries occur.

Idempotency is required at every boundary where retries can happen. Most systems have more retry boundaries than engineers realize.

Event Consumers

Event queues deliver messages at least once. Consumer crashes, network timeouts, and partition rebalancing all cause redelivery. An event consumer that inserts a row on each delivery creates duplicates. An event consumer that increments a counter on each delivery inflates the count.

# Non-idempotent consumer - breaks on redelivery
def handle_order_event(event):
    db.execute("INSERT INTO orders VALUES (%s, %s)",
               (event.order_id, event.amount))
    # Duplicate delivery → duplicate row

# Idempotent consumer - safe on redelivery
def handle_order_event(event):
    db.execute("""
        INSERT INTO orders (id, amount)
        VALUES (%s, %s)
        ON CONFLICT (id) DO NOTHING
    """, (event.order_id, event.amount))
    # Duplicate delivery → no-op

Workflow Tasks

Workflow engines retry failed tasks. A task that sends an email, calls an API, or updates a database must produce the same outcome on retry. If a task creates a resource, the retry must detect the existing resource instead of creating a duplicate.

class CreateUserTask:
    def execute(self, workflow_context):
        # Check-then-create: a retry finds the user created by the
        # first attempt. A unique constraint on email should back this
        # check so concurrent executions cannot both create.
        existing = self.user_repo.find_by_email(
            workflow_context["email"]
        )
        if existing:
            return existing  # Idempotent: return the existing user

        user = self.user_repo.create(
            email=workflow_context["email"],
            name=workflow_context["name"],
        )
        return user

Data Pipeline Stages

Pipeline stages that are retried must produce the same output regardless of how many times they run. The standard approach is to make each stage fully replace its output rather than append to it.

# Non-idempotent: appends on retry
def transform_daily_orders(date):
    orders = extract_orders(date)
    transformed = transform(orders)
    db.execute("INSERT INTO warehouse.orders ...", transformed)
    # Retry → duplicate rows

# Idempotent: replaces partition on retry
def transform_daily_orders(date):
    orders = extract_orders(date)
    transformed = transform(orders)
    with db.transaction():  # delete + insert must commit together
        db.execute("DELETE FROM warehouse.orders WHERE date = %s", (date,))
        db.execute("INSERT INTO warehouse.orders ...", transformed)
    # Retry → same result

AI Agent Actions

AI agents execute tool calls based on model outputs. If the agent process crashes after executing a tool but before recording the result, the workflow engine retries the step. The tool call executes again.

A tool that creates a calendar event creates a duplicate. A tool that sends a message sends it twice. A tool that writes to a database writes a duplicate record.

// Idempotent tool execution for AI agents
async function executeToolIdempotent(
  workflowId: string,
  stepId: string,
  tool: Tool,
  args: unknown
) {
  const key = `${workflowId}:${stepId}:${tool.name}`;

  const cached = await idempotencyStore.get(key);
  if (cached) return cached.result;

  const result = await tool.execute(args);

  // Note the gap: a crash between execute and set still re-runs the
  // tool on retry, so the tool itself must tolerate re-execution.
  await idempotencyStore.set(key, { result, timestamp: Date.now() });
  return result;
}

Idempotency is not just for payment systems. Every event consumer, workflow task, pipeline stage, and AI tool call needs idempotency guarantees. The mechanism varies but the principle is universal: an operation executed twice must produce the same result as an operation executed once.

The Three Patterns

Most idempotency implementations use one of three patterns:

Natural idempotency. Some operations are inherently idempotent. Setting a value (PUT) is idempotent. Reading data is idempotent. Deleting a specific record is idempotent. These require no additional infrastructure.

Idempotency keys. The caller generates a unique key for each logical operation. The server checks the key before processing. If the key exists, the server returns the cached result. This is the standard approach for mutating API endpoints.

Conditional writes. The operation includes a precondition that ensures it only executes once. INSERT ... ON CONFLICT DO NOTHING. UPDATE ... WHERE version = N. DELETE ... WHERE status = 'pending'. The database enforces the idempotency constraint.

-- Pattern 1: Natural idempotency (no extra work)
UPDATE users SET email = 'new@example.com' WHERE id = 123;

-- Pattern 2: Idempotency key (explicit dedup)
INSERT INTO idempotency_keys (key, result)
VALUES ('req-abc-123', '{"status": "ok"}')
ON CONFLICT (key) DO NOTHING;

-- Pattern 3: Conditional write (precondition)
UPDATE orders SET status = 'shipped'
WHERE id = 456 AND status = 'processing';
-- Only executes once: second attempt finds
-- status = 'shipped', not 'processing'

The Atomic Transaction Requirement

The most common idempotency bug is a gap between the operation and the dedup record. If the operation completes but the dedup record is not written (due to a crash, network error, or separate transaction), the retry will not find the dedup record and will execute the operation again.

The fix is simple in principle: the operation and the dedup record must be in the same atomic transaction.

# WRONG: separate operations
def process_payment(idempotency_key, amount):
    charge = stripe.charge(amount)        # Step 1
    db.insert_idem_key(idempotency_key)   # Step 2
    # Crash between Step 1 and Step 2
    # → retry charges again

# RIGHT: atomic transaction
def process_payment(idempotency_key, amount):
    with db.transaction() as tx:
        if tx.idem_key_exists(idempotency_key):
            return tx.get_cached_result(idempotency_key)

        # Pass the key through so the provider dedups the external
        # call too - the database cannot roll back a charge.
        charge = stripe.charge(amount, idempotency_key=idempotency_key)

        tx.insert_idem_key(idempotency_key, charge)
        tx.insert_payment_record(charge)
        # Local writes are all-or-nothing: a crash rolls them back.
In practice this is harder when operations span multiple systems (a database and an external API). The external API call cannot be rolled back. The solution is to make the external call idempotent independently (using the external service's own idempotency key) and record the result atomically with the dedup key.
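That combination can be sketched as follows, with an in-memory dict standing in for the database and a fake provider that deduplicates on its own idempotency key. All class and method names here are illustrative, not a real payment API.

```python
class FakeProvider:
    """Stands in for an external payment API that honors idempotency keys."""
    def __init__(self):
        self.charges = {}  # idempotency_key -> charge

    def create_charge(self, amount, idempotency_key):
        # A repeated call with the same key returns the original charge.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {
                "id": f"ch_{len(self.charges)}",
                "amount": amount,
            }
        return self.charges[idempotency_key]

class PaymentService:
    def __init__(self, provider):
        self.provider = provider
        self.results = {}  # local dedup record: key -> recorded charge

    def process_payment(self, idempotency_key, amount):
        cached = self.results.get(idempotency_key)
        if cached is not None:
            return cached
        # Safe to repeat: the provider dedups on the same key, so a
        # retry after a crash here cannot create a second charge.
        charge = self.provider.create_charge(
            amount, idempotency_key=idempotency_key
        )
        # In a real system this write shares one database transaction
        # with the payment record, as in the example above.
        self.results[idempotency_key] = charge
        return charge

svc = PaymentService(FakeProvider())
a = svc.process_payment("pay-001", 500)
b = svc.process_payment("pay-001", 500)  # retry: same charge, no new one
```

Even if the local record write is lost, the retry replays the external call with the same key and gets the original charge back, so the system converges to a single charge.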

Why Most Systems Get It Wrong

Idempotency is conceptually simple and operationally difficult. Teams know they need it. They add idempotency keys to their API layer. Then they forget about:

  • Event consumers that process queue messages without dedup
  • Background jobs that run on a cron schedule without checking previous runs
  • Workflow tasks that create resources without checking for existing ones
  • Pipeline stages that append rather than replace
  • Webhook handlers that process notifications without dedup

Each of these is a retry boundary. Each needs idempotency guarantees. The system is only as reliable as its weakest boundary.

Design for idempotency at every layer. Test it by deliberately replaying requests, redelivering events, and rerunning tasks. The result should be the same every time. If it is not, you have a correctness bug that will eventually cost you money, data, or trust.
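A replay test can be as small as delivering the same event twice and asserting the state is unchanged. This sketch mirrors the idempotent consumer shown earlier, using SQLite so it runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER)")

def handle_order_event(order_id, amount):
    # INSERT OR IGNORE is SQLite's ON CONFLICT (id) DO NOTHING.
    conn.execute("INSERT OR IGNORE INTO orders VALUES (?, ?)",
                 (order_id, amount))
    conn.commit()

def test_redelivery_is_a_no_op():
    handle_order_event("order-1", 100)
    handle_order_event("order-1", 100)  # simulated redelivery
    rows = conn.execute("SELECT * FROM orders").fetchall()
    assert rows == [("order-1", 100)]  # one row, not two

test_redelivery_is_a_no_op()
```

The same shape of test applies at every boundary: run the operation, run it again, diff the resulting state.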