The Inbox & Outbox Pattern


reliability distributed-systems event-driven

Picture this: your e-commerce service just confirmed an order. It writes to the database and fires off a message to the notification service. Then - crash. Did the message go out? Was the order saved? You genuinely don’t know. This is the dual-write problem, and it quietly corrupts distributed systems every day.

The Transactional Outbox Pattern (and its sibling, the Inbox Pattern) elegantly solves this. No two-phase commit. No Saga complexity for simple cases. Just one atomic write and a reliable relay. Let’s dissect it completely.


The Problem: Dual-Write Catastrophe

In any event-driven microservice architecture, you almost always need to do two things at once:

  1. Persist state to a database
  2. Publish an event/message to a message broker (Kafka, RabbitMQ, SQS…)

These are two separate resources. And two separate writes can fail independently. This creates a consistency nightmare.
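To make the failure mode concrete, here is a minimal sketch of a naive dual write. It assumes an in-memory SQLite database in place of the real one, and `FlakyBroker` and `confirm_order_naive` are hypothetical names invented for the illustration, not part of any real client library.

```python
import sqlite3


class FlakyBroker:
    """Hypothetical broker stand-in whose publish() always fails."""

    def publish(self, topic, message):
        raise ConnectionError("broker unreachable")


def confirm_order_naive(conn, broker, order_id):
    # Write 1: the database commit succeeds on its own.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'confirmed')",
            (order_id,),
        )
    # Write 2: a separate resource that can fail independently.
    broker.publish("orders", {"event": "ORDER_CONFIRMED", "id": order_id})


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    try:
        confirm_order_naive(conn, FlakyBroker(), 9821)
    except ConnectionError:
        pass  # the order is already committed, but the event never went out
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling `demo()` leaves one confirmed order in the database and no event anywhere, which is the "DB write succeeds, message fails" case.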

❌ Without the Pattern

  • DB write succeeds, message fails → ghost state
  • Message sent, DB fails → phantom events
  • Crash mid-flight → unpredictable outcome
  • Impossible to replay lost events
  • Consumers act on stale or missing data

✓ With the Pattern

  • Atomically write state + event together
  • Event relay retries until confirmed
  • Crashes are safe - relay picks up on restart
  • Full event audit trail in DB
  • At-least-once delivery guaranteed

Core Insight: You can’t do an atomic write across a database and a message broker. But you can do an atomic write within a single database transaction. The Outbox pattern exploits this.


The Outbox Pattern - How It Works

The key idea: instead of publishing directly to a message broker, you write the event to a special outbox table in the same database transaction as your business data. A separate relay process then reads from this table and publishes to the broker.

Fig 1 - Transactional Outbox: Write Path
[Diagram: the Order Service (producer) writes an orders row (id: 9821, status: confirmed, user_id: 42, amount: ₹2,499) and an outbox row (event: ORDER_CONFIRMED, payload: {id:9821}, sent: false) in one atomic transaction (1. Write). The Relay Process (polling / CDC) reads the outbox (2. Poll), publishes to the message broker, Kafka / SQS (3. Publish), and marks the row sent=true (4. Mark sent=true).]
  1. The Order Service writes to orders AND inserts a row into outbox - both in one database transaction. If the transaction fails, neither write persists.
  2. A Relay Process periodically queries the outbox for unsent messages (where sent = false).
  3. The relay publishes each event to the message broker.
  4. On successful publish, the relay marks the outbox row as sent = true (or deletes it).

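Why this is atomic: both writes in step 1 (the orders row and the outbox row) touch only one resource (the database), so the transaction semantics of your DB give you atomicity for free. Steps 2–4 happen later and can be retried safely, because the event is already durably recorded.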
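The atomic write in step 1 can be sketched as follows. This is a minimal illustration, assuming SQLite in place of PostgreSQL and a reduced schema. The `confirm_order` function and its `fail_midway` flag are hypothetical, added only to simulate a crash between the two inserts.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id TEXT PRIMARY KEY,
    event_type TEXT NOT NULL,
    aggregate_id TEXT NOT NULL,
    payload TEXT NOT NULL,
    sent INTEGER NOT NULL DEFAULT 0
);
"""


def confirm_order(conn, order_id, fail_midway=False):
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'confirmed')",
            (order_id,),
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two inserts")
        conn.execute(
            "INSERT INTO outbox (id, event_type, aggregate_id, payload) "
            "VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), "ORDER_CONFIRMED", str(order_id),
             json.dumps({"id": order_id})),
        )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    confirm_order(conn, 9821)  # both rows are committed together
    try:
        confirm_order(conn, 9822, fail_midway=True)  # neither row survives
    except RuntimeError:
        pass
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
    return orders, events
```

After `demo()`, the order and event counts match: the failed call leaves no order without an event and no event without an order.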

The Outbox Table Schema

CREATE TABLE outbox (
  id           UUID          PRIMARY KEY DEFAULT gen_random_uuid(),
  event_type   VARCHAR(100)  NOT NULL,                -- 'ORDER_CONFIRMED'
  aggregate_id VARCHAR(100)  NOT NULL,                -- entity the event relates to
  payload      JSONB         NOT NULL,                -- the full event body
  created_at   TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
  sent         BOOLEAN       NOT NULL DEFAULT FALSE,  -- has relay published it?
  sent_at      TIMESTAMPTZ                            -- set when the relay marks the row sent
);

Two Relay Strategies: Polling vs. CDC

The relay process can read the outbox in two fundamentally different ways:

Fig 2 - Relay Strategies: Polling vs. Change Data Capture
[Diagram: Strategy A - Polling: the relay runs a SELECT loop (SELECT WHERE sent=false) against the database every N seconds and publishes to the broker. Strategy B - CDC (Debezium): the Debezium CDC connector streams changes from the database's WAL / binlog and publishes to the broker in near real-time.]

Polling (Simple Relay)

The relay runs a SELECT * FROM outbox WHERE sent = false ORDER BY created_at LIMIT 100 query on a schedule. Dead simple to implement. Works with any relational database. The downside: there’s a delay between write and publish (up to one poll interval), and at high throughput, polling can pressure your DB.
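A polling relay along these lines might look like the sketch below. It assumes SQLite in place of PostgreSQL, and `relay_once` and the `publish` callable are hypothetical stand-ins for your own relay loop and broker client.

```python
import sqlite3


def relay_once(conn, publish, batch_size=100):
    """Publish one batch of unsent outbox rows; return how many were marked sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE sent = 0 ORDER BY created_at, id LIMIT ?",
        (batch_size,),
    ).fetchall()
    delivered = 0
    for row_id, event_type, payload in rows:
        try:
            publish(event_type, payload)
        except Exception:
            break  # leave this row and the later ones unsent; retry on the next poll
        # Mark the row only after the broker accepted it.
        with conn:
            conn.execute(
                "UPDATE outbox SET sent = 1, sent_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row_id,),
            )
        delivered += 1
    return delivered


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox (id TEXT PRIMARY KEY, event_type TEXT, payload TEXT, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP, sent INTEGER DEFAULT 0, sent_at TEXT)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO outbox (id, event_type, payload) VALUES (?, ?, ?)",
            [("e1", "ORDER_CONFIRMED", '{"id": 9821}'),
             ("e2", "ORDER_CONFIRMED", '{"id": 9822}')],
        )
    published = []
    first = relay_once(conn, lambda t, p: published.append((t, p)))
    second = relay_once(conn, lambda t, p: published.append((t, p)))
    return first, second, len(published)
```

In a real service, `relay_once` would run in a loop with a sleep between polls. Note the gap between `publish` and the UPDATE: a crash there causes the event to be published again on restart, which is why the delivery guarantee is at-least-once rather than exactly-once.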

Change Data Capture (CDC)

Tools like Debezium tap into the database’s transaction log (PostgreSQL’s WAL, MySQL’s binlog) and stream every change in near real-time. No polling needed. Far more efficient at scale. The tradeoff is operational complexity - you need to run and manage a CDC pipeline.

Recommendation: Start with polling. It’s a few dozen lines of code and works well for most workloads. Graduate to CDC when you need sub-second latency or are writing millions of events per hour.


The Inbox Pattern - The Consumer Side

The Outbox handles publishing reliably. But what about the consumer? Message brokers with at-least-once delivery can deliver the same message twice. If your consumer isn’t idempotent, you get duplicated side effects - double charges, double emails, double everything.

The Inbox Pattern solves exactly this: it tracks which messages have already been processed, so a redelivered message has no further effect. That turns at-least-once delivery into effectively exactly-once processing.

Fig 3 - Inbox Pattern: Idempotent Consumer
[Diagram: the broker delivers a message (maybe 2x) to the Notification Service (consumer), which checks the inbox table in the consumer DB. Already processed? YES: discard. NO: process and mark done. Example inbox rows: abc-123 and abc-124 processed, abc-125 not yet processed. The notifications table holds "email sent" rows for orders 9821 and 9820.]
  1. Consumer receives a message from the broker.
  2. It checks the inbox table: have I seen this msg_id before?
  3. If yes - it’s a duplicate. Discard it safely.
  4. If no - insert into inbox AND perform the business logic in one transaction.

CREATE TABLE inbox (
  msg_id       VARCHAR(200)  PRIMARY KEY,  -- stable message ID (must not change on redelivery)
  received_at  TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
  processed_at TIMESTAMPTZ
);

-- Inside the consumer handler (pseudocode):
BEGIN TRANSACTION;
-- Claim the message; the primary key rejects a second insert of the same msg_id.
INSERT INTO inbox (msg_id) VALUES ('abc-125')
  ON CONFLICT (msg_id) DO NOTHING;
IF affected_rows = 0 THEN
  ROLLBACK; -- duplicate, skip
  RETURN;   -- stop here so the work below is not repeated
END IF;
-- do the actual work
INSERT INTO notifications ...;
UPDATE inbox SET processed_at = NOW() WHERE msg_id = 'abc-125';
COMMIT;
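
The handler above can be sketched as runnable code. This is an illustration under stated assumptions: SQLite replaces PostgreSQL, so `INSERT OR IGNORE` plays the role of `ON CONFLICT DO NOTHING`, and `handle_message` and the `notifications` columns are hypothetical.

```python
import sqlite3


def handle_message(conn, msg_id, order_id):
    """Return True if the message was processed, False if it was a duplicate."""
    with conn:  # one transaction: the dedup record and the work commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbox (msg_id) VALUES (?)", (msg_id,)
        )
        if cur.rowcount == 0:
            return False  # msg_id already recorded: skip the work entirely
        conn.execute(
            "INSERT INTO notifications (order_id, body) VALUES (?, ?)",
            (order_id, f"email sent for order {order_id}"),
        )
        conn.execute(
            "UPDATE inbox SET processed_at = CURRENT_TIMESTAMP WHERE msg_id = ?",
            (msg_id,),
        )
    return True


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE inbox (
            msg_id TEXT PRIMARY KEY,
            received_at TEXT DEFAULT CURRENT_TIMESTAMP,
            processed_at TEXT
        );
        CREATE TABLE notifications (order_id INTEGER, body TEXT);
    """)
    results = [
        handle_message(conn, "abc-125", 9821),  # first delivery
        handle_message(conn, "abc-125", 9821),  # redelivery of the same message
    ]
    sent = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
    return results, sent
```

The transaction only protects work done in the same database. A side effect outside it, such as actually sending the email, is not rolled back with the inbox row and needs its own idempotency handling.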

The Full Picture: Outbox + Inbox Together

Fig 4 - End-to-End: Producer with Outbox + Consumer with Inbox
[Diagram: Producer side: the producer DB holds orders (business data) and outbox (pending events). Transport: the relay polls the outbox and publishes to the message broker. Consumer side: the consumer service uses a consumer DB holding inbox (dedup log) and notifications.]

When you combine both patterns, you get a reliable messaging pipeline: the producer guarantees every event is eventually published, possibly more than once (Outbox), and the consumer guarantees that duplicates have no additional effect, so each event is effectively processed once (Inbox). This is a well-established baseline for event-driven microservices.
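
To see the two halves cooperate, the sketch below simulates a relay that crashes after publishing but before marking the row as sent. It assumes two in-memory SQLite databases, a direct function call in place of a real broker, and the outbox row id reused as the message id. All function names are hypothetical.

```python
import sqlite3


def make_dbs():
    producer = sqlite3.connect(":memory:")
    producer.execute(
        "CREATE TABLE outbox (id TEXT PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0)"
    )
    consumer = sqlite3.connect(":memory:")
    consumer.execute("CREATE TABLE inbox (msg_id TEXT PRIMARY KEY)")
    consumer.execute("CREATE TABLE notifications (body TEXT)")
    return producer, consumer


def relay(producer, deliver, crash_before_mark=False):
    rows = producer.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for event_id, payload in rows:
        deliver(event_id, payload)  # the outbox id travels as the message id
        if crash_before_mark:
            return  # simulated crash: published, but never marked sent
        with producer:
            producer.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (event_id,))


def consume(consumer, msg_id, payload):
    with consumer:
        cur = consumer.execute(
            "INSERT OR IGNORE INTO inbox (msg_id) VALUES (?)", (msg_id,)
        )
        if cur.rowcount == 1:  # first time this message id is seen
            consumer.execute("INSERT INTO notifications (body) VALUES (?)", (payload,))


def demo():
    producer, consumer = make_dbs()
    with producer:
        producer.execute(
            "INSERT INTO outbox (id, payload) VALUES ('evt-1', 'order 9821 confirmed')"
        )
    deliveries = []

    def deliver(msg_id, payload):
        deliveries.append(msg_id)
        consume(consumer, msg_id, payload)

    relay(producer, deliver, crash_before_mark=True)  # first attempt crashes
    relay(producer, deliver)                          # restart republishes evt-1
    notifications = consumer.execute(
        "SELECT COUNT(*) FROM notifications"
    ).fetchone()[0]
    return len(deliveries), notifications
```

The event is delivered twice, yet only one notification row exists, because the second delivery hits the inbox primary key and is ignored.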


When to Use This Pattern

Use it when: you need to update your database AND publish an event, and you can’t afford to lose either. Order processing, payment events, inventory updates, user registrations - anything mission-critical.

Skip it when: the operation is truly fire-and-forget, or you’re working within a single service with no external messaging needs. Don’t add complexity you don’t need.

Gotchas & Edge Cases

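  1. Ordering is not guaranteed across aggregate IDs with polling. If strict per-entity ordering matters, publish with the aggregate ID as the partition key so that all events for one entity land on the same Kafka partition, and have the relay read each entity’s events in creation order.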
  2. Outbox table can grow large. Archive or delete rows with sent = true after some retention period.
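  3. A single relay instance is a single point of failure for delivery: if it stops, events wait in the outbox until it restarts. If you run several instances, coordinate them so they don’t publish the same rows concurrently, for example with Postgres advisory locks, SELECT ... FOR UPDATE SKIP LOCKED, or optimistic concurrency.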
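  4. Inbox dedup only works if msg_id is stable. A broker-assigned ID may change when the relay republishes an event after a failed sent = true update, so a safer choice is an ID the producer controls, such as the outbox row’s id carried in the message. Either way, confirm the ID survives redelivery.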
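
For the ordering gotcha, the idea of a partition key can be sketched in a few lines. This is only an illustration of the principle: `partition_for` is a hypothetical helper, and real Kafka clients apply their own hash function when you supply a message key, so in practice you would pass the aggregate ID as the key rather than compute the partition yourself.

```python
import zlib


def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Map an aggregate ID to a partition so one entity's events stay together."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(aggregate_id.encode("utf-8")) % num_partitions
```

Because the mapping depends only on the aggregate ID, every event for order 9821 goes to the same partition, where the broker preserves their relative order.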

“Distributed systems don’t fail rarely - they fail constantly, partially, and silently. Design for failure as the default, not the exception.”

Key Takeaways

  1. The dual-write problem is real and dangerous. Never write to a DB and a broker in separate operations without a safety net.
  2. The Outbox pattern makes event publishing atomic with your DB write - one transaction, no distributed coordination needed.
  3. The relay process reads unsent events and publishes them; use polling for simplicity, CDC for scale.
  4. The Inbox pattern gives consumers idempotency - each message takes effect once even if it is delivered multiple times.
  5. Together, they form a bulletproof messaging backbone for any event-driven architecture.