The Inbox & Outbox Pattern
reliability distributed-systems event-driven
Picture this: your e-commerce service just confirmed an order. It writes to the database and fires off a message to the notification service. Then - crash. Did the message go out? Was the order saved? You genuinely don’t know. This is the dual-write problem, and it quietly corrupts distributed systems every day.
The Transactional Outbox Pattern (and its sibling, the Inbox Pattern) elegantly solves this. No two-phase commit. No Saga complexity for simple cases. Just one atomic write and a reliable relay. Let’s dissect it completely.
The Problem: Dual-Write Catastrophe
In any event-driven microservice architecture, you almost always need to do two things at once:
- Persist state to a database
- Publish an event/message to a message broker (Kafka, RabbitMQ, SQS…)
These are two separate resources. And two separate writes can fail independently. This creates a consistency nightmare.
❌ Without the Pattern
- DB write succeeds, message fails → ghost state
- Message sent, DB fails → phantom events
- Crash mid-flight → unpredictable outcome
- Impossible to replay lost events
- Consumers act on stale or missing data
✓ With the Pattern
- Atomically write state + event together
- Event relay retries until confirmed
- Crashes are safe - relay picks up on restart
- Full event audit trail in DB
- At-least-once delivery guaranteed
Core Insight: You can’t do an atomic write across a database and a message broker. But you can do an atomic write within a single database transaction. The Outbox pattern exploits this.
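To make the failure concrete, here is a minimal sketch of a naive dual write. It assumes an in-memory SQLite database in place of the real one, and `flaky_publish` is a hypothetical stand-in for a broker client that happens to be down.

```python
import sqlite3


class BrokerDown(Exception):
    """Raised by the stand-in broker to simulate an outage."""


def flaky_publish(event: dict) -> None:
    # Hypothetical broker client that always fails, to force the bad case.
    raise BrokerDown("broker unreachable")


def confirm_order_dual_write(conn, order_id, publish):
    # Write 1: the database transaction commits on its own.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'CONFIRMED')", (order_id,)
        )
    # Write 2: a separate resource that can fail independently of write 1.
    publish({"type": "ORDER_CONFIRMED", "order_id": order_id})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")

try:
    confirm_order_dual_write(conn, "order-1", flaky_publish)
except BrokerDown:
    pass  # the caller sees an error, but the commit already happened

saved_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(saved_orders)  # 1: the order is stored, yet no event was ever published
```

Swapping the order of the two writes only moves the problem: the event goes out first and the database write can still fail afterwards.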
The Outbox Pattern - How It Works
The key idea: instead of publishing directly to a message broker, you write the event to a special outbox table in the same database transaction as your business data. A separate relay process then reads from this table and publishes to the broker.
- The Order Service writes to orders AND inserts a row into outbox - both in one database transaction. If the transaction fails, neither write persists.
- A Relay Process periodically queries the outbox for unsent messages (where sent = false).
- The relay publishes each event to the message broker.
- On successful publish, the relay marks the outbox row as sent = true (or deletes it).
Why this is atomic: both writes in step 1 touch only one resource (the database), so your database’s transaction semantics give you atomicity for free.
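The producer side can be sketched as follows. This is an illustration under assumptions, not a reference implementation: it uses SQLite through Python's `sqlite3` module, a simplified `orders` table that the article does not define, and a `fail_midway` flag that exists only to simulate a crash between the two inserts.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Simplified tables; the orders table is an assumption for this sketch.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE outbox (
               id TEXT PRIMARY KEY,
               event_type TEXT NOT NULL,
               aggregate_id TEXT NOT NULL,
               payload TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP,
               sent INTEGER DEFAULT 0,
               sent_at TEXT
           )"""
    )


def confirm_order(conn, order_id, fail_midway=False):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'CONFIRMED')", (order_id,)
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two inserts")
        conn.execute(
            "INSERT INTO outbox (id, event_type, aggregate_id, payload) "
            "VALUES (?, ?, ?, ?)",
            (
                str(uuid.uuid4()),
                "ORDER_CONFIRMED",
                order_id,
                json.dumps({"order_id": order_id}),
            ),
        )


def counts(conn):
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
    return orders, events
```

If `confirm_order` raises partway through, `counts` reports neither an order nor an event. After a successful call it reports one of each, and the relay can pick the event up later.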
The Outbox Table Schema
CREATE TABLE outbox (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type VARCHAR(100) NOT NULL, -- 'ORDER_CONFIRMED'
    aggregate_id VARCHAR(100) NOT NULL, -- entity the event relates to
    payload JSONB NOT NULL, -- the full event body
    created_at TIMESTAMPTZ DEFAULT NOW(),
    sent BOOLEAN DEFAULT FALSE, -- has relay published it?
    sent_at TIMESTAMPTZ
);
Two Relay Strategies: Polling vs. CDC
The relay process can read the outbox in two fundamentally different ways:
Polling (Simple Relay)
The relay runs a SELECT * FROM outbox WHERE sent = false ORDER BY created_at LIMIT 100 query on a schedule, so that older events are published first. Dead simple to implement. Works with any relational database. The downside: there’s a delay between write and publish (up to one poll interval), and at high throughput, polling can put pressure on your DB.
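A single polling pass might look like the sketch below. It assumes SQLite via `sqlite3` and a trimmed-down outbox table; `publish` is a hypothetical callable standing in for a real broker client, expected to raise when the broker does not confirm the message.

```python
import sqlite3


def create_outbox(conn):
    conn.execute(
        """CREATE TABLE outbox (
               id TEXT PRIMARY KEY,
               event_type TEXT NOT NULL,
               payload TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP,
               sent INTEGER DEFAULT 0,
               sent_at TEXT
           )"""
    )


def relay_once(conn, publish, batch_size=100):
    """Publish one batch of unsent rows; return how many were marked sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE sent = 0 ORDER BY created_at, rowid LIMIT ?",
        (batch_size,),
    ).fetchall()
    published = 0
    for event_id, event_type, payload in rows:
        try:
            publish(event_id, event_type, payload)
        except Exception:
            # Leave this row and the later ones unsent so the next poll
            # retries them in the same order.
            break
        # Mark as sent only after the broker accepted the event.
        with conn:
            conn.execute(
                "UPDATE outbox SET sent = 1, sent_at = CURRENT_TIMESTAMP "
                "WHERE id = ?",
                (event_id,),
            )
        published += 1
    return published
```

A scheduler would call `relay_once` in a loop with a sleep between passes. Because a row is marked only after a successful publish, a crash between the two steps causes a re-publish on restart, which is where at-least-once delivery comes from.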
Change Data Capture (CDC)
Tools like Debezium tap into the database’s transaction log (PostgreSQL’s WAL, MySQL’s binlog) and stream every change in near real-time. No polling needed. Far more efficient at scale. The tradeoff is operational complexity - you need to run and manage a CDC pipeline.
Recommendation: Start with polling. It takes a few dozen lines of code and is enough for most workloads. Graduate to CDC when you need sub-second latency or are writing millions of events per hour.
The Inbox Pattern - The Consumer Side
The Outbox handles publishing reliably. But what about the consumer? Message brokers with at-least-once delivery can deliver the same message twice. If your consumer isn’t idempotent, you get duplicated side effects - double charges, double emails, double everything.
The Inbox Pattern solves exactly this: it tracks which messages have already been processed, turning at-least-once delivery into effective exactly-once processing.
- Consumer receives a message from the broker.
- It checks the inbox table: have I seen this msg_id before?
- If yes - it’s a duplicate. Discard it safely.
- If no - insert into inbox AND perform the business logic in one transaction.
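The steps above can be sketched as a small handler. This is a sketch under assumptions: SQLite via `sqlite3`, `INSERT OR IGNORE` as SQLite's counterpart to `ON CONFLICT DO NOTHING`, and a hypothetical `notifications` table standing in for the business write.

```python
import sqlite3


def create_consumer_schema(conn):
    conn.execute(
        """CREATE TABLE inbox (
               msg_id TEXT PRIMARY KEY,
               received_at TEXT DEFAULT CURRENT_TIMESTAMP,
               processed_at TEXT
           )"""
    )
    # Hypothetical table representing the consumer's business side effect.
    conn.execute("CREATE TABLE notifications (msg_id TEXT NOT NULL, body TEXT NOT NULL)")


def handle_message(conn, msg_id, body):
    """Return True if the message was processed, False if it was a duplicate."""
    # One transaction: the dedup record and the business write commit together.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbox (msg_id) VALUES (?)", (msg_id,)
        )
        if cur.rowcount == 0:
            return False  # msg_id already recorded: skip the work entirely
        conn.execute(
            "INSERT INTO notifications (msg_id, body) VALUES (?, ?)", (msg_id, body)
        )
        conn.execute(
            "UPDATE inbox SET processed_at = CURRENT_TIMESTAMP WHERE msg_id = ?",
            (msg_id,),
        )
    return True
```

If the business write raises, the inbox row is rolled back with it, so a redelivery of the same message is processed rather than discarded.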
CREATE TABLE inbox (
    msg_id VARCHAR(200) PRIMARY KEY, -- broker message ID
    received_at TIMESTAMPTZ DEFAULT NOW(),
    processed_at TIMESTAMPTZ
);
-- Inside the consumer handler (pseudocode):
BEGIN TRANSACTION;
INSERT INTO inbox (msg_id) VALUES ('abc-125')
    ON CONFLICT DO NOTHING;
IF affected_rows = 0 THEN
    ROLLBACK; -- duplicate: nothing to undo, just acknowledge the message
    RETURN;   -- stop here so the work below does not run a second time
END IF;
-- do the actual work
INSERT INTO notifications ...;
UPDATE inbox SET processed_at = NOW() WHERE msg_id = 'abc-125';
COMMIT;
The Full Picture: Outbox + Inbox Together
When you combine both patterns, you get a reliable messaging pipeline: the producer guarantees every event is eventually published, possibly more than once (Outbox), and the consumer guarantees each event’s effects are applied only once (Inbox), as long as those effects are written in the same database transaction as the inbox row. This is the gold standard for event-driven microservices.
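The interplay can be simulated end to end. In this sketch, everything is an illustrative assumption: two in-memory SQLite databases play the producer and consumer stores, a Python list plays the broker, and the relay is made to "crash" after publishing but before marking the row as sent.

```python
import json
import sqlite3
import uuid

producer = sqlite3.connect(":memory:")
producer.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
producer.execute(
    "CREATE TABLE outbox (id TEXT PRIMARY KEY, payload TEXT NOT NULL, sent INTEGER DEFAULT 0)"
)
consumer = sqlite3.connect(":memory:")
consumer.execute("CREATE TABLE inbox (msg_id TEXT PRIMARY KEY)")
consumer.execute("CREATE TABLE notifications (order_id TEXT NOT NULL)")

broker = []  # in-memory stand-in for a real message broker


def place_order(order_id):
    # Outbox: business row and event row commit in one transaction.
    with producer:
        producer.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        producer.execute(
            "INSERT INTO outbox (id, payload) VALUES (?, ?)",
            (str(uuid.uuid4()), json.dumps({"order_id": order_id})),
        )


def relay(crash_before_marking=False):
    rows = producer.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for event_id, payload in rows:
        broker.append((event_id, payload))  # the outbox id travels as the message id
        if crash_before_marking:
            return  # simulated crash: published, but never marked as sent
        with producer:
            producer.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (event_id,))


def consume_all():
    while broker:
        msg_id, payload = broker.pop(0)
        # Inbox: dedup record and side effect commit in one transaction.
        with consumer:
            cur = consumer.execute(
                "INSERT OR IGNORE INTO inbox (msg_id) VALUES (?)", (msg_id,)
            )
            if cur.rowcount == 0:
                continue  # duplicate delivery, already handled
            consumer.execute(
                "INSERT INTO notifications (order_id) VALUES (?)",
                (json.loads(payload)["order_id"],),
            )


place_order("order-1")
relay(crash_before_marking=True)  # first attempt publishes, then "crashes"
relay()                           # restart publishes the same event again
messages_on_broker = len(broker)
consume_all()
notifications_sent = consumer.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
print(messages_on_broker, notifications_sent)  # 2 1
```

The broker carries the event twice, yet only one notification is written, because both copies share the ID that the producer assigned in the outbox.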
When to Use This Pattern
Use it when: you need to update your database AND publish an event, and you can’t afford to lose either. Order processing, payment events, inventory updates, user registrations - anything mission-critical.
Skip it when: the operation is truly fire-and-forget, or you’re working within a single service with no external messaging needs. Don’t add complexity you don’t need.
Gotchas & Edge Cases
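- Ordering is not guaranteed across aggregate IDs with polling. If per-entity ordering matters, publish with aggregate_id as the partition key so that all events for one entity land on the same Kafka partition, and have the relay (polling or CDC) emit them in commit order.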
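- Outbox table can grow large. Archive or delete rows with sent = true after some retention period.
- The relay is a single point of failure unless you run multiple instances, and multiple instances need coordination so they don’t publish the same rows (use Postgres advisory locks or optimistic concurrency).
- Inbox dedup only works if msg_id is stable. Make sure your broker actually preserves message IDs on redelivery. A relay retry publishes the event as a new broker message, so a producer-assigned ID carried in the message (such as the outbox row’s id) is the safer dedup key.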
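Two of these gotchas can be sketched in a few lines each. The sketch assumes SQLite via `sqlite3`; the `claimed_by` column and the helper names are hypothetical additions for illustration, not part of the schema shown earlier.

```python
import sqlite3


def create_outbox(conn):
    conn.execute(
        """CREATE TABLE outbox (
               id TEXT PRIMARY KEY,
               payload TEXT NOT NULL,
               sent INTEGER DEFAULT 0,
               sent_at TEXT,
               claimed_by TEXT  -- hypothetical column for optimistic claiming
           )"""
    )


def claim_event(conn, event_id, relay_name):
    """Optimistic concurrency: True only for the relay instance that wins."""
    with conn:
        cur = conn.execute(
            "UPDATE outbox SET claimed_by = ? "
            "WHERE id = ? AND sent = 0 AND claimed_by IS NULL",
            (relay_name, event_id),
        )
    # The WHERE clause matches at most once, so a losing relay sees rowcount 0.
    return cur.rowcount == 1


def purge_sent(conn, retention_days):
    """Delete published rows older than the retention period; return the count."""
    with conn:
        cur = conn.execute(
            "DELETE FROM outbox WHERE sent = 1 AND sent_at < datetime('now', ?)",
            (f"-{int(retention_days)} days",),
        )
    return cur.rowcount
```

A claim made this way stays in place if the claiming relay dies, so a real deployment would also need a claim timeout or a lock that is released when the session ends.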
“Distributed systems don’t fail rarely - they fail constantly, partially, and silently. Design for failure as the default, not the exception.”
Key Takeaways
- The dual-write problem is real and dangerous. Never write to a DB and a broker in separate operations without a safety net.
- The Outbox pattern makes event publishing atomic with your DB write - one transaction, no distributed coordination needed.
- The relay process reads unsent events and publishes them; use polling for simplicity, CDC for scale.
- The Inbox pattern gives consumers idempotency - each message’s effects are applied once even if the message is delivered multiple times.
- Together, they form a bulletproof messaging backbone for any event-driven architecture.