The Microservice That Knew Too Much: Breaking Circular Dependencies


The Distributed Monolith

You did everything right.

You read the books. You watched the conference talks. You convinced your team to migrate from the monolith to microservices. Twelve separate services, each owning its own data, each deployable independently. Domain-driven design. Clean contracts. Async communication where it made sense.

Six months later, your architecture looked like this:

UserService ←→ OrderService ←→ InventoryService
     ↕                ↕                ↕
NotificationService ←→ PaymentService
     ↕
OrderService (wait, already listed above)

Service B calls Service A. Service A calls Service C. Service C calls Service B. When you tried to deploy OrderService last Thursday, you also had to deploy PaymentService and UserService, because all three were entangled in each other’s startup sequence. When UserService went down at 2 PM, it took OrderService down with it, and the failure cascaded to NotificationService.

You had not built microservices. You had built a distributed monolith — all the complexity of distributed systems, none of the independence.

How Circular Dependencies Happen

Circular dependencies in microservices are almost never intentional. They emerge gradually, one reasonable-seeming shortcut at a time.

Here is a typical sequence:

Week 1: OrderService needs to know if a user exists before creating an order. Simple call to UserService.

Week 4: UserService wants to show order history on the profile page. Simple call to OrderService.

Week 8: OrderService needs to send a notification when an order ships. Simple call to NotificationService.

Week 12: NotificationService needs the user’s email address. Simple call to UserService. But UserService now calls OrderService…

UserService ──calls──▶ OrderService ──calls──▶ NotificationService
     ▲                                                │
     └────────────────────calls───────────────────────┘

This is a cycle. Deploy any one of them and you must coordinate with the others. If one fails, the others can fail with it.

Each individual decision was defensible. The cumulative effect is an architecture that cannot be deployed or operated independently.

The Failure Chain

Here is what actually happens at runtime when you have a circular dependency:

User clicks "View Profile"
  → UserService.getProfile(userId)
    → calls OrderService.getRecentOrders(userId)   [synchronous HTTP]
      → calls PaymentService.getPaymentStatus(orderId)  [synchronous HTTP]
        → calls UserService.getPaymentMethod(userId)    [synchronous HTTP]
          → UserService needs a second free worker while the first is still blocked
          → Under load none is free: timeout after 5 seconds
        → PaymentService returns 503
      → OrderService returns partial data or 503
    → UserService cannot build profile response
  → User sees error or spinner forever

Worse, every service in this chain is holding a thread open, waiting for the next downstream service to respond. Under load, those held threads exhaust the thread pool, and the service becomes unresponsive to all requests — not just the ones in the cycle.
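The thread-holding effect can be reproduced in a few lines. This is a minimal sketch, not the system described above: a single `ThreadPoolExecutor` stands in for UserService's worker pool, and the hypothetical `get_profile` and `get_payment_method` functions collapse the hops through OrderService and PaymentService into one re-entrant call.

```python
import concurrent.futures as cf


def get_payment_method() -> str:
    # The re-entrant call that arrives back at UserService at the end of the cycle.
    return "visa-1234"


def get_profile(pool: cf.ThreadPoolExecutor, timeout_s: float) -> str:
    # Stands in for UserService -> OrderService -> PaymentService -> UserService.
    # This handler keeps its own worker busy while it waits for the call that
    # comes back around the ring to be served by the same pool.
    inner = pool.submit(get_payment_method)
    try:
        return "profile with " + inner.result(timeout=timeout_s)
    except cf.TimeoutError:
        return "503"


def simulate(workers: int, timeout_s: float = 0.2) -> str:
    # One pool per simulated UserService instance (an assumption of this sketch).
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        return pool.submit(get_profile, pool, timeout_s).result()


print(simulate(workers=1))  # the only worker is held by get_profile
print(simulate(workers=2))  # a spare worker can serve the re-entrant call
```

With one worker, the handler holds the only thread while the re-entrant call sits in the queue, so the request times out. With a spare worker the same call succeeds, which is why a cycle can look healthy in testing and fail only under load.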

This is what engineers mean when they say a service “brought down everything else.” The circular call graph means one slow node freezes the entire ring.

Diagnosing the Dependency Graph

Before you can fix circular dependencies, you need to see them. Most teams discover them during an incident rather than proactively.

The tools that make the graph visible:

Service mesh (Istio / Linkerd): Captures every HTTP call between services with source and destination. Plot this as a graph and cycles become immediately obvious.

Distributed tracing (Jaeger / Datadog): Every cross-service call gets recorded. Look for traces where Service A appears both near the top and near the bottom of the same trace — that is your cycle.
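The trace check can be automated once spans are exported. The sketch below assumes a simplified, hypothetical `Span` shape (span ID, parent ID, service name); it is not the export format of Jaeger or Datadog, so real data would need to be mapped into it first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str


def find_reentrant_services(spans: list[Span]) -> set[str]:
    """Return services that show up again underneath one of their own spans,
    with at least one other service in between."""
    by_id = {s.span_id: s for s in spans}
    reentrant: set[str] = set()
    for span in spans:
        left_service = False  # becomes True once we pass through another service
        seen_ids = {span.span_id}  # guards against malformed parent links
        parent_id = span.parent_id
        while parent_id in by_id and parent_id not in seen_ids:
            seen_ids.add(parent_id)
            parent = by_id[parent_id]
            if parent.service != span.service:
                left_service = True
            elif left_service:
                reentrant.add(span.service)
                break
            parent_id = parent.parent_id
    return reentrant


trace = [
    Span("a", None, "UserService"),
    Span("b", "a", "OrderService"),
    Span("c", "b", "PaymentService"),
    Span("d", "c", "UserService"),
]
print(find_reentrant_services(trace))
```

Consecutive spans inside the same service are ignored on purpose, since a service is allowed to have internal child spans; only a return to the service after leaving it counts.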

Manual dependency audit:

For each service, ask:
  1. Which services does it call synchronously?
  2. Which services does it subscribe to for events?
  3. Which services does it query for data at startup?

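If the answer to (1) for Service B includes Service A,
and the answer to (1) for Service A includes Service B,
you have a direct cycle. Longer cycles work the same way:
if following the answers to (1) from any service eventually
leads back to that service, you have a cycle.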
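Once the answers to question (1) are written down as a map from each service to the services it calls, the audit is a standard graph problem. The following is a small sketch using depth-first search; the `calls` map is an illustration built from the Week 1 to Week 12 sequence earlier in this article, and `find_cycle` is a name chosen here, not part of any tool.

```python
from typing import Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_cycle(calls: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a path whose first and last entries match, or None."""
    colour: dict[str, int] = {}
    path: list[str] = []

    def visit(service: str) -> Optional[list[str]]:
        colour[service] = GREY
        path.append(service)
        for callee in calls.get(service, []):
            state = colour.get(callee, WHITE)
            if state == GREY:  # callee is already on the current path
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        colour[service] = BLACK
        return None

    for service in calls:
        if colour.get(service, WHITE) == WHITE:
            found = visit(service)
            if found:
                return found
    return None


calls = {
    "OrderService": ["UserService", "NotificationService"],
    "UserService": ["OrderService"],
    "NotificationService": ["UserService"],
}
print(find_cycle(calls))
```

The function stops at the first cycle it finds, which fits an incremental workflow: break that cycle, rerun, and repeat until it returns `None`.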

Fix 1: Establish a Dependency Direction and Enforce It

The first rule of microservice architecture: dependencies must flow in one direction.

Define a hierarchy. Higher-level services may call lower-level services. Lower-level services may never call higher-level services.

Level 0 (foundational, calls no other service):
  UserService, ProductService, InventoryService

Level 1 (calls Level 0):
  OrderService, CartService

Level 2 (calls Level 0 and Level 1):
  PaymentService, FulfillmentService

Level 3 (calls anything below it):
  NotificationService, AnalyticsService, BillingService

If a Level 0 service (UserService) needs data from Level 1 (OrderService), that is a smell. It means the abstraction is wrong, not that the rule should be broken.

Allowed:
  OrderService ──▶ UserService   (Level 1 calls Level 0)
  PaymentService ──▶ OrderService (Level 2 calls Level 1)

Forbidden:
  UserService ──▶ OrderService   (Level 0 calling Level 1 = cycle risk)
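A rule like this only holds if something checks it. One possible check, sketched below, could run in CI against a declared list of calls. The `LEVELS` table copies the hierarchy above; the `find_violations` name and the choice to reject same-level calls as well are assumptions of this sketch, since the hierarchy above only lists calls to strictly lower levels.

```python
LEVELS = {
    "UserService": 0, "ProductService": 0, "InventoryService": 0,
    "OrderService": 1, "CartService": 1,
    "PaymentService": 2, "FulfillmentService": 2,
    "NotificationService": 3, "AnalyticsService": 3, "BillingService": 3,
}


def find_violations(calls: list[tuple[str, str]]) -> list[str]:
    """Flag every (caller, callee) pair where the caller is not strictly above the callee."""
    violations = []
    for caller, callee in calls:
        if LEVELS[caller] <= LEVELS[callee]:
            violations.append(
                f"{caller} (level {LEVELS[caller]}) -> {callee} (level {LEVELS[callee]})"
            )
    return violations


declared_calls = [
    ("OrderService", "UserService"),     # allowed: Level 1 calls Level 0
    ("PaymentService", "OrderService"),  # allowed: Level 2 calls Level 1
    ("UserService", "OrderService"),     # forbidden: Level 0 calls Level 1
]
for line in find_violations(declared_calls):
    print("forbidden:", line)
```

If every call goes from a higher level to a strictly lower one, no sequence of calls can return to its starting service, so passing this check rules out cycles in the synchronous call graph.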

Fix 2: Replace Synchronous Calls with Events

The most powerful fix for circular dependencies is to stop making synchronous calls for information that does not need to be real-time.

Instead of UserService calling OrderService to get order history, flip the model: OrderService emits an event whenever an order is created or updated, and UserService (or a read model) maintains its own copy.

Before (synchronous, creates coupling):

GET /user/{id}/profile
  → UserService calls OrderService synchronously
  → Coupled, fragile, creates cycle risk

After (event-driven, decoupled):

OrderService emits: OrderCreated { orderId, userId, total, status }
                ↓
            Kafka / RabbitMQ Topic: "orders"
                ↓
UserReadModelService subscribes, maintains its own
table of orders per user.

GET /user/{id}/profile
  → UserService queries its own read model
  → No synchronous dependency on OrderService
  → OrderService can be down; profile still loads

This pattern is an event-driven read model, sometimes called event-carried state transfer, and it is closely related to CQRS (Command Query Responsibility Segregation). The tradeoff is eventual consistency: the user profile might show order data that is a few seconds old. For most use cases, this is perfectly acceptable.
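A compact way to see the read model in action is an in-memory sketch. `InMemoryBus` below is a hypothetical stand-in for Kafka or RabbitMQ that delivers events synchronously, and `UserOrderReadModel` is an illustrative name; the event fields follow the `OrderCreated` shape shown above.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class InMemoryBus:
    """Toy stand-in for a broker: delivers each event to subscribers immediately."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in self._handlers[topic]:
            handler(event)


class UserOrderReadModel:
    """Keeps a local copy of each user's orders, fed only by events."""

    def __init__(self, bus: InMemoryBus) -> None:
        self._orders_by_user: dict[str, dict[str, Event]] = defaultdict(dict)
        bus.subscribe("orders", self.on_order_event)

    def on_order_event(self, event: Event) -> None:
        # Upsert keyed by orderId, so a redelivered event does not create duplicates.
        self._orders_by_user[event["userId"]][event["orderId"]] = event

    def recent_orders(self, user_id: str) -> list[Event]:
        return list(self._orders_by_user.get(user_id, {}).values())


bus = InMemoryBus()
read_model = UserOrderReadModel(bus)

# OrderService side: publish and move on, with no knowledge of who is listening.
bus.publish("orders", {"orderId": "o-1", "userId": "u-42", "total": 30.0, "status": "created"})
bus.publish("orders", {"orderId": "o-1", "userId": "u-42", "total": 30.0, "status": "shipped"})

print(read_model.recent_orders("u-42"))
```

With a real broker the delivery is asynchronous, which is where the few seconds of staleness come from. Keying the upsert by `orderId` is a deliberate choice here, because brokers with at-least-once delivery can hand the same event to a consumer more than once.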

┌──────────────────────────────────────────────────────────────────────┐
│                         Event Bus (Kafka)                             │
│  topics: orders, payments, inventory, users, notifications            │
└────┬─────────────────────┬──────────────────────────┬────────────────┘
     │                     │                          │
     │ publishes           │ subscribes               │ subscribes
     ▼                     ▼                          ▼
┌──────────────┐   ┌────────────────┐      ┌─────────────────────┐
│ OrderService │   │  UserService   │      │  NotificationService │
│              │   │  (read model)  │      │  (email, push)       │
│ owns orders  │   │  owns user     │      │  listens for events  │
│ DB           │   │  + order cache │      │  sends async         │
└──────────────┘   └────────────────┘      └─────────────────────┘

Now none of these services call each other directly. They communicate exclusively through the event bus, so you can deploy any one of them without coordinating with the others, provided the event schemas stay backward compatible.

Fix 3: Use an API Gateway for Cross-Cutting Concerns

A frequent source of circular dependencies is shared logic that ends up in the wrong service. Authentication, rate limiting, user context — these belong at the edge, not scattered across services.

An API Gateway (Kong, AWS API Gateway, custom) handles:

Authentication and JWT validation
Rate limiting
Request routing
User context injection (user ID, permissions) into downstream request headers

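Services no longer need to call UserService to validate a token. The gateway validates it and stamps the request with the caller's identity. For this to be safe, services must accept those identity headers only from the gateway, never directly from clients.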

Client request
  │
  ▼
API Gateway
  ├── Validates JWT (Auth0 / internal)
  ├── Extracts user_id, roles
  ├── Injects headers: X-User-Id, X-User-Roles
  ├── Rate limits by user/IP
  └── Routes to correct service
        │
        ▼
  OrderService receives request
  with X-User-Id already present.
  No call to UserService needed.

This eliminates an entire category of cross-service calls that were creating coupling.
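The gateway step can be sketched with the standard library alone. This is an illustration under stated assumptions, not how Kong or AWS API Gateway is configured: it assumes HS256-signed tokens with `sub`, `roles`, and `exp` claims, and `sign_token`, `verify_token`, and `gateway_forward` are hypothetical helper names. A production gateway would use a maintained JWT library and its own plugin system.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()


def sign_token(claims: dict, secret: bytes) -> str:
    """Demo helper standing in for the identity provider."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    return f"{header}.{payload}." + _b64url_encode(_signature(f"{header}.{payload}", secret))


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return None
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        if "sub" not in claims or claims.get("exp", 0) < time.time():
            return None
        return claims
    except (ValueError, AttributeError, TypeError):
        return None  # malformed token


def gateway_forward(headers: dict[str, str], secret: bytes) -> tuple[int, dict[str, str]]:
    """Return (status, headers to forward downstream)."""
    # Drop identity headers supplied by the client so they cannot be spoofed.
    forwarded = {k: v for k, v in headers.items() if not k.lower().startswith("x-user-")}
    auth = headers.get("Authorization", "")
    claims = verify_token(auth[len("Bearer "):], secret) if auth.startswith("Bearer ") else None
    if claims is None:
        return 401, {}
    forwarded["X-User-Id"] = str(claims["sub"])
    forwarded["X-User-Roles"] = ",".join(claims.get("roles", []))
    return 200, forwarded


SECRET = b"demo-secret"  # assumption: shared HMAC key, for illustration only
token = sign_token({"sub": "u-42", "roles": ["customer"], "exp": time.time() + 60}, SECRET)
print(gateway_forward({"Authorization": f"Bearer {token}", "X-User-Id": "admin"}, SECRET))
```

The client's own `X-User-Id: admin` header is discarded before the verified identity is injected. That stripping step is what lets OrderService trust the header without calling UserService.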

Fix 4: Identify Bounded Contexts (Domain-Driven Design)

Many circular dependencies exist because two services are actually one domain split incorrectly.

If OrderService and PaymentService are always deployed together, always fail together, and always need each other’s data synchronously, they might be one service with a misleadingly clean-looking API boundary.

A bounded context is a cohesive unit of business logic that should be owned by one service:

Bounded Context: Order Fulfillment
  Includes: orders, payments, shipments
  Owns: order lifecycle from creation to delivery
  External interface: REST API + events published to bus

Bounded Context: User Identity
  Includes: accounts, authentication, profiles, preferences
  Owns: who the user is, not what they bought
  External interface: REST API for identity only

Bounded Context: Inventory
  Includes: products, stock levels, reservations
  Owns: what is available to sell
  External interface: REST API + events for stock changes

When you split along bounded contexts, the circular calls often disappear because the data that was being fetched across service boundaries was data that should have lived in the same service all along.
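One rough way to spot services that may belong in the same bounded context is to measure how often they ship together. The sketch below is a heuristic invented for illustration, not an established metric: the deploy history is made-up sample data, the 0.8 threshold is arbitrary, and a high score is only a prompt to look at the domain model, not a verdict.

```python
from itertools import combinations


def co_deploy_ratio(deploys: list[set[str]], a: str, b: str) -> float:
    """Share of deploys touching either service that touched both."""
    together = sum(1 for d in deploys if a in d and b in d)
    either = sum(1 for d in deploys if a in d or b in d)
    return together / either if either else 0.0


def merge_candidates(deploys: list[set[str]], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Pairs of services that are deployed together at least `threshold` of the time."""
    services = sorted(set().union(*deploys))
    return [
        (a, b)
        for a, b in combinations(services, 2)
        if co_deploy_ratio(deploys, a, b) >= threshold
    ]


# Hypothetical history: each set is one release and the services it shipped.
deploy_history = [
    {"OrderService", "PaymentService"},
    {"OrderService", "PaymentService", "UserService"},
    {"OrderService", "PaymentService"},
    {"UserService"},
    {"InventoryService"},
]
print(merge_candidates(deploy_history))
```

The same calculation could be applied to incident records to approximate "always fail together".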

The Architecture After the Refactor

Here is what the system looked like after three months of incremental refactoring away from circular dependencies:

┌─────────────────────────────────────────────────────────────────────────┐
│                           API Gateway                                    │
│        Auth validation, rate limiting, routing, user context injection   │
└──────────┬──────────────────┬───────────────────────┬───────────────────┘
           │                  │                       │
           ▼                  ▼                       ▼
┌──────────────────┐ ┌────────────────┐   ┌──────────────────────────────┐
│  UserService     │ │  OrderService  │   │  InventoryService             │
│  Bounded Context │ │  Bounded       │   │  Bounded Context              │
│  (identity only) │ │  Context       │   │  (stock, products)            │
└──────────────────┘ └───────┬────────┘   └──────────────────────────────┘
                             │
                             │ publishes events
                             ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        Kafka Event Bus                                   │
│   Topics: order.created, order.shipped, payment.succeeded, stock.low    │
└──────────┬────────────────────────────────────────────────────────────┬─┘
           │                                                            │
           ▼                                                            ▼
┌────────────────────────┐                              ┌───────────────────────┐
│  NotificationService   │                              │  AnalyticsService     │
│  Subscribes to events  │                              │  Subscribes, stores   │
│  Sends emails / push   │                              │  aggregates, reports  │
└────────────────────────┘                              └───────────────────────┘

Results:

Metric	Before	After
Cross-service synchronous calls	47 unique pairs	11 unique pairs
Circular dependency cycles	4	0
Deploy coordination required	All 12 services	Per bounded context (3)
Cascade failures per quarter	8	1
Mean time to deploy	4.5 hours	22 minutes
Service availability (p99)	97.1%	99.8%

Key Takeaways

A distributed monolith is worse than a monolith. You get all the operational complexity of distributed systems without the deployment independence. Avoid it.
Circular dependencies emerge gradually. Each individual call seems reasonable. Audit the full dependency graph regularly, not just during incidents.
Dependencies must flow in one direction. Define a hierarchy and enforce it. If a lower-level service needs to call a higher-level one, the abstraction is wrong.
Events replace most synchronous calls. If you do not need real-time consistency, emit an event and let consumers maintain their own read models.
Bounded contexts are the unit of service design. Services that always need each other’s data are probably one service pretending to be two.

FAQ

Q: How is eventual consistency acceptable for something like user profiles? For most reads, showing data that is one to two seconds old is indistinguishable from real-time for the user. The tradeoff is worth it for the operational independence it provides. For writes that must be immediately visible (password changes, payment methods), read from the source of truth directly.

Q: How do I incrementally refactor out of circular dependencies without a rewrite? Start by identifying the cycle. Then find one call in the cycle that can be replaced with an event. Introduce a small read model service to consume that event. Once the synchronous call is removed, the cycle is broken. Repeat for the next cycle. Never try to fix all of them at once.

Q: What if two services genuinely need each other’s data in real time? First, question whether they are truly separate services or one bounded context that should be merged. If they truly are separate, use a saga pattern or choreography: Service A completes its part and emits an event, Service B responds to that event and emits its own. Neither calls the other directly.
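The choreography described in that answer can be sketched as follows. The `Bus` class is a hypothetical in-memory stand-in for a broker, the two service classes are heavily simplified, and payment is assumed to always succeed; the topic names `order.created` and `payment.succeeded` are taken from the event bus diagram earlier in this article.

```python
from collections import defaultdict
from typing import Callable


class Bus:
    """Toy synchronous event bus; a real broker would deliver asynchronously."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers[topic]:
            handler(event)


class OrderService:
    def __init__(self, bus: Bus) -> None:
        self._bus = bus
        self.status: dict[str, str] = {}
        bus.subscribe("payment.succeeded", self.on_payment_succeeded)

    def place_order(self, order_id: str, total: float) -> None:
        self.status[order_id] = "pending"
        self._bus.publish("order.created", {"orderId": order_id, "total": total})

    def on_payment_succeeded(self, event: dict) -> None:
        self.status[event["orderId"]] = "paid"


class PaymentService:
    def __init__(self, bus: Bus) -> None:
        self._bus = bus
        self.charged: list[str] = []
        bus.subscribe("order.created", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        # Assumption: the charge always succeeds. A real saga also needs a
        # failure event and a compensating step on the order side.
        self.charged.append(event["orderId"])
        self._bus.publish("payment.succeeded", {"orderId": event["orderId"]})


bus = Bus()
orders = OrderService(bus)
payments = PaymentService(bus)
orders.place_order("o-1", 30.0)
print(orders.status)
```

Neither class holds a reference to the other; each knows only the bus and the event names. The business flow still loops from order to payment and back, but no service blocks a thread waiting on the other.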

Q: Is an API gateway always necessary? Not always. At small scale (2 to 3 services), direct service-to-service calls are fine. As you add more services and cross-cutting concerns, an API gateway centralizes concerns that would otherwise be duplicated in every service’s middleware stack.

Interview Questions

“What is a circular dependency in microservices, and why is it a problem?” A circular dependency occurs when Service A calls Service B, which in turn calls Service A, either directly or through intermediaries. It creates tight coupling, prevents independent deployment, and means one service failure can cascade to all services in the cycle.

“How would you detect circular dependencies in a live system?” Use distributed tracing to find traces in which the same service appears both at the start and again deeper in the call chain. Service mesh tools like Istio expose a full call graph. A manual audit of each service’s outbound HTTP calls is also effective for smaller systems.

“What is the difference between orchestration and choreography?” Orchestration uses a central coordinator service that tells each service what to do and when. Choreography uses events — each service reacts to events from others without a central coordinator. Choreography reduces coupling but makes the overall flow harder to observe without tracing.

“How do you use CQRS to break service coupling?” CQRS (Command Query Responsibility Segregation) separates write operations (commands) from read operations (queries). Services publish events on writes. Consumers maintain their own read-optimized projections. This eliminates the need for synchronous cross-service reads.

“When should you merge two microservices back into one?” When they always fail together, always deploy together, and always need each other’s data synchronously. These are signals that they represent one bounded context split incorrectly. The overhead of coordinating two separate services is not worth it if they cannot operate independently.