Microservices, Load Balancing & Caching

Monolith vs Microservices

| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | All-or-nothing | Independent per service |
| Scaling | Entire application | Per-service (scale what's hot) |
| Tech stack | Uniform | Polyglot (best tool per service) |
| Data consistency | ACID transactions | Eventual consistency (sagas) |
| Complexity | Simple initially | Distributed systems complexity |
| Team structure | Single team | Team per service (Conway's Law) |
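
The "sagas" entry refers to splitting a cross-service transaction into local steps, each paired with a compensating action that undoes it. A minimal sketch, assuming synchronous steps and hypothetical names (`SagaStep`, `runSaga`); real sagas are usually asynchronous and message-driven:

```typescript
// Hypothetical step shape: a local action plus the action that undoes it.
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Run steps in order. If one fails, compensate the completed steps
// in reverse order and report which step failed.
function runSaga(steps: SagaStep[]): { ok: boolean; failedStep?: string } {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        done.compensate();
      }
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true };
}
```

Between the failing step and the last compensation, other services can observe the partial state, which is why the table lists this as eventual rather than ACID consistency.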

When to Choose

  • Start with a monolith: split when you hit scaling or team boundaries
  • Choose microservices for: large teams, independent scaling needs, different reliability requirements per component
  • Avoid microservices if: the team is small, the domain is simple, or you don't need independent deployment

Load Balancing

Distributes incoming traffic across multiple servers to ensure no single server becomes a bottleneck.

Load Balancing Strategies

| Algorithm | Description | Best For |
|---|---|---|
| Round Robin | Rotate through servers sequentially | Uniform servers, stateless |
| Weighted Round Robin | Servers get traffic proportional to weight | Heterogeneous servers |
| Least Connections | Route to server with fewest active connections | Variable request duration |
| IP Hash | Hash client IP → consistent server | Session affinity |
| Consistent Hashing | Minimal redistribution when servers change | Distributed caches |
| Random | Pick a random server | Simple, surprisingly effective |
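
Three of these strategies can be sketched in a few lines each. This is an illustration only: the `Server` shape, the class and function names, the hand-rolled hash, and the default of 100 virtual nodes are all assumptions, not taken from any particular load balancer.

```typescript
// Hypothetical server record; field names are illustrative only.
interface Server {
  name: string;
  activeConnections: number;
}

// Round robin: cycle through the pool in order, wrapping at the end.
class RoundRobin {
  private next = 0;
  constructor(private readonly servers: Server[]) {
    if (servers.length === 0) throw new Error("server pool is empty");
  }
  pick(): Server {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length;
    return server;
  }
}

// Least connections: the server with the fewest in-flight requests wins.
// Ties go to whichever server appears first in the pool.
function pickLeastConnections(servers: Server[]): Server {
  if (servers.length === 0) throw new Error("server pool is empty");
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best
  );
}

// FNV-1a style 32-bit hash over UTF-16 code units (an arbitrary choice;
// any well-distributed hash would do).
function hash32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Consistent hashing: servers own points on a ring; a key belongs to the
// first point at or after its own hash, wrapping around to the start.
class HashRing {
  private ring: Array<[number, string]> = []; // sorted [position, server]
  constructor(private readonly virtualNodes = 100) {}

  add(server: string): void {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push([hash32(`${server}#${v}`), server]);
    }
    this.ring.sort((a, b) => a[0] - b[0]);
  }

  remove(server: string): void {
    this.ring = this.ring.filter(([, name]) => name !== server);
  }

  lookup(key: string): string {
    if (this.ring.length === 0) throw new Error("ring is empty");
    const h = hash32(key);
    const point = this.ring.find(([pos]) => pos >= h) ?? this.ring[0];
    return point[1];
  }
}
```

Removing a server from `HashRing` only remaps the keys that server owned; keys on other servers keep their assignment, which is the "minimal redistribution" property in the table. The linear `find` keeps the sketch short; a binary search over the sorted ring would be the usual choice at scale.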

L4 vs L7 Load Balancing

| Layer | Operates On | Pros | Cons |
|---|---|---|---|
| L4 (Transport) | TCP/UDP packets | Fast, protocol-agnostic | Can't route by content |
| L7 (Application) | HTTP headers, URLs, cookies | Content-based routing, SSL termination | Slower, more resource-intensive |
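
Content-based routing is what an L7 balancer can do and an L4 balancer cannot, because it requires reading the HTTP request. A minimal sketch of path-prefix routing, assuming a hypothetical `Route` table and a longest-prefix-wins rule (real proxies offer richer matching on hosts, headers, and cookies):

```typescript
// Hypothetical routing rule: requests whose path starts with `prefix`
// go to the named upstream pool.
interface Route {
  prefix: string;
  pool: string;
}

// Pick the pool whose prefix matches the path; the longest match wins.
// Paths that match no rule go to the fallback pool.
function routeByPath(path: string, routes: Route[], fallback: string): string {
  let best: Route | undefined;
  for (const route of routes) {
    const matches = path.startsWith(route.prefix);
    if (matches && (!best || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.pool : fallback;
}
```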

Health Checks

Active Health Check:
  LB periodically pings /health endpoint
  → Remove unhealthy servers from pool
  → Re-add when they recover

Passive Health Check:
  LB monitors response codes/latency
  → Mark server as unhealthy after N failures
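
The bookkeeping behind a passive health check can be sketched as a failure counter per server. This assumes "N failures" means N consecutive failures and that a single success restores the server; both are assumptions, since load balancers differ on these rules. The `HealthTracker` name and its methods are hypothetical.

```typescript
// Tracks consecutive failures per server and filters the pool accordingly.
class HealthTracker {
  private failures = new Map<string, number>();

  constructor(private readonly threshold: number) {}

  // Record the outcome of one request (or one active probe) for a server.
  record(server: string, ok: boolean): void {
    if (ok) {
      this.failures.set(server, 0); // assumption: one success resets the count
    } else {
      this.failures.set(server, (this.failures.get(server) ?? 0) + 1);
    }
  }

  isHealthy(server: string): boolean {
    return (this.failures.get(server) ?? 0) < this.threshold;
  }

  // Servers that are still eligible to receive traffic.
  healthyPool(pool: string[]): string[] {
    return pool.filter((server) => this.isHealthy(server));
  }
}
```

The same tracker works for active checks: feed it the result of each periodic `/health` probe instead of the result of each live request.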

Caching

Store frequently accessed data closer to the consumer to reduce latency and database load.

Cache Levels

Client (Browser Cache) → CDN → API Gateway Cache → Application Cache → Database Cache
       ~0ms              ~5ms      ~10ms              ~1ms              ~0.1ms
                                                    (Redis/Memcached)   (query cache)

Caching Strategies

| Strategy | Read | Write | Consistency | Use Case |
|---|---|---|---|---|
| Cache-Aside | App checks cache → miss → read DB → populate cache | App writes to DB, invalidates cache | Eventually consistent | Most common, general purpose |
| Read-Through | Cache handles DB reads transparently | N/A | Consistent reads | ORM-level caching |
| Write-Through | N/A | Write to cache AND DB synchronously | Strong consistency | When consistency > latency |
| Write-Behind | N/A | Write to cache, async flush to DB | Eventually consistent | High write throughput |
| Write-Around | Cache-aside reads | Write directly to DB, bypass cache | Cache misses on recent writes | Write-heavy, infrequent reads |
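
The difference between write-through and write-behind is easiest to see side by side. A minimal sketch, using in-memory `Map`s as stand-ins for the cache and the database and hypothetical class names; a real write-behind cache would flush on a timer or queue and would need to handle flush failures:

```typescript
type Store = Map<string, string>;

// Write-through: every write hits the database and the cache before returning.
class WriteThroughCache {
  constructor(private readonly cache: Store, private readonly db: Store) {}

  write(key: string, value: string): void {
    this.db.set(key, value);
    this.cache.set(key, value);
  }
}

// Write-behind: writes land in the cache only and are flushed to the DB later.
class WriteBehindCache {
  private dirty = new Set<string>();

  constructor(private readonly cache: Store, private readonly db: Store) {}

  write(key: string, value: string): void {
    this.cache.set(key, value);
    this.dirty.add(key); // remember that the DB is behind for this key
  }

  // Push all pending writes to the DB; returns how many keys were flushed.
  flush(): number {
    let flushed = 0;
    this.dirty.forEach((key) => {
      const value = this.cache.get(key);
      if (value !== undefined) {
        this.db.set(key, value);
        flushed++;
      }
    });
    this.dirty.clear();
    return flushed;
  }
}
```

Until `flush` runs, the database does not see write-behind updates, so a cache crash in that window loses them. That is the price of the higher write throughput.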

Cache-Aside Pattern (Most Common)

```typescript
class UserService {
  // RedisClient, Database, and User are assumed to be defined elsewhere.
  constructor(private cache: RedisClient, private db: Database) {}

  async getUser(id: string): Promise<User> {
    // 1. Check cache
    const cached = await this.cache.get(`user:${id}`);
    if (cached) return JSON.parse(cached);

    // 2. Cache miss → read from DB
    const user = await this.db.query("SELECT * FROM users WHERE id = $1", [id]);

    // 3. Populate cache with a TTL of 3600 seconds (1 hour)
    await this.cache.set(`user:${id}`, JSON.stringify(user), "EX", 3600);
    return user;
  }

  async updateUser(id: string, data: Partial<User>): Promise<void> {
    // Write to DB (the SET clause built from `data` is elided here)
    await this.db.query("UPDATE users SET ... WHERE id = $1", [id]);

    // Invalidate cache (don't update: avoids race conditions)
    await this.cache.del(`user:${id}`);
  }
}
```

Cache Eviction Policies

| Policy | Description | When to Use |
|---|---|---|
| LRU | Evict least recently used | General purpose (most common) |
| LFU | Evict least frequently used | Stable access patterns |
| FIFO | Evict oldest entry | Simple, temporal data |
| TTL | Expire after time duration | Session data, tokens |
| Random | Evict random entry | When no clear pattern |
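
LRU is simple to sketch in TypeScript because `Map` iterates in insertion order: re-inserting a key on every access keeps the least recently used entry at the front. The `LruCache` class below is a hypothetical illustration, not a production cache:

```typescript
class LruCache<K, V> {
  // Map preserves insertion order, so the first key is the least recently used.
  private entries = new Map<K, V>();

  constructor(private readonly capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // Evict the least recently used entry (the first key in the Map).
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }
}
```

Both `get` and `set` run in constant time here. Note that `has` deliberately does not count as a use.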

Cache Problems

| Problem | Description | Solution |
|---|---|---|
| Cache Stampede | Many requests miss cache simultaneously → DB overload | Locking (single flight), pre-warming |
| Cache Penetration | Queries for non-existent keys always miss | Bloom filter, cache null results |
| Cache Avalanche | Many keys expire simultaneously | Staggered TTLs, circuit breaker |
| Stale Data | Cache contains outdated data | TTL, event-driven invalidation |
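
Two of these mitigations fit in a short sketch: single flight for stampedes and staggered TTLs for avalanches. The names `SingleFlight` and `jitteredTtl`, and the default spread of ±10%, are assumptions chosen for illustration. The single-flight shown here deduplicates within one process only; across several app servers a distributed lock would be needed.

```typescript
// Single flight: concurrent callers asking for the same key share one load.
class SingleFlight<T> {
  private inFlight = new Map<string, Promise<T>>();

  run(key: string, loader: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) return existing; // join the load already in progress

    const pending = loader().then(
      (value) => {
        this.inFlight.delete(key);
        return value;
      },
      (error) => {
        this.inFlight.delete(key); // let the next caller retry
        throw error;
      }
    );
    this.inFlight.set(key, pending);
    return pending;
  }
}

// Staggered TTL: spread expiries around the base value so that keys written
// together do not all expire together. `random` is injectable for testing.
function jitteredTtl(
  baseSeconds: number,
  spread = 0.1,
  random: () => number = Math.random
): number {
  const offset = (random() * 2 - 1) * spread; // in [-spread, +spread)
  return Math.max(1, Math.round(baseSeconds * (1 + offset)));
}
```

In a cache-aside read path, the loader passed to `run` would be the database query plus the cache write, and the TTL passed to the cache would come from `jitteredTtl`.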

CDN (Content Delivery Network)

Caches static content at edge locations geographically close to users.

How It Works

User in Tokyo → CDN Edge (Tokyo) → [Cache HIT: serve directly, ~5ms]
                                  → [Cache MISS: fetch from origin in US, cache, ~200ms]
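
The hit/miss flow above can be modelled as a small TTL cache sitting in front of the origin. This is a simplified sketch: the `EdgeCache` class is hypothetical, the origin fetch is synchronous for brevity (a real edge node fetches over the network), and the clock is injectable so expiry can be tested.

```typescript
interface EdgeEntry {
  body: string;
  expiresAt: number; // epoch milliseconds
}

class EdgeCache {
  private entries = new Map<string, EdgeEntry>();

  constructor(
    private readonly fetchOrigin: (path: string) => string,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now
  ) {}

  get(path: string): { body: string; hit: boolean } {
    const entry = this.entries.get(path);
    if (entry && entry.expiresAt > this.now()) {
      return { body: entry.body, hit: true }; // served from the edge
    }
    // Miss or expired: go back to the origin and cache the result.
    const body = this.fetchOrigin(path);
    this.entries.set(path, { body, expiresAt: this.now() + this.ttlMs });
    return { body, hit: false };
  }
}
```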

What to Cache on CDN

  • Static assets (JS, CSS, images, fonts)
  • API responses with Cache-Control headers
  • Pre-rendered HTML pages
  • Video/audio content

CDN Cache Headers

```http
Cache-Control: public, max-age=31536000, immutable
# ↑ Cache for 1 year, never revalidate (use for fingerprinted assets)

Cache-Control: public, max-age=0, must-revalidate
# ↑ Always revalidate with origin (use for HTML pages)

ETag: "abc123"
# ↑ Fingerprint for conditional requests (304 Not Modified)
```
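
On the origin side, revalidation comes down to comparing the client's `If-None-Match` header with the current `ETag`. A minimal sketch with a hypothetical `handleConditionalGet` function; it assumes ETags contain no commas and ignores other conditional headers such as `If-Modified-Since`:

```typescript
interface ConditionalResult {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string;
}

// If-None-Match uses weak comparison, so a leading "W/" is ignored.
function stripWeak(tag: string): string {
  const trimmed = tag.trim();
  return trimmed.startsWith("W/") ? trimmed.slice(2) : trimmed;
}

function handleConditionalGet(
  ifNoneMatch: string | undefined,
  currentEtag: string,
  body: string
): ConditionalResult {
  const headers = {
    ETag: currentEtag,
    "Cache-Control": "public, max-age=0, must-revalidate",
  };
  if (ifNoneMatch !== undefined) {
    const current = stripWeak(currentEtag);
    const candidates = ifNoneMatch.split(",").map(stripWeak);
    const matches =
      ifNoneMatch.trim() === "*" || candidates.some((tag) => tag === current);
    if (matches) {
      // The client's copy is still valid: send headers only, no body.
      return { status: 304, headers, body: "" };
    }
  }
  return { status: 200, headers, body };
}
```

A 304 response saves the transfer of the body but still costs a round trip to the origin, which is why fingerprinted assets use the long `max-age` with `immutable` instead.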