Dilip Singh logo
All posts
InfrastructureAdvanced2026-05-28·12 min read

Scaling WebSockets to 100K Concurrent Connections with Redis Streams

A complete guide to horizontally scaling WebSocket servers using Redis Streams as the pub/sub backbone. Connection draining, sticky sessions, message ordering, and observability.

The WebSocket Scaling Problem

A single Node.js process handles around 10K concurrent WebSockets comfortably. Beyond that you need to scale horizontally — but WebSockets are stateful. You can't just put a load balancer in front and call it a day.

The pattern that scales: stateless connection nodes + Redis Streams for pub/sub.

Architecture

code
   Client
     │
[ NLB sticky on cookie ]
     │
┌────┴────┐
│ WS-1    │  ─ subscribes ─→  ┌─────────────┐
│ WS-2    │  ─ subscribes ─→  │   Redis     │
│ WS-3    │  ─ subscribes ─→  │   Streams   │
│ WS-N    │  ─ subscribes ─→  └─────────────┘
└─────────┘

Publishing to a Stream

typescript
import { createClient } from 'redis'
const pub = createClient({ url: REDIS_URL })

export async function broadcastToRoom(roomId: string, message: object) { await pub.xAdd( room:${roomId}, '*', { type: 'message', payload: JSON.stringify(message), timestamp: Date.now().toString(), }, { TRIM: { strategy: 'MAXLEN', strategyModifier: '~', threshold: 1000 } } ) } ```

Each Node Consumes the Stream

typescript
const sub = createClient({ url: REDIS_URL })
await sub.connect()

const consumerGroup = 'ws-server' const consumerId = ws-${process.pid}-${randomUUID()}

async function consumeRoom(roomId: string) { while (true) { const messages = await sub.xReadGroup( consumerGroup, consumerId, [{ key: room:${roomId}, id: '>' }], { BLOCK: 5000, COUNT: 100 } )

if (!messages) continue for (const stream of messages) { for (const msg of stream.messages) { broadcastToLocalSockets(roomId, JSON.parse(msg.message.payload)) await sub.xAck(room:${roomId}, consumerGroup, msg.id) } } } } ```

Connection Draining for Zero-Downtime Deploys

typescript
let isDraining = false
process.on('SIGTERM', () => {
  isDraining = true
  // 1. Stop accepting new connections
  server.close()
  // 2. Tell clients to reconnect to another node
  for (const ws of allClients) {
    ws.send(JSON.stringify({ type: 'drain', retryAfterMs: 0 }))
  }
  // 3. Wait for all to disconnect
  setTimeout(() => process.exit(0), 30000)
})

Backpressure

A single slow client can stall the broadcast loop. Track buffered amount:

typescript
for (const ws of room.clients) {
  if (ws.bufferedAmount > 1_000_000) {
    // Client can't keep up — drop them
    ws.close(1013, 'too slow')
    continue
  }
  ws.send(message)
}

Observability Essentials

Track these per-node metrics in Prometheus:

MetricWhy
`ws_active_connections`Capacity planning
`ws_messages_sent_total`Throughput
`ws_send_buffer_bytes`Slow client detection
`redis_stream_lag_seconds`Pub/sub backpressure
`ws_connection_duration_seconds`Connection churn

At Hureka we hit 80K concurrent WebSockets per node with this architecture before needing larger instances.

DS
Dilip Singh
Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.