Scaling WebSockets to 100K Concurrent Connections with Redis Streams
A complete guide to horizontally scaling WebSocket servers using Redis Streams as the pub/sub backbone. Covers connection draining, sticky sessions, message ordering, and observability.
The WebSocket Scaling Problem
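A single Node.js process handles around 10K concurrent WebSockets comfortably. Beyond that you need to scale horizontally, but WebSockets are stateful: each connection lives on one specific server, so a message published on one node still has to reach clients connected to every other node. You can't just put a load balancer in front and call it a day.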
The pattern that scales: stateless connection nodes + Redis Streams for pub/sub.
Architecture
```
           Client
              │
  [ NLB sticky on cookie ]
              │
         ┌────┴────┐
         │  WS-1   │ ─ subscribes ─→ ┌─────────────┐
         │  WS-2   │ ─ subscribes ─→ │    Redis    │
         │  WS-3   │ ─ subscribes ─→ │   Streams   │
         │  WS-N   │ ─ subscribes ─→ └─────────────┘
         └─────────┘
```
Publishing to a Stream
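```typescript
import { createClient } from 'redis'

const pub = createClient({ url: REDIS_URL })
await pub.connect() // the client must be connected before issuing commands

export async function broadcastToRoom(roomId: string, message: object) {
  await pub.xAdd(
    `room:${roomId}`,
    '*', // let Redis assign the entry ID
    {
      // Stream fields are flat strings, so the message body is JSON-encoded
      type: 'message',
      payload: JSON.stringify(message),
      timestamp: Date.now().toString(),
    },
    // Cap each room's stream at roughly 1000 entries
    { TRIM: { strategy: 'MAXLEN', strategyModifier: '~', threshold: 1000 } }
  )
}
```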
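Passing `'*'` makes Redis assign each entry an ID of the form `<milliseconds>-<sequence>`, and IDs within one stream only ever increase. That gives a per-room ordering a receiver can check. The sketch below is one way to use it, not part of the setup above: `parseStreamId`, `compareStreamIds`, and `RoomCursor` are hypothetical helper names, and the sequence part is parsed as a plain number, which is an assumption that holds for realistic sequence values.

```typescript
// Split "1700000000000-3" into [milliseconds, sequence].
function parseStreamId(id: string): [number, number] {
  const match = /^(\d+)-(\d+)$/.exec(id)
  if (match === null) throw new Error(`not a stream entry ID: ${id}`)
  return [Number(match[1]), Number(match[2])]
}

// Negative if a is older than b, positive if newer, 0 if identical.
function compareStreamIds(a: string, b: string): number {
  const [aMs, aSeq] = parseStreamId(a)
  const [bMs, bSeq] = parseStreamId(b)
  return aMs !== bMs ? aMs - bMs : aSeq - bSeq
}

// Remembers the newest ID delivered per room so that duplicates and
// replays of older entries are skipped.
class RoomCursor {
  private last = new Map<string, string>()

  shouldDeliver(roomId: string, id: string): boolean {
    const prev = this.last.get(roomId)
    if (prev !== undefined && compareStreamIds(id, prev) <= 0) return false
    this.last.set(roomId, id)
    return true
  }
}
```

Ordering is only guaranteed within a single stream, so the cursor is keyed by room. Entries from two different rooms have no defined order relative to each other.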
Each Node Consumes the Stream
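```typescript
import { randomUUID } from 'node:crypto'
import { createClient } from 'redis'

const sub = createClient({ url: REDIS_URL })
await sub.connect()

// Every node must see every message. Consumers that share a group split the
// stream between them, so each node reads through its own group.
// Destroy the group on shutdown (XGROUP DESTROY) so restarts don't leak groups.
const consumerId = `ws-${process.pid}-${randomUUID()}`
const consumerGroup = `ws-server:${consumerId}`

async function consumeRoom(roomId: string) {
  const key = `room:${roomId}`
  try {
    // Create the group (and the stream if missing), starting at new entries only
    await sub.xGroupCreate(key, consumerGroup, '$', { MKSTREAM: true })
  } catch (err) {
    // BUSYGROUP means the group already exists, which is fine
    if (!(err instanceof Error && err.message.includes('BUSYGROUP'))) throw err
  }

  while (true) {
    const messages = await sub.xReadGroup(
      consumerGroup,
      consumerId,
      [{ key, id: '>' }],
      { BLOCK: 5000, COUNT: 100 }
    )
    if (!messages) continue // BLOCK timed out with nothing new

    for (const stream of messages) {
      for (const msg of stream.messages) {
        broadcastToLocalSockets(roomId, JSON.parse(msg.message.payload))
        await sub.xAck(key, consumerGroup, msg.id)
      }
    }
  }
}
```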
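The consumer hands each message to `broadcastToLocalSockets`, which only needs to know which sockets on this node joined which room. A minimal sketch of that registry follows. The names `LocalSocket`, `localRooms`, `joinRoom`, and `leaveRoom` are assumptions for illustration, and `LocalSocket` is a structural stand-in for whatever socket type your WebSocket library provides.

```typescript
// The only capability the registry needs from a socket.
interface LocalSocket {
  send(data: string): void
}

// roomId -> sockets connected to THIS node that joined the room
const localRooms = new Map<string, Set<LocalSocket>>()

function joinRoom(roomId: string, ws: LocalSocket): void {
  let members = localRooms.get(roomId)
  if (members === undefined) {
    members = new Set<LocalSocket>()
    localRooms.set(roomId, members)
  }
  members.add(ws)
}

function leaveRoom(roomId: string, ws: LocalSocket): void {
  const members = localRooms.get(roomId)
  if (members === undefined) return
  members.delete(ws)
  // An empty room has no local audience, so drop the entry entirely
  if (members.size === 0) localRooms.delete(roomId)
}

// Sends one message to every local member of the room.
// Returns how many sockets it was sent to.
function broadcastToLocalSockets(roomId: string, message: object): number {
  const members = localRooms.get(roomId)
  if (members === undefined) return 0
  const frame = JSON.stringify(message) // serialize once, not once per client
  members.forEach((ws) => ws.send(frame))
  return members.size
}
```

Because the registry is plain in-process state, losing a node loses nothing that matters: clients reconnect elsewhere and rejoin their rooms, which is what keeps the connection nodes effectively stateless.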
Connection Draining for Zero-Downtime Deploys
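```typescript
// Check this flag in health checks and upgrade handlers to refuse new work
let isDraining = false

process.on('SIGTERM', () => {
  isDraining = true

  // 1. Stop accepting new connections
  server.close()

  // 2. Tell clients to reconnect to another node. The delay is spread over an
  //    arbitrary 0-5 s window so they don't all reconnect at the same instant.
  for (const ws of allClients) {
    const retryAfterMs = Math.floor(Math.random() * 5000)
    ws.send(JSON.stringify({ type: 'drain', retryAfterMs }))
  }

  // 3. Give clients up to 30 s to disconnect, then exit regardless
  setTimeout(() => process.exit(0), 30_000)
})
```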
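Draining only works if clients act on the `drain` message. A rough sketch of the client side, assuming the client keeps its own retry counter: it honors the server's `retryAfterMs` hint and adds jittered exponential backoff on top. `BASE_DELAY_MS`, `MAX_DELAY_MS`, `isDrainMessage`, and `reconnectDelayMs` are hypothetical names and values, not part of the server code above.

```typescript
// Shape of the drain notice sent by the server.
interface DrainMessage {
  type: 'drain'
  retryAfterMs: number
}

// Hypothetical tuning values; pick ones that suit your traffic.
const BASE_DELAY_MS = 250
const MAX_DELAY_MS = 10_000

// Type guard for parsed incoming frames.
function isDrainMessage(value: unknown): value is DrainMessage {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return v['type'] === 'drain' && typeof v['retryAfterMs'] === 'number'
}

// Delay before reconnect attempt `attempt` (0-based): the server's hint plus
// a random share of an exponentially growing, capped window.
// `random` is injectable so the function can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  retryAfterMs: number = 0,
  random: () => number = Math.random
): number {
  const span = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt)
  return retryAfterMs + Math.floor(random() * span)
}
```

On receiving a frame, the client would parse it, and if `isDrainMessage` matches, close the socket and schedule a reconnect after `reconnectDelayMs(0, msg.retryAfterMs)`, incrementing `attempt` for each failed retry.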
Backpressure
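A single slow client doesn't block `ws.send()`, which returns immediately. Instead, the unsent data piles up in the server's memory. Check the buffered amount before each send and disconnect clients that fall too far behind: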
```typescript
for (const ws of room.clients) {
  if (ws.bufferedAmount > 1_000_000) {
    // Client can't keep up, so drop it (1013 = "Try Again Later")
    ws.close(1013, 'too slow')
    continue
  }
  ws.send(message)
}
```
Observability Essentials
Track these per-node metrics in Prometheus:
| Metric | Why |
|---|---|
| `ws_active_connections` | Capacity planning |
| `ws_messages_sent_total` | Throughput |
| `ws_send_buffer_bytes` | Slow client detection |
| `redis_stream_lag_seconds` | Pub/sub backpressure |
| `ws_connection_duration_seconds` | Connection churn |
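
Of these, `redis_stream_lag_seconds` is the least obvious to compute. One possible approach, sketched below, uses the fact that an auto-generated entry ID starts with the millisecond timestamp at which Redis stored the entry. It assumes the Redis host and the WebSocket node have reasonably synchronized clocks. `streamLagSeconds` and `LagGauge` are hypothetical names, and wiring the value into a Prometheus client is left out.

```typescript
// Seconds between Redis storing the entry and `nowMs`.
// Clamped at 0 in case of small clock skew between hosts.
function streamLagSeconds(entryId: string, nowMs: number = Date.now()): number {
  const match = /^(\d+)-\d+$/.exec(entryId)
  if (match === null) throw new Error(`not a stream entry ID: ${entryId}`)
  const storedAtMs = Number(match[1])
  return Math.max(0, (nowMs - storedAtMs) / 1000)
}

// Keeps the worst lag seen between scrapes, so a brief spike is not hidden
// by whichever entry happened to be processed last.
class LagGauge {
  private worst = 0

  observe(entryId: string, nowMs: number = Date.now()): void {
    this.worst = Math.max(this.worst, streamLagSeconds(entryId, nowMs))
  }

  // Returns the worst lag since the previous call, then resets.
  collect(): number {
    const value = this.worst
    this.worst = 0
    return value
  }
}
```

Calling `observe(msg.id)` for each consumed entry and `collect()` on every scrape gives a per-node gauge that rises when a node falls behind the stream.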
At Hureka we hit 80K concurrent WebSockets per node with this architecture before needing larger instances.