Full example repository: github.com/LuizFernando991/websocket-scaling
Why Node.js for real-time connections
When you maintain thousands of persistent connections, the server model matters as much as the language.
In blocking thread-per-connection architectures, the server assigns one OS thread per active connection. Each thread carries roughly 1–2 MB of overhead just to exist — stack, kernel structures, scheduler bookkeeping. With 100k connections:
100,000 connections × 1 MB per thread = ~100 GB of RAM
This model scales poorly not because of the language, but because of the blocking I/O model. Modern Java moved well beyond it: Netty, Vert.x, and Spring WebFlux use event loops, epoll/kqueue, and non-blocking sockets (the same fundamentals as Node.js), and Project Loom's virtual threads remove the per-connection OS-thread cost by a different route. A well-configured Java server with Netty can handle 100k connections with comparable efficiency. The problem was never Java; it was blocking I/O.
Node.js runs a single-threaded event loop with non-blocking I/O by default and by design. There is no alternative model to configure — non-blocking behavior is the only behavior. This makes it a natural choice for WebSocket servers where most connections sit idle most of the time, with no risk of accidentally reaching for a blocking API.
An idle connection is not just a JavaScript object. Its full cost includes:
- the WebSocket object and your application state (a few KB on the JS heap)
- TCP state in the kernel: socket buffers, send/receive queues, timers
- TLS state if the connection is encrypted: cipher context, session keys
- write queues that accumulate if the client is a slow consumer
In practice, 100k idle WebSocket connections in Node.js land somewhere in 2–4 GB of RAM — but this varies significantly depending on TLS, compression, heartbeat frequency, payload size, whether you use ws or Socket.IO, Linux kernel tuning, and GC pressure. Connections with compression enabled or a Redis adapter consume more. A leaner runtime like uWebSockets.js can consume considerably less. Treat the range as an approximate baseline, not a guarantee.
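If you want to sanity-check the footprint on your own hardware, a rough approach is to log process memory while a load-test client opens connections. A minimal, self-contained sketch using the ws package (covered later in this article); rss covers the whole process, heapUsed isolates the JS-side objects:
import { WebSocketServer } from "ws";
const wss = new WebSocketServer({ port: 3000 });
// print a snapshot every 5s while connections accumulate
setInterval(() => {
  const { rss, heapUsed, external } = process.memoryUsage();
  const mb = (n: number) => (n / 1024 / 1024).toFixed(1);
  console.log(
    `connections=${wss.clients.size} rss=${mb(rss)} MB heapUsed=${mb(heapUsed)} MB external=${mb(external)} MB`,
  );
}, 5_000).unref();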
What WebSocket actually is
HTTP is stateless and unidirectional: the client sends a request, the server responds, and the exchange ends; the server cannot push data on its own initiative. Real-time features like chat, live dashboards, and collaborative editing need the opposite: a persistent, bidirectional channel.
WebSocket solves this. It starts as a regular HTTP request with an Upgrade header. If the server accepts, the connection is promoted to a full-duplex TCP channel that stays open until one side closes it:
Client → Server: GET /chat HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Server → Client: HTTP/1.1 101 Switching Protocols
Upgrade: websocket
After the handshake, both sides can send frames at any time without request-response overhead.
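To make the full-duplex part concrete, here is a minimal browser-side sketch using the standard WebSocket API (the URL and path are illustrative):
const socket = new WebSocket("ws://localhost:3000/chat");
socket.addEventListener("open", () => {
  // handshake complete: the client can send a frame whenever it wants...
  socket.send("hello from the client");
});
socket.addEventListener("message", (event) => {
  // ...and the server can push frames without being asked
  console.log("server pushed:", event.data);
});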
ws vs Socket.IO
Two libraries dominate the WebSocket ecosystem in Node.js. They are not equivalent.
ws
- Implements the raw WebSocket protocol with minimal abstraction
- Lower per-message overhead — no custom framing on top of the protocol
- No fallback to long-polling
- Reconnection, rooms, and advanced event semantics are your responsibility
Socket.IO
- Adds a custom event protocol on top of WebSocket (via Engine.IO underneath)
- Automatic client reconnection with exponential backoff out of the box
- Native rooms and namespaces for grouping connections
- Connection middleware (io.use()) for authentication before the handshake completes
- Higher per-message framing overhead
In 2025+, if you don't need fallback for legacy browsers, ws is usually the natural choice for throughput and per-connection cost. Socket.IO's automatic reconnection and rooms are valuable, but they come with a protocol cost that accumulates under load.
Building the raw ws server
Step 1 — Project setup
Create the project and install dependencies:
mkdir ws-server && cd ws-server
npm init -y
npm install ws ioredis dotenv
npm install -D typescript tsx @types/node @types/ws
Create tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "NodeNext",
"moduleResolution": "NodeNext",
"strict": true,
"skipLibCheck": true,
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"outDir": "dist",
"rootDir": "src"
},
"include": ["src/**/*.ts"]
}
Add to package.json:
{
"type": "module",
"scripts": {
"dev": "tsx watch src/server.ts",
"build": "tsc",
"start": "node dist/server.js"
}
}
Create the .env file:
PORT=3000
REDIS_URL=redis://localhost:6379
INSTANCE_ID=ws-1
Step 2 — Config from environment
Create src/config.ts. Centralizing environment variable reads makes it easy to run different instances without touching the code:
import "dotenv/config";
export const config = {
port: Number(process.env.PORT ?? 3000),
redisUrl: process.env.REDIS_URL ?? "redis://localhost:6379",
instanceId: process.env.INSTANCE_ID ?? "ws-1",
pubsubChannel: "ws:messages",
heartbeatMs: 30_000, // send ping every 30s
pongTimeoutMs: 10_000, // terminate if no pong arrives within 10s
rateWindowMs: 1_000, // rate limit window: 1 second
maxMessagesPerWindow: 50, // max 50 messages per second per connection
maxBufferedBytes: 1_000_000, // ~1 MB outbound buffer limit
};
Step 3 — Rate limiter
Why do you need a rate limiter?
A WebSocket connection is persistent. Unlike HTTP, where a misbehaving client pays the cost of a new TCP handshake on every request, a WebSocket client connects once and can send messages in a tight loop at maximum speed — thousands per second — with no friction.
Without a limit, a single buggy client (infinite loop on the front end, reconnection storm, misconfigured script) can monopolize the entire server's message processing pipeline. Every CPU cycle spent parsing and routing that client's messages is a cycle stolen from all other connections. In the worst case, the event loop accumulates work and latency rises for everyone on the server.
A per-connection rate limiter restores fairness: each client gets the same budget, and no single client can monopolize the message-processing pipeline. The limit is applied at the connection level, which is exactly where the problem occurs.
In production you'd likely use a dedicated library like rate-limiter-flexible, which supports Redis-backed limiters — they survive restarts and work across instances. In this article we build a simple fixed-window in-memory limiter — enough to understand the concept and keep the example self-contained. A distributed limiter would only be needed if you wanted to limit a single userId by summing all of its simultaneously open connections — a different problem, not covered here.
Create src/domain/rate-limiter.ts. Each connection will have its own independent instance, so one client does not affect others:
export class FixedWindowRateLimiter {
private count = 0;
private windowStart = Date.now();
constructor(
private readonly windowMs: number,
private readonly maxEvents: number,
) {}
allow(): boolean {
const now = Date.now();
if (now - this.windowStart >= this.windowMs) {
this.count = 0;
this.windowStart = now;
}
this.count++;
return this.count <= this.maxEvents;
}
}
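A quick standalone check of the behavior, with an illustrative budget of 3 events per 1-second window:
const limiter = new FixedWindowRateLimiter(1_000, 3);
limiter.allow(); // true
limiter.allow(); // true
limiter.allow(); // true
limiter.allow(); // false: the budget for this window is spent
setTimeout(() => {
  limiter.allow(); // true: a new window has started
}, 1_100);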
Step 4 — Connection registry
In a more complex setup you could store a userId → instanceId mapping in Redis so that any instance would know exactly where to route without broadcasting to all. Here we keep it simple: each instance maintains an in-memory map of its own sockets and relies on Redis pub/sub for broadcast, letting the correct instance identify itself and deliver.
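For reference, a hypothetical sketch of that routed alternative (the key and channel names are made up; nothing below is used in the rest of the article):
import { Redis } from "ioredis";
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const INSTANCE_ID = process.env.INSTANCE_ID ?? "ws-1";
// on connection: record that this instance owns one of the user's sockets
// (a set, because a user can be connected to several instances at once)
async function registerOwnership(userId: string): Promise<void> {
  await redis.sadd(`ws:owners:${userId}`, INSTANCE_ID);
}
// on disconnect of the user's last local socket: release the claim
async function releaseOwnership(userId: string): Promise<void> {
  await redis.srem(`ws:owners:${userId}`, INSTANCE_ID);
}
// on send: look up the owners and publish only to their per-instance channels
async function instancesFor(userId: string): Promise<string[]> {
  return redis.smembers(`ws:owners:${userId}`);
}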
Create src/infra/connection-registry.ts. The registry maps each userId to its active sockets. It is necessary because Redis pub/sub delivers messages to all instances — each instance checks its own registry to determine if it owns the target socket.
export class ConnectionRegistry<T> {
private readonly users = new Map<string, Set<T>>();
add(userId: string, socket: T): void {
let sockets = this.users.get(userId);
if (!sockets) {
sockets = new Set();
this.users.set(userId, sockets);
}
sockets.add(socket);
}
getAll(userId: string): Set<T> | undefined {
return this.users.get(userId);
}
remove(userId: string, socket: T): void {
const sockets = this.users.get(userId);
if (!sockets) return;
sockets.delete(socket);
if (sockets.size === 0) {
this.users.delete(userId);
}
}
}
Each userId maps to a Set of sockets instead of a single socket. A user can have multiple active connections simultaneously — different devices, browser tabs, or reconnections that haven't fully closed yet. When the last socket for a user disconnects, the Set is removed from the map.
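A quick illustration of that behavior, with plain strings standing in for sockets:
const registry = new ConnectionRegistry<string>();
registry.add("alice", "phone-socket");
registry.add("alice", "laptop-socket");
registry.getAll("alice"); // Set { 'phone-socket', 'laptop-socket' }
registry.remove("alice", "phone-socket");
registry.getAll("alice"); // Set { 'laptop-socket' }
registry.remove("alice", "laptop-socket");
registry.getAll("alice"); // undefined: last socket removed, entry deleted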
Step 5 — Types
Create src/domain/types.ts:
// message sent from client to server
export type DirectMessageInput = {
type: "message:direct";
toUserId: string;
text: string;
};
// message published to Redis between instances
export type PubSubDirectMessage = {
fromUserId: string;
toUserId: string;
text: string;
ts: number;
};
// message delivered to the target client
export type OutboundDirectMessage = {
type: "message:direct";
fromUserId: string;
text: string;
deliveredBy: string;
ts: number;
};
export type ErrorPayload = {
type: "error:rate_limit" | "error:bad_request";
message: string;
};
Step 6 — The server
Create src/server.ts. Read it section by section:
import "dotenv/config";
import { createServer } from "node:http";
import { Redis } from "ioredis";
import { WebSocketServer, type RawData, type WebSocket } from "ws";
import { config } from "./config.js";
import { FixedWindowRateLimiter } from "./domain/rate-limiter.js";
import type {
DirectMessageInput,
ErrorPayload,
OutboundDirectMessage,
PubSubDirectMessage,
} from "./domain/types.js";
import { ConnectionRegistry } from "./infra/connection-registry.js";
// Extend the WebSocket type with application fields
type AppSocket = WebSocket & {
userId?: string;
awaitingPong: boolean;
bucket: number;
rateLimiter: FixedWindowRateLimiter;
};
Creating the HTTP and WebSocket server:
const server = createServer();
const wss = new WebSocketServer({
server,
perMessageDeflate: {
threshold: 1024, // compress only messages larger than 1 KB
zlibDeflateOptions: { level: 6 }, // compression level (1=fastest, 9=best ratio)
},
});
Redis connections and registry:
Why Redis?
Every Node.js process is an island. Its memory — including the map of connected sockets — is completely private. When you run two instances behind a load balancer, instance 1 has no information about who is connected to instance 2.
Without a shared channel, if Alice is on instance 1 and sends a message to Bob on instance 2, instance 1 simply cannot reach Bob's socket. The message is silently dropped.
You could work around this with sticky sessions — always routing Alice and Bob to the same instance. But sticky sessions create hotspots (one instance ends up with all the heavy users), complicate deployments, and break during scaling events. It's a patch, not a solution.
Redis Pub/Sub is the standard solution. Each instance subscribes to a shared channel. When any instance receives a message, it publishes to Redis. Redis broadcasts to all subscribers. Each instance checks if the target user is connected locally — the one that finds the socket delivers the message. The others ignore it.
Why Redis specifically? It's in-memory (microsecond latency), has native Pub/Sub semantics, and is already present in the vast majority of production stacks. If you run only a single instance and have no plans to scale horizontally, you don't need Redis.
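Here is the pattern in isolation before it is wired into the server (channel name illustrative; top-level await assumes an ES module, which this project is):
import { Redis } from "ioredis";
const pub = new Redis("redis://localhost:6379");
const sub = new Redis("redis://localhost:6379"); // dedicated connection for subscribe mode
await sub.subscribe("demo:channel");
sub.on("message", (channel, raw) => {
  // every subscriber receives this, on every instance
  console.log(`received on ${channel}:`, raw);
});
await pub.publish("demo:channel", JSON.stringify({ hello: "world" }));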
Two separate Redis connections are required. A connection in subscribe mode cannot issue other commands, so publishing and subscribing use separate connections:
const pub = new Redis(config.redisUrl);
const sub = new Redis(config.redisUrl);
const registry = new ConnectionRegistry<AppSocket>();
// timing wheel: 30 buckets, one processed per second
const BUCKETS = Math.max(1, Math.ceil(config.heartbeatMs / 1_000));
const buckets: Set<AppSocket>[] = Array.from({ length: BUCKETS }, () => new Set());
let tick = 0;
Safe send with outbound backpressure protection:
Every outbound message goes through this function. It checks bufferedAmount — the bytes queued for this socket that haven't been sent yet. If the client is too slow to drain the buffer, it terminates the connection before the accumulation consumes too much server RAM:
function sendJson<T extends object>(socket: WebSocket, payload: T): void {
if (socket.bufferedAmount > config.maxBufferedBytes) {
socket.terminate();
return;
}
if (socket.readyState === socket.OPEN) {
socket.send(JSON.stringify(payload));
}
}
function sendError(socket: WebSocket, payload: ErrorPayload): void {
sendJson(socket, payload);
}
Parsing and validating incoming messages:
function parseClientMessage(raw: RawData): DirectMessageInput | null {
try {
const data = JSON.parse(raw.toString()) as Partial<DirectMessageInput>;
if (data.type === "message:direct" && !!data.toUserId && !!data.text) {
return data as DirectMessageInput;
}
return null;
} catch {
return null;
}
}
Cleanly terminating a stale connection:
function terminateStale(socket: AppSocket): void {
buckets[socket.bucket].delete(socket);
if (socket.userId) registry.remove(socket.userId, socket);
socket.terminate();
}
Redis subscriber — receiving cross-instance messages:
Fires on all instances for each published message. Each instance checks its own registry. Only the one that owns the target socket delivers the message:
sub.subscribe(config.pubsubChannel, (err) => {
if (err) console.error("Failed to subscribe:", err.message);
});
sub.on("message", (_channel, raw) => {
let event: PubSubDirectMessage;
try {
event = JSON.parse(raw) as PubSubDirectMessage;
} catch {
return;
}
const targets = registry.getAll(event.toUserId);
if (!targets) return;
const outbound: OutboundDirectMessage = {
type: "message:direct",
fromUserId: event.fromUserId,
text: event.text,
deliveredBy: config.instanceId,
ts: event.ts,
};
for (const target of targets) {
sendJson(target, outbound); // deliver to all active connections of the user
}
});
Connection handler:
wss.on("connection", (socket: AppSocket, req) => {
// 1. validate userId from the URL query string
const url = new URL(req.url ?? "/", "http://localhost");
const userId = url.searchParams.get("userId");
if (!userId) {
sendError(socket, { type: "error:bad_request", message: "Pass ?userId= in the handshake" });
socket.close(1008, "userId is required");
return;
}
// 2. initialize per-connection state
socket.userId = userId;
socket.awaitingPong = false;
socket.bucket = Math.floor(Math.random() * BUCKETS);
socket.rateLimiter = new FixedWindowRateLimiter(
config.rateWindowMs,
config.maxMessagesPerWindow,
);
registry.add(userId, socket);
buckets[socket.bucket].add(socket);
// 3. send connection confirmation
sendJson(socket, { type: "connected", instanceId: config.instanceId, userId });
// 4. pong handler — resets heartbeat state when the client responds
socket.on("pong", () => {
socket.awaitingPong = false;
});
// 5. message handler — rate limit then publish to Redis
socket.on("message", async (raw) => {
if (!socket.rateLimiter.allow()) {
sendError(socket, {
type: "error:rate_limit",
message: `Rate limit exceeded: max ${config.maxMessagesPerWindow} msgs/s per connection`,
});
return;
}
const msg = parseClientMessage(raw);
if (!msg) {
sendError(socket, { type: "error:bad_request", message: 'Send { type: "message:direct", toUserId, text }' });
return;
}
// publish to Redis — whichever instance owns the target socket will deliver
await pub.publish(
config.pubsubChannel,
JSON.stringify({
fromUserId: userId,
toUserId: msg.toUserId,
text: msg.text,
ts: Date.now(),
} satisfies PubSubDirectMessage),
).catch((err) => console.error("Failed to publish message:", err)); // avoid an unhandled rejection if Redis is unavailable
});
// 6. cleanup on disconnect
socket.on("close", () => {
buckets[socket.bucket].delete(socket);
registry.remove(userId, socket);
});
socket.on("error", () => {
buckets[socket.bucket].delete(socket);
registry.remove(userId, socket);
});
});
Heartbeat — timing wheel:
A naive approach would ping all connected sockets at once every 30s. With 100k connections, that means 100k simultaneous pings every 30 seconds — a thundering herd that causes CPU and network spikes at predictable intervals.
The timing wheel solves this by splitting connections into 30 buckets. One bucket is processed per second, so pings are distributed evenly across the 30s window. The server does O(n/30) of work per tick instead of O(n) every 30s, and there is still only a single active timer:
const interval = setInterval(() => {
tick = (tick + 1) % BUCKETS;
const bucket = buckets[tick];
for (const socket of bucket) {
if (socket.awaitingPong || socket.readyState !== socket.OPEN) {
terminateStale(socket);
continue;
}
socket.awaitingPong = true;
socket.ping();
}
}, 1_000); // tick every 1s — each bucket is visited once every 30s
wss.on("close", () => clearInterval(interval));
Each connection is assigned to a random bucket (Math.floor(Math.random() * BUCKETS)), distributing pings uniformly regardless of arrival time. Using tick instead would cluster connections that arrive simultaneously into the same bucket — random assignment avoids this.
BUCKETS is derived from config as Math.max(1, Math.ceil(config.heartbeatMs / 1_000)). Math.ceil instead of Math.round avoids rounding down to zero for sub-second values, and Math.max(1, ...) guarantees at least one bucket even if heartbeatMs is misconfigured.
Starting the server:
server.listen(config.port, () => {
console.log(`[${config.instanceId}] native ws listening on port ${config.port}`);
});
Run the server:
npm run dev
Test with any WebSocket client. Connect to ws://localhost:3000?userId=alice and send:
{ "type": "message:direct", "toUserId": "bob", "text": "hello" }
Building the Socket.IO server
Step 1 — Project setup
mkdir socket-io-server && cd socket-io-server
npm init -y
npm install socket.io @socket.io/redis-adapter ioredis dotenv
npm install -D typescript tsx @types/node
Use the same package.json scripts. The .env has the same structure — just change INSTANCE_ID.
Create tsconfig.json — identical to the ws one, but with "types": ["node"] so that setInterval resolves to NodeJS.Timeout and .unref() works without a cast:
{
"compilerOptions": {
"target": "ES2022",
"module": "NodeNext",
"moduleResolution": "NodeNext",
"strict": true,
"types": ["node"],
"skipLibCheck": true,
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"outDir": "dist",
"rootDir": "src"
},
"include": ["src/**/*.ts"]
}
Step 2 — Config
Create src/config.ts:
import "dotenv/config";
export const config = {
port: Number(process.env.PORT ?? 3000),
redisUrl: process.env.REDIS_URL ?? "redis://localhost:6379",
instanceId: process.env.INSTANCE_ID ?? "sio-1",
heartbeatMs: 30_000,
pongTimeoutMs: 10_000,
rateWindowMs: 1_000,
maxMessagesPerWindow: 50,
maxBufferedBytes: 1_000_000,
};
Step 3 — Redis adapter helper
Create src/infra/redis-adapter.ts. Isolates adapter setup so the main server file stays clean:
import { createAdapter } from "@socket.io/redis-adapter";
import { Redis } from "ioredis";
import type { Server } from "socket.io";
export function attachRedisAdapter(io: Server, redisUrl: string): void {
const pubClient = new Redis(redisUrl);
const subClient = pubClient.duplicate(); // separate connection for subscribe
pubClient.on("error", (err) => console.error("[redis pub]", err));
subClient.on("error", (err) => console.error("[redis sub]", err));
io.adapter(createAdapter(pubClient, subClient));
}
Step 4 — Types and rate limiter
Create the same FixedWindowRateLimiter from the ws example. Socket.IO types are slightly simpler because the library handles message framing:
// src/domain/types.ts
// received from client (no "type" field — Socket.IO uses event names)
export type DirectMessageInput = {
toUserId: string;
text: string;
};
// delivered to recipient
export type ServerDirectMessage = {
fromUserId: string;
text: string;
deliveredBy: string;
ts: number;
};
export type ServerErrorPayload = {
message: string;
};
Step 5 — The server
Create src/server.ts. The complete file, in the order it appears:
Types and imports:
Socket.IO lets you type all events end-to-end. The generics on Server ensure you only emit events the client expects, with the correct payload shapes:
import "dotenv/config";
import { createServer } from "node:http";
import { Server, type Socket } from "socket.io";
import { config } from "./config.js";
import { FixedWindowRateLimiter } from "./domain/rate-limiter.js";
import type { DirectMessageInput, ServerDirectMessage, ServerErrorPayload } from "./domain/types.js";
import { attachRedisAdapter } from "./infra/redis-adapter.js";
type ClientToServerEvents = {
"message:direct": (payload: DirectMessageInput) => void;
};
type ServerToClientEvents = {
connected: (payload: { instanceId: string; socketId: string; userId: string }) => void;
"message:direct": (payload: ServerDirectMessage) => void;
"error:rate_limit": (payload: ServerErrorPayload) => void;
"error:bad_request": (payload: ServerErrorPayload) => void;
};
type SocketData = {
userId: string;
rateLimiter: FixedWindowRateLimiter;
bpBucket: number;
};
type AppSocket = Socket<ClientToServerEvents, ServerToClientEvents, Record<string, never>, SocketData>;
Backpressure helpers:
getBufferedAmount reads bufferedAmount from the underlying ws WebSocket via optional chaining and returns 0 if any level of the chain is missing. disconnectIfSlow wraps the decision: it disconnects and returns true if the buffer has exceeded the limit, and does nothing and returns false otherwise:
function getBufferedAmount(socket: unknown): number {
return (socket as { conn?: { transport?: { socket?: { bufferedAmount?: number } } } })
.conn?.transport?.socket?.bufferedAmount ?? 0;
}
function disconnectIfSlow(socket: AppSocket): boolean {
if (getBufferedAmount(socket) <= config.maxBufferedBytes) return false;
socket.disconnect(true);
return true;
}
Server instantiation:
Socket.IO by default starts connections with long-polling and upgrades to WebSocket later. This exists for compatibility with old proxies, corporate firewalls, and legacy browsers. In modern applications this fallback creates more problems than it solves:
- bufferedAmount does not exist on polling transports — the backpressure monitor silently becomes a no-op for clients still on polling
- Polling has higher overhead: each poll is a full HTTP request-response cycle
- The transport can be inspected via socket.conn.transport.name ("polling" or "websocket")
transports: ["websocket"] forces WebSocket directly, making behavior predictable and backpressure reliable:
const httpServer = createServer();
const io = new Server<ClientToServerEvents, ServerToClientEvents, Record<string, never>, SocketData>(
httpServer,
{
cors: { origin: "*" },
transports: ["websocket"],
pingInterval: config.heartbeatMs,
pingTimeout: config.pongTimeoutMs,
perMessageDeflate: {
threshold: 1024,
zlibDeflateOptions: { level: 6 },
},
},
);
Redis adapter, buckets, middleware, and connection handler:
// one line — from here, io.to(room).emit() works cross-instance
attachRedisAdapter(io, config.redisUrl);
// timing wheel: 30 buckets, one checked per second → O(n/30) per tick
const BP_BUCKETS = 30;
const bpBuckets: Set<AppSocket>[] = Array.from({ length: BP_BUCKETS }, () => new Set());
let bpTick = 0;
// runs before "connection" — next(new Error()) rejects the handshake
io.use((socket, next) => {
const rawUserId = socket.handshake.auth.userId ?? socket.handshake.query.userId;
if (typeof rawUserId !== "string" || rawUserId.trim().length === 0) {
return next(new Error("userId is required in auth.userId or query ?userId="));
}
socket.data.userId = rawUserId;
socket.data.rateLimiter = new FixedWindowRateLimiter(
config.rateWindowMs,
config.maxMessagesPerWindow,
);
next();
});
io.on("connection", (socket) => {
const userId = socket.data.userId;
// join the user's room — io.to("user:alice").emit() reaches Alice
// on any instance, because the adapter replicates to all via Redis
socket.join(`user:${userId}`);
// assign to a random bucket to spread checks across time
const bucket = Math.floor(Math.random() * BP_BUCKETS);
socket.data.bpBucket = bucket;
bpBuckets[bucket].add(socket);
socket.emit("connected", {
instanceId: config.instanceId,
socketId: socket.id,
userId,
});
socket.on("message:direct", (payload) => {
if (!socket.data.rateLimiter.allow()) {
socket.emit("error:rate_limit", {
message: `Rate limit exceeded: max ${config.maxMessagesPerWindow} msgs/s per connection`,
});
return;
}
if (!payload?.toUserId || !payload?.text) {
socket.emit("error:bad_request", { message: "Send { toUserId, text }" });
return;
}
// deliver to all connections of the target user, on any instance
io.to(`user:${payload.toUserId}`).emit("message:direct", {
fromUserId: userId,
text: payload.text,
deliveredBy: config.instanceId,
ts: Date.now(),
});
});
socket.on("disconnect", () => {
bpBuckets[socket.data.bpBucket].delete(socket);
});
});
Backpressure monitor and initialization:
Socket.IO does not expose a single send interception point, so backpressure is checked on a separate interval. We apply the same timing wheel as the ws server: 30 buckets, 1 bucket checked per second, O(n/30) of work per tick instead of scanning every socket in one pass. Maximum detection latency rises to 30s, which is acceptable for backpressure: a slow consumer that has accumulated 1 MB of buffer and hasn't drained it in 30s will not recover. .unref() ensures the timer does not keep the process alive once all connections are closed:
const outboundMonitor = setInterval(() => {
bpTick = (bpTick + 1) % BP_BUCKETS;
for (const socket of bpBuckets[bpTick]) {
disconnectIfSlow(socket);
}
}, 1_000);
outboundMonitor.unref();
io.on("close", () => clearInterval(outboundMonitor));
httpServer.listen(config.port, () => {
console.log(`[${config.instanceId}] socket.io listening on port ${config.port}`);
});
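To exercise it, a client sketch using socket.io-client (installed separately; not part of the server project):
// test-client.ts: run with `tsx test-client.ts bob`, then `tsx test-client.ts alice`
import { io } from "socket.io-client";
const userId = process.argv[2] ?? "alice";
const socket = io("http://localhost:3000", {
  transports: ["websocket"], // match the server: no polling fallback
  auth: { userId }, // read by the io.use() middleware
});
socket.on("connected", (payload) => {
  console.log(`[${userId}] connected:`, payload);
  if (userId === "alice") {
    socket.emit("message:direct", { toUserId: "bob", text: "hello" });
  }
});
socket.on("message:direct", (payload) => {
  console.log(`[${userId}] received:`, payload);
});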
Alternative: check-on-send with a local registry
It is possible to replicate the ws server's behavior inside Socket.IO: maintain a local Map<userId, Set<AppSocket>>, iterate the target user's sockets, and call disconnectIfSlow before each socket.emit(). The problem is that io.to(room).emit() with the Redis adapter also delivers to local sockets — you would get double delivery. To avoid it, you would need to drop the adapter for message delivery and do manual pub/sub with ioredis, exactly as the ws server does. At that point, Socket.IO becomes only the connection layer (handshake, middleware, client reconnection), and you have taken over all routing logic. If check-on-send is a hard requirement — financial messages, latency-critical gaming — this architecture makes sense, and raw ws is probably the more honest choice. For general messaging, the timing wheel is sufficient: a consumer that has accumulated 1 MB of buffer without draining in 30s will not recover — disconnecting at that point is equivalent to disconnecting immediately.
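For completeness, a sketch of the check-on-send half under that architecture, assuming you maintain the local map yourself (hypothetical; not part of the example repository):
// local registry maintained in the connection/disconnect handlers, mirroring the ws server
const localSockets = new Map<string, Set<AppSocket>>();
function deliverLocally(toUserId: string, payload: ServerDirectMessage): void {
  const sockets = localSockets.get(toUserId);
  if (!sockets) return; // user not connected to this instance (or offline)
  for (const socket of sockets) {
    if (disconnectIfSlow(socket)) continue; // backpressure checked at send time
    socket.emit("message:direct", payload);
  }
}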
Running multiple instances with Docker Compose
Both servers are stateless — state lives in Redis. You can run as many instances as you want behind a load balancer. The docker-compose.yml below is a local development example only — it starts Redis plus two instances of each server to demonstrate cross-instance delivery. Do not use this file in production directly; a real environment would require proper orchestration (Kubernetes, ECS, etc.), secure environment variables, and a load balancing strategy.
services:
redis:
image: redis:7-alpine
container_name: ws-redis
ports:
- "6379:6379"
socketio-1:
image: node:20-alpine
container_name: socketio-1
working_dir: /app
command: sh -c "npm install && npm run dev"
volumes:
- <YOUR_PROJECT_PATH>:/app
- /app/node_modules
environment:
- PORT=3001
- REDIS_URL=redis://redis:6379
- INSTANCE_ID=socketio-1
depends_on:
- redis
ports:
- "3001:3001"
socketio-2:
image: node:20-alpine
container_name: socketio-2
working_dir: /app
command: sh -c "npm install && npm run dev"
volumes:
- <YOUR_PROJECT_PATH>:/app
- /app/node_modules
environment:
- PORT=3002
- REDIS_URL=redis://redis:6379
- INSTANCE_ID=socketio-2
depends_on:
- redis
ports:
- "3002:3002"
ws-1:
image: node:20-alpine
container_name: ws-1
working_dir: /app
command: sh -c "npm install && npm run dev"
volumes:
- <YOUR_PROJECT_PATH>:/app
- /app/node_modules
environment:
- PORT=4001
- REDIS_URL=redis://redis:6379
- INSTANCE_ID=ws-1
depends_on:
- redis
ports:
- "4001:4001"
ws-2:
image: node:20-alpine
container_name: ws-2
working_dir: /app
command: sh -c "npm install && npm run dev"
volumes:
- <YOUR_PROJECT_PATH>:/app
- /app/node_modules
environment:
- PORT=4002
- REDIS_URL=redis://redis:6379
- INSTANCE_ID=ws-2
depends_on:
- redis
ports:
- "4002:4002"
Bring everything up:
docker compose up
To verify cross-instance delivery: connect Alice to ws-1 (localhost:4001?userId=alice) and Bob to ws-2 (localhost:4002?userId=bob). Send a message from Alice to Bob. The deliveredBy field in the message Bob receives will show ws-2, confirming the message traveled through Redis from one instance to the other.
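A scripted version of the same check (file name arbitrary; ports match the compose file above):
// verify-cross-instance.ts
import WebSocket from "ws";
const alice = new WebSocket("ws://localhost:4001?userId=alice");
const bob = new WebSocket("ws://localhost:4002?userId=bob");
let open = 0;
const onOpen = () => {
  open += 1;
  if (open === 2) {
    // both sockets are registered on their instances; now send across
    alice.send(JSON.stringify({ type: "message:direct", toUserId: "bob", text: "hello from ws-1" }));
  }
};
alice.on("open", onOpen);
bob.on("open", onOpen);
bob.on("message", (raw) => {
  // after the "connected" confirmation, expect deliveredBy: "ws-2"
  console.log("bob received:", raw.toString());
});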
Backpressure and rate limiting — two sides of per-connection protection
Two distinct problems can overload your server at the connection layer. They require separate solutions.
Ingress — the client sends too fast
A buggy or malicious client can flood the server with thousands of messages per second. The FixedWindowRateLimiter built above limits each connection independently — one bad client does not starve the others.
Egress — the client receives too slowly
Imagine a client on a degraded network. The server keeps sending; the client cannot drain fast enough. The kernel's TCP send buffer fills up. Node.js queues data in memory. Left unchecked, a single slow client can consume gigabytes of server RAM while all other connections remain healthy.
bufferedAmount is the metric to watch: bytes queued for this socket that haven't been transmitted yet.
Native ws approach — checked synchronously before each send, O(1), zero detection latency:
function sendJson(socket, payload) {
if (socket.bufferedAmount > MAX_BUFFER_BYTES) {
socket.terminate(); // reclaim the queued memory immediately
return;
}
socket.send(JSON.stringify(payload));
}
Socket.IO approach — same timing wheel as ws: 30 buckets, 1 bucket per second, O(n/30) per tick. Maximum detection latency of 30s — acceptable for backpressure. Covered in Step 5 above.
Summary: rate limiting protects against clients that send too much; bufferedAmount monitoring protects against clients that consume too slowly. Both are necessary: they guard opposite sides of the connection.
Conclusion
Socket.IO gives you productivity and ready-made conventions: automatic reconnection, rooms, namespaces, and connection middleware out of the box with no extra code. The tradeoff is higher protocol overhead and less control over low-level behavior.
ws gives you lower overhead, a higher performance ceiling, and a simpler mental model. The tradeoff is that every feature you get for free in Socket.IO becomes your responsibility.
The best decision is the one that matches the operational cost your team wants to carry.
Example reference in the repository
Repository: github.com/LuizFernando991/websocket-scaling
- Socket.IO: examples/socket-io/src/server.ts
- Raw ws: examples/ws-raw/src/server.ts
- Docker Compose: docker-compose.yml
