Imagine you are in a library. You want to ask the librarian a series of questions. In the old web model, you would walk up to the desk, ask a question, wait for the answer, walk back to your seat, process the answer, then walk back to ask the next question. Every question requires a new trip. That is HTTP polling.
Now imagine a different library. You walk up once, hand the librarian a note that says “I will be asking follow-up questions.” The librarian nods. You ask your first question, get an answer immediately, ask a follow-up, get another answer. The conversation flows naturally, both directions, without you leaving the desk. That is WebSocket.
The web was built on a request-response cycle. The client sends a request, the server sends one response, and the connection closes. This is fine for documents. But what about chat messages, live game state, streaming stock prices, or collaborative editing?
Engineers tried workarounds:
HTTP polling: The client sends a request every N seconds asking “any updates?” The server responds with the data or an empty body. Simple to implement, but wasteful. Most requests return nothing, and an update can wait up to a full polling interval before the client sees it.
Long-polling: The client sends a request. The server holds it open until new data is available, then responds. The client immediately sends a new request. This reduces empty responses but still requires a new HTTP request for every message. Headers are sent each time, adding overhead.
HTTP streaming: The server sends partial chunks of a response without closing the connection (Transfer-Encoding: chunked). The client reads chunks as they arrive. This avoids reconnection overhead but is still unidirectional (server to client only) and the client cannot send data through the same stream.
Each approach has tradeoffs. None of them give true bidirectional, low-latency communication over a single connection.
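To put rough numbers on the polling tradeoff, here is a back-of-the-envelope sketch. The 500-byte header size is an assumption for illustration (real requests vary widely), and both function names are made up for this example.

```python
def polling_overhead_bytes(interval_s: int, duration_s: int, header_bytes: int = 500) -> int:
    """Header bytes spent on polling over a period, whether or not anything changed."""
    requests = duration_s // interval_s
    return requests * header_bytes


def average_polling_delay(interval_s: float) -> float:
    """An update lands at a random point in the interval, so the expected wait is half of it."""
    return interval_s / 2


# One client polling every 5 seconds for an hour (assumed 500-byte headers)
print(polling_overhead_bytes(interval_s=5, duration_s=3600))  # 360000
print(average_polling_delay(5))  # 2.5
```

Shortening the interval lowers the delay but raises the header cost in proportion, which is the tension the workarounds above cannot escape.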
WebSocket (RFC 6455) solves this by upgrading an HTTP connection into a persistent, full-duplex communication channel over a single TCP socket.
The key properties:
The connection starts as HTTP, then upgrades. The server responds with a 101 Switching Protocols status, and from that point forward, both sides speak the WebSocket protocol over the same TCP socket.
| Feature | HTTP Polling | Long-Polling | SSE | WebSocket |
|---|---|---|---|---|
| Direction | Client to Server | Client to Server | Server to Client only | Bidirectional |
| Overhead | High (headers each time) | High (headers each time) | Low (one connection) | Very low (2-byte min frame) |
| Latency | Poll interval | One HTTP round trip | Immediate | Immediate |
| Binary | Yes (HTTP body) | Yes (HTTP body) | Base64 needed | Native binary |
| Auto-reconnect | Implicit (next poll) | Implicit (next poll) | Built-in | Manual implementation |
| Proxy friendly | Yes | Yes | Yes | May be blocked |
| Complexity | Trivial | Moderate | Simple | Moderate |
WebSocket wins on latency and overhead. SSE wins on simplicity and auto-reconnection. Polling wins on compatibility. Choose based on your use case.
Every WebSocket connection begins as an HTTP request. The client sends a standard GET request with special headers:
Upgrade: websocket signals the intent to switch protocols.
Connection: Upgrade tells intermediaries not to treat this as a regular HTTP request.
Sec-WebSocket-Key is a 16-byte random value, base64-encoded. It is used to prove that the server understands the WebSocket protocol.
Sec-WebSocket-Version: 13 is the protocol version (currently the only standardized version).
The server must not simply accept any upgrade request. It needs to prove it understands the protocol. The server computes a response token by:
Concatenating the Sec-WebSocket-Key with the magic GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11
Taking the SHA-1 hash of the concatenated string
Base64-encoding the hash
The result is sent as Sec-WebSocket-Accept. This proves the server read and understood the WebSocket specification, because only someone who knows the magic GUID can produce the correct accept value.
The client sends an HTTP Upgrade request. The server computes the accept value by concatenating the key with a magic GUID, taking SHA-1, and base64-encoding the result.
The handshake is deliberately simple. It reuses HTTP infrastructure (port 80/443, proxies, authentication) while establishing a protocol switch. After the handshake, the HTTP connection ceases to exist — both sides now speak WebSocket frames over the raw TCP socket.
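The accept computation is short enough to check by hand. The sketch below uses only the Python standard library and verifies itself against the sample key and accept value given in RFC 6455. The function names are chosen for this example.

```python
import base64
import hashlib
import os

MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_client_key() -> str:
    """Client side: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def compute_accept(key: str) -> str:
    """Server side: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((key + MAGIC_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the Sec-WebSocket-Accept header it receives.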
After the upgrade, all data is sent in frames. The frame format is binary and compact:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|    Extended payload length continued (if payload len==127)    |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key (if MASK set)      |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data (continued)                  |
+---------------------------------------------------------------+
The frame format is minimal by design. A server-to-client text frame with a short payload carries as little as 2 bytes of overhead (FIN + opcode + length); client-to-server frames add a 4-byte masking key. Compare that to the hundreds of bytes of HTTP headers in a polling request, and the efficiency gain is clear.
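The header size for any payload follows directly from the diagram. As a small sketch (the function name is ours, not part of any library):

```python
def frame_overhead(payload_len: int, masked: bool) -> int:
    """Number of header bytes that precede the payload in one frame."""
    size = 2  # FIN/RSV/opcode byte + MASK/7-bit length byte
    if payload_len >= 65536:
        size += 8  # length field 127: 64-bit extended length
    elif payload_len >= 126:
        size += 2  # length field 126: 16-bit extended length
    if masked:
        size += 4  # masking key
    return size


print(frame_overhead(5, masked=False))      # 2
print(frame_overhead(5, masked=True))       # 6
print(frame_overhead(70000, masked=True))   # 14
```

Even the worst case, a large masked frame, costs 14 bytes of header.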
WebSocket defines six opcodes, divided into data frames and control frames:
Data frames: 0x0 (continuation), 0x1 (text), 0x2 (binary).
Control frames: 0x8 (close), 0x9 (ping), 0xA (pong).
Control frames must not be fragmented. They can appear between fragments of a data message. Control frames have a maximum payload length of 125 bytes.
The opcode in the first fragment of a message tells you the type of the entire message (text or binary). Continuation frames always have opcode 0. When FIN=1 on a continuation frame, the message is complete.
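These rules can be checked from the first two bytes of a frame alone. The following is a minimal sketch, not a full validator: it uses the opcode values from RFC 6455 and enforces only the two control-frame rules stated above. `describe_header` is a name invented for this example.

```python
OPCODES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


def describe_header(first_byte: int, second_byte: int) -> dict:
    """Decode FIN, opcode and the 7-bit length field; reject invalid control frames."""
    fin = bool(first_byte & 0x80)
    opcode = first_byte & 0x0F
    length_field = second_byte & 0x7F
    if opcode not in OPCODES:
        raise ValueError(f"reserved opcode 0x{opcode:x}")
    is_control = opcode >= 0x8
    if is_control and not fin:
        raise ValueError("control frames must not be fragmented")
    if is_control and length_field > 125:
        raise ValueError("control frame payload exceeds 125 bytes")
    return {
        "fin": fin,
        "opcode": OPCODES[opcode],
        "control": is_control,
        "length_field": length_field,
    }


# 0x81 = FIN + text, 0x05 = unmasked, 5-byte payload
print(describe_header(0x81, 0x05))
```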
One quirk of the WebSocket protocol: client-to-server frames must have MASK=1, while server-to-client frames must have MASK=0.
Why? The WebSocket working group identified a security issue called “cache poisoning” or “cross-protocol attack.” An attacker could craft a WebSocket client that sends data that looks like a valid HTTP request to an intermediary (proxy, cache). If the intermediary misinterpreted the WebSocket data as HTTP, it could poison its cache.
Masking prevents this by XORing the payload with a random 4-byte key. The intermediary sees random bytes that do not match any known protocol. Once the connection is established, the intermediary treats it as opaque TCP data.
The masking key is chosen randomly per frame. Each byte of the payload is XORed with maskingKey[i % 4]. The receiver XORs with the same key to recover the original payload.
This is not encryption. Masking is a defense against broken intermediaries, not a confidentiality mechanism. For actual security, use WSS (WebSocket over TLS).
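Because XOR is its own inverse, one function both masks and unmasks. This sketch reproduces the masked "Hello" example from RFC 6455, which uses the masking key 37 fa 21 3d:

```python
def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with key[i % 4]; applying it twice restores the input."""
    return bytes(byte ^ key[i % 4] for i, byte in enumerate(payload))


key = bytes.fromhex("37fa213d")
masked = apply_mask(b"Hello", key)
print(masked.hex())             # 7f9f4d5158
print(apply_mask(masked, key))  # b'Hello'
```

Anyone who sees the frame also sees the key in its header, which is why masking offers no confidentiality.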
Let us build a minimal WebSocket server in Node.js to see how the protocol works in practice. We will use only the built-in http and crypto modules — no third-party libraries.
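import { createServer } from 'http'
import { createHash } from 'crypto'
const MAGIC_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'
// Plain HTTP requests get a 404; only upgrades are served here
const server = createServer((req, res) => {
  res.writeHead(404)
  res.end()
})
// With an 'upgrade' listener registered, Node routes requests that ask for
// a protocol upgrade here instead of to the request handler, and hands
// over the raw TCP socket
server.on('upgrade', (req, socket) => {
  const key = req.headers['sec-websocket-key']
  const upgrade = req.headers['upgrade']
  if (req.url !== '/ws' || upgrade?.toLowerCase() !== 'websocket' || !key) {
    socket.end('HTTP/1.1 400 Bad Request\r\n\r\n')
    return
  }
  const accept = createHash('sha1')
    .update(key + MAGIC_GUID)
    .digest('base64')
  // There is no response object here, so write the 101 response by hand
  socket.write(
    'HTTP/1.1 101 Switching Protocols\r\n' +
    'Upgrade: websocket\r\n' +
    'Connection: Upgrade\r\n' +
    `Sec-WebSocket-Accept: ${accept}\r\n` +
    '\r\n'
  )
  socket.setNoDelay(true)
  // Now we can read/write WebSocket frames using the raw socket
  // (see next section for frame parsing)
})
server.listen(8080)
The createHash('sha1').update(key + MAGIC_GUID).digest('base64') call computes the accept token we explored earlier.
Upgrade requests arrive on the server's 'upgrade' event rather than in the request handler, so no HTTP response object is involved. We write the 101 response to the raw TCP socket ourselves, and from then on we read and write WebSocket frames directly on that socket.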
A simple frame parser in Node.js:
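function parseFrame(buffer) {
  // A frame header is at least 2 bytes; return null until more data arrives
  if (buffer.length < 2) return null
  const firstByte = buffer[0]
  const secondByte = buffer[1]
  const fin = (firstByte & 0x80) !== 0
  const opcode = firstByte & 0x0f
  const masked = (secondByte & 0x80) !== 0
  let payloadLen = secondByte & 0x7f
  let offset = 2
  if (payloadLen === 126) {
    // 16-bit extended length
    if (buffer.length < 4) return null
    payloadLen = buffer.readUInt16BE(2)
    offset = 4
  } else if (payloadLen === 127) {
    // 64-bit extended length
    if (buffer.length < 10) return null
    payloadLen = Number(buffer.readBigUInt64BE(2))
    offset = 10
  }
  // The masking key and the whole payload must have arrived
  if (buffer.length < offset + (masked ? 4 : 0) + payloadLen) return null
  let maskKey = null
  if (masked) {
    maskKey = buffer.subarray(offset, offset + 4)
    offset += 4
  }
  let payload = buffer.subarray(offset, offset + payloadLen)
  if (masked) {
    // Unmask into a copy: XOR each byte with maskKey[i % 4]
    payload = Buffer.from(
      payload.map((byte, i) => byte ^ maskKey[i % 4])
    )
  }
  // Decoded as UTF-8 text; keep the Buffer instead for binary frames (opcode 0x2)
  return { fin, opcode, masked, payloadLen, payload: payload.toString() }
}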
And a frame builder:
function buildFrame(payload, opcode = 0x1) {
  const payloadBuf = Buffer.from(payload, 'utf-8')
  const len = payloadBuf.length
  const header = []
  // FIN=1 plus the opcode: a single, unfragmented frame
  header.push(0x80 | opcode)
  if (len < 126) {
    // Length fits in the 7-bit field (MASK=0 for server frames)
    header.push(len)
  } else if (len < 65536) {
    // 126 marks a 16-bit big-endian length
    header.push(126, (len >> 8) & 0xff, len & 0xff)
  } else {
    // 127 marks a 64-bit big-endian length
    header.push(127)
    const bigLen = BigInt(len)
    for (let i = 7; i >= 0; i--) {
      header.push(Number((bigLen >> BigInt(i * 8)) & 0xffn))
    }
  }
  return Buffer.concat([Buffer.from(header), payloadBuf])
}
The same concepts apply in any language. Here is a Python server using the asyncio and hashlib standard libraries:
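import asyncio
import base64
import hashlib
MAGIC_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'
def compute_accept(key):
    sha1 = hashlib.sha1()
    sha1.update((key + MAGIC_GUID).encode())
    return base64.b64encode(sha1.digest()).decode()
def build_frame(payload, opcode=0x1):
    # Server-to-client frame: FIN=1, unmasked
    data = payload if isinstance(payload, (bytes, bytearray)) else payload.encode()
    header = bytearray([0x80 | opcode])
    if len(data) < 126:
        header.append(len(data))
    elif len(data) < 65536:
        header.append(126)
        header += len(data).to_bytes(2, 'big')
    else:
        header.append(127)
        header += len(data).to_bytes(8, 'big')
    return bytes(header) + bytes(data)
async def handle_client(reader, writer):
    data = await reader.read(4096)
    request = data.decode()
    key = None
    for line in request.split('\r\n'):
        if line.lower().startswith('sec-websocket-key'):
            key = line.split(':', 1)[1].strip()
            break
    if key is None:
        # Not a WebSocket handshake
        writer.write(b'HTTP/1.1 400 Bad Request\r\n\r\n')
        await writer.drain()
        writer.close()
        return
    accept = compute_accept(key)
    response = (
        'HTTP/1.1 101 Switching Protocols\r\n'
        'Upgrade: websocket\r\n'
        'Connection: Upgrade\r\n'
        f'Sec-WebSocket-Accept: {accept}\r\n'
        '\r\n'
    )
    writer.write(response.encode())
    await writer.drain()
    # Parse and echo frames. Simplification: assumes each read() returns
    # exactly one complete frame, which only holds for small test messages.
    while True:
        frame = await reader.read(4096)
        if not frame or len(frame) < 2:
            break
        first_byte = frame[0]
        second_byte = frame[1]
        opcode = first_byte & 0x0f
        # Parse payload length
        payload_len = second_byte & 0x7f
        offset = 2
        if payload_len == 126:
            payload_len = int.from_bytes(frame[2:4], 'big')
            offset = 4
        elif payload_len == 127:
            payload_len = int.from_bytes(frame[2:10], 'big')
            offset = 10
        # Unmask (client frames are always masked)
        mask_key = frame[offset:offset+4]
        offset += 4
        payload = bytearray(frame[offset:offset+payload_len])
        for i in range(len(payload)):
            payload[i] ^= mask_key[i % 4]
        if opcode == 0x8:  # Close: echo the close frame, then stop
            writer.write(build_frame(payload, 0x8))
            await writer.drain()
            break
        elif opcode == 0x9:  # Ping: reply with a Pong carrying the same payload
            writer.write(build_frame(payload, 0xA))
            await writer.drain()
            continue
        elif opcode == 0xA:  # Pong: nothing to do
            continue
        print(f"Received: {payload.decode()}")
        # Echo back (unmasked)
        echo = build_frame(payload.decode(), 0x1)
        writer.write(echo)
        await writer.drain()
    writer.close()
async def main():
    server = await asyncio.start_server(handle_client, '0.0.0.0', 8080)
    async with server:
        await server.serve_forever()
asyncio.run(main())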
You can test the handshake with curl:
# WebSocket handshake using curl
curl -i -N \
-H "Connection: Upgrade" \
-H "Upgrade: websocket" \
-H "Sec-WebSocket-Version: 13" \
-H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
http://localhost:8080/ws
The server responds with 101 Switching Protocols and the Sec-WebSocket-Accept header. After this, curl simply holds the connection open: invoked this way, it cannot send or decode WebSocket frames, so for interactive testing use a tool like websocat instead:
# Interactive WebSocket test with websocat
websocat ws://localhost:8080/ws
WebSocket allows messages to be split across multiple frames. This is called fragmentation. It is useful when:
The rules are straightforward:
Large messages are split into frames. The first frame has the opcode (e.g., 1 for text), continuation frames have opcode 0, and the final frame has FIN=1.
The receiver reassembles the message by concatenating payloads from all fragments in order. The opcode from the first fragment determines the message type (text or binary). If a non-zero opcode appears on a non-initial fragment, it is a protocol error.
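The splitting and reassembly rules can be sketched in a few lines. Here frames are modeled as plain `(fin, opcode, payload)` tuples rather than wire bytes, and interleaved control frames are left out to keep the example short. Both function names are illustrative.

```python
def fragment(payload: bytes, opcode: int, chunk_size: int) -> list:
    """Split one message into (fin, opcode, chunk) tuples."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        fin = index == len(chunks) - 1                 # only the last frame has FIN=1
        frame_opcode = opcode if index == 0 else 0x0   # later frames are continuations
        frames.append((fin, frame_opcode, chunk))
    return frames


def reassemble(frames) -> tuple:
    """Concatenate fragments in order; the first opcode decides the message type."""
    message_opcode = None
    parts = []
    for fin, opcode, chunk in frames:
        if message_opcode is None:
            if opcode == 0x0:
                raise ValueError("message cannot start with a continuation frame")
            message_opcode = opcode
        elif opcode != 0x0:
            raise ValueError("non-initial fragment must use opcode 0")
        parts.append(chunk)
        if fin:
            return message_opcode, b"".join(parts)
    raise ValueError("message incomplete: no frame with FIN=1")


frames = fragment(b"hello world", 0x1, 4)
print(frames)              # [(False, 1, b'hell'), (False, 0, b'o wo'), (True, 0, b'rld')]
print(reassemble(frames))  # (1, b'hello world')
```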
Fragmentation is transparent to the application layer. Most WebSocket libraries reassemble frames before delivering the message to your code. But understanding fragmentation matters for:
WebSocket connections over TCP can remain open indefinitely. But network equipment (NATs, firewalls, proxies, load balancers) has idle timeouts. If no data passes through for a configurable period, the intermediary may close the connection.
Ping/Pong frames keep the connection alive. Either endpoint can send a Ping frame (opcode 9), and the receiver must respond with a Pong frame (opcode 10) as soon as possible.
WebSocket control frames keep idle connections alive. One side sends a Ping (opcode 9), the other responds with a Pong (opcode 10). If no Pong arrives, the connection is considered dead.
The JavaScript WebSocket API does not expose ping/pong directly (the browser handles them automatically). When building a custom WS server, you must implement this:
// Server-side heartbeat for a single connection.
// `socket` is assumed to be a WebSocket instance from the ws library.
import WebSocket from 'ws'
const INTERVAL = 30000 // 30 seconds
const heartbeat = setInterval(() => {
  if (socket.readyState === WebSocket.OPEN) {
    socket.ping()
    socket._pingTimeout = setTimeout(() => {
      socket.terminate() // No pong received
    }, 10000) // 10 second timeout
  }
}, INTERVAL)
socket.on('pong', () => {
  clearTimeout(socket._pingTimeout)
})
socket.on('close', () => {
  clearInterval(heartbeat)
  clearTimeout(socket._pingTimeout)
})
A WebSocket library like ws in Node.js handles ping/pong and connection health tracking for you:
import { WebSocketServer } from 'ws'
const wss = new WebSocketServer({ port: 8080 })
wss.on('connection', (ws) => {
  ws.isAlive = true
  ws.on('pong', () => { ws.isAlive = true })
})
// Heartbeat check every 30 seconds
const interval = setInterval(() => {
  wss.clients.forEach((ws) => {
    if (ws.isAlive === false) return ws.terminate()
    ws.isAlive = false
    ws.ping()
  })
}, 30000)
wss.on('close', () => clearInterval(interval))
The heartbeat interval should be shorter than the network path’s idle timeout. A common choice is 30-45 seconds, which works behind most NATs and cloud load balancers.
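The liveness bookkeeping behind this kind of heartbeat is independent of any library. The sketch below restates it in plain Python with no network I/O: a connection that has not answered the previous ping by the next sweep is dropped. `HeartbeatTracker` and its methods are hypothetical names for illustration.

```python
class HeartbeatTracker:
    """Tracks which connections answered the most recent ping."""

    def __init__(self):
        self.alive = {}  # connection id -> True if a pong arrived since the last sweep

    def add(self, conn_id):
        self.alive[conn_id] = True

    def on_pong(self, conn_id):
        if conn_id in self.alive:
            self.alive[conn_id] = True

    def sweep(self):
        """Call once per interval. Returns (connections to terminate, connections to ping)."""
        dead = [conn_id for conn_id, ok in self.alive.items() if not ok]
        for conn_id in dead:
            del self.alive[conn_id]
        to_ping = list(self.alive)
        for conn_id in to_ping:
            self.alive[conn_id] = False  # must be reset by a pong before the next sweep
        return dead, to_ping


tracker = HeartbeatTracker()
tracker.add("a")
tracker.add("b")
print(tracker.sweep())  # ([], ['a', 'b'])
tracker.on_pong("a")    # only "a" answers
print(tracker.sweep())  # (['b'], ['a'])
```

With this scheme a dead peer is noticed after one to two intervals, which is worth keeping in mind when choosing the interval.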
Closing a WebSocket connection is a handshake, not an abrupt teardown. Either side can initiate a close by sending a Close frame (opcode 8). The receiving side must respond with its own Close frame.
A close frame (opcode 8) contains a 2-byte status code and an optional reason string. The server echoes the close frame to confirm. If no close is received, the connection is abnormally closed.
The Close frame has an optional body:
Common status codes:
| Code | Name | Meaning |
|---|---|---|
| 1000 | Normal Closure | The purpose of the connection was fulfilled |
| 1001 | Going Away | Server is shutting down, or client navigated away |
| 1002 | Protocol Error | Received an invalid frame |
| 1003 | Unsupported Data | Received a data type that cannot be accepted |
| 1007 | Invalid Payload Data | Received data that does not match the type (e.g., invalid UTF-8) |
| 1008 | Policy Violation | Received a message that violates server policy |
| 1009 | Message Too Big | The message exceeds the maximum allowed size |
| 1011 | Internal Error | Server encountered an unexpected condition |
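On the wire, the status code is a 2-byte unsigned integer in network (big-endian) byte order, followed by the UTF-8 reason. A minimal sketch of encoding and decoding that body, with illustrative function names:

```python
import struct


def build_close_payload(code: int, reason: str = "") -> bytes:
    """Encode a Close frame body: 2-byte status code plus optional UTF-8 reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("close payload exceeds the 125-byte control frame limit")
    return body


def parse_close_payload(body: bytes) -> tuple:
    """Decode a Close frame body. Returns (None, '') when the body is empty."""
    if len(body) == 0:
        return None, ""
    if len(body) == 1:
        raise ValueError("a close body must be empty or at least 2 bytes long")
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")


payload = build_close_payload(1000, "bye")
print(payload)                       # b'\x03\xe8bye'
print(parse_close_payload(payload))  # (1000, 'bye')
```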
If a Close frame is not received (e.g., the TCP connection drops), the closure is considered abnormal. The side that detects the TCP close should assume the connection is dead and clean up local resources.
import WebSocket from 'ws'
function gracefulClose(ws, code = 1000, reason = '') {
  // Timeout for abnormal close: force the connection shut if the peer
  // never echoes the close frame
  const closeTimeout = setTimeout(() => {
    if (ws.readyState !== WebSocket.CLOSED) {
      console.warn('Abnormal close - terminating')
      ws.terminate() // Force TCP close
    }
  }, 5000)
  // ws 'close' event fires when the server echoes the close frame
  ws.once('close', (closeCode, closeReason) => {
    clearTimeout(closeTimeout)
    console.log(`Closed: ${closeCode} ${closeReason}`)
  })
  ws.close(code, reason)
}
WebSocket servers are stateful. Each connection maintains server-side state (session, authentication, subscription channels). This creates scaling challenges that stateless HTTP does not have.
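When a client connects through a load balancer, the initial HTTP upgrade request goes to one server, and every later frame on that connection reaches the same server because it travels over the same proxied TCP connection. The connection state lives only on that server, so a reconnect (or an HTTP fallback request) that lands on a different server finds none of it.
Load balancers address this with sticky sessions (also called session affinity):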
# Nginx WebSocket proxy with sticky sessions
upstream ws_backend {
    ip_hash;
    server 10.0.1.1:8080;
    server 10.0.2.1:8080;
    server 10.0.3.1:8080;
}
server {
    listen 80;
    location /ws {
        proxy_pass http://ws_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 86400s;
    }
}
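The idea behind address-based affinity can be shown with a toy sketch: hash the client address and use the result to pick a backend, so the same client always lands on the same server. This illustrates the principle only and is not Nginx's actual ip_hash algorithm. `pick_backend` is a hypothetical helper, and the backend list mirrors the config above.

```python
import hashlib

BACKENDS = ["10.0.1.1:8080", "10.0.2.1:8080", "10.0.3.1:8080"]


def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Deterministically map a client address to one backend."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# The same address always maps to the same backend
print(pick_backend("203.0.113.7") == pick_backend("203.0.113.7"))  # True
```

Note the weakness of plain modulo hashing: adding or removing a backend changes the mapping for many clients at once.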
High-traffic WebSocket applications use a pub/sub backend to broadcast messages across servers:
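import { createClient } from 'redis'
import { WebSocketServer } from 'ws'
const redis = createClient()
await redis.connect()
const wss = new WebSocketServer({ port: 8080 })
// One dedicated Redis subscriber per WebSocket client, stored as a promise
// so concurrent subscribe messages share the same connection
const subscriptions = new Map()
wss.on('connection', (ws) => {
  ws.on('message', async (data) => {
    let msg
    try {
      msg = JSON.parse(data.toString())
    } catch {
      return // Ignore messages that are not valid JSON
    }
    if (msg.type === 'subscribe') {
      let ready = subscriptions.get(ws)
      if (!ready) {
        // First subscription for this client: open its subscriber connection
        const subscriber = redis.duplicate()
        ready = subscriber.connect().then(() => subscriber)
        subscriptions.set(ws, ready)
      }
      const subscriber = await ready
      await subscriber.subscribe(msg.channel, (message) => {
        ws.send(message)
      })
    }
    if (msg.type === 'publish') {
      await redis.publish(msg.channel, JSON.stringify(msg.data))
    }
  })
  ws.on('close', () => {
    const ready = subscriptions.get(ws)
    if (ready) {
      // Close the Redis connection that belonged to this client
      ready.then((subscriber) => subscriber.quit())
      subscriptions.delete(ws)
    }
  })
})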
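To see why a shared channel is needed, it helps to strip the pattern down to its shape. In the sketch below an in-process `Broker` stands in for Redis and two `WsServer` objects stand in for separate server processes. Both classes are hypothetical, and "sending" to a client just appends to a list.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for a pub/sub backend: channel -> subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self.subscribers[channel]):
            callback(message)


class WsServer:
    """Hypothetical server process that only knows its own local clients."""

    def __init__(self, broker):
        self.broker = broker
        self.outbox = defaultdict(list)  # client id -> messages delivered to that client

    def client_subscribes(self, client_id, channel):
        self.broker.subscribe(channel, lambda msg: self.outbox[client_id].append(msg))

    def client_publishes(self, channel, message):
        self.broker.publish(channel, message)


broker = Broker()
server_a, server_b = WsServer(broker), WsServer(broker)
server_a.client_subscribes("alice", "room1")
server_b.client_subscribes("bob", "room1")
server_a.client_publishes("room1", "hi")
print(server_a.outbox["alice"], server_b.outbox["bob"])  # ['hi'] ['hi']
```

Bob receives the message even though it was published through a server that has no connection to him, which is the property the pub/sub backend provides.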
Unlike HTTP/2, which multiplexes multiple streams over a single connection, WebSocket defines a single message stream per connection. To multiplex, you need either:
RFC 8441 defines how to run WebSocket over HTTP/2. Instead of a single TCP connection per WebSocket, the WebSocket is tunneled over an HTTP/2 stream. Multiple WebSocket connections can share one HTTP/2 connection.
This eliminates the TCP connection overhead per WebSocket and enables true multiplexing. The WebSocket frames are sent in DATA frames of the HTTP/2 stream, preserving the original frame boundaries.
Browser support for wss:// over HTTP/2 exists in modern browsers (Chrome, Firefox, Safari). The browser automatically negotiates the transport at the connection level.
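Under RFC 8441 the opening handshake looks different from the HTTP/1.1 upgrade: the client sends an extended CONNECT request with a `:protocol` pseudo-header, Sec-WebSocket-Key and Sec-WebSocket-Accept are not used, and success is signaled by a 2xx status rather than 101. The sketch below only assembles the header list to show the shape of the request. The function names are illustrative, and it assumes the server has already advertised SETTINGS_ENABLE_CONNECT_PROTOCOL.

```python
def h2_websocket_request_headers(authority: str, path: str, secure: bool = True) -> list:
    """Header list for a WebSocket-over-HTTP/2 request (extended CONNECT)."""
    return [
        (":method", "CONNECT"),
        (":protocol", "websocket"),
        (":scheme", "https" if secure else "http"),
        (":path", path),
        (":authority", authority),
        ("sec-websocket-version", "13"),
    ]


def h2_handshake_succeeded(status: int) -> bool:
    """Over HTTP/2 the server accepts with a 2xx status instead of 101."""
    return 200 <= status < 300


for name, value in h2_websocket_request_headers("example.com", "/ws"):
    print(f"{name}: {value}")
```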
These three technologies overlap in the real-time communication space but have different strengths:
| Feature | WebSocket | SSE | gRPC Stream |
|---|---|---|---|
| Direction | Bidirectional | Server to Client | Bidirectional |
| Transport | TCP (or HTTP/2) | HTTP/1.1+ | HTTP/2 |
| Message format | Binary or Text | Text only | Protobuf (binary) |
| Streaming | Full-duplex | Server -> Client | Full-duplex |
| Auto-reconnect | Manual | Built-in | Manual |
| Language support | All languages | Browser + Server | gRPC ecosystem |
| Proxy complexity | May be blocked | Works through proxies | Requires HTTP/2 |
| Typical use case | Chat, gaming, live sync | Notifications, feeds | Microservices, streaming RPC |
Before you close this page, make sure you can answer these: