Webhooks are automated HTTP callbacks triggered by events. When something happens in a source system — a payment succeeds, a repository is pushed to, a new user signs up — the source sends an HTTP POST request to a pre-registered URL owned by the consumer.
Unlike polling, where the client repeatedly asks “is there new data?”, webhooks push the data the moment it exists. The client does nothing but listen.
Real-world examples:
- Stripe sends payment_intent.succeeded to your endpoint after a charge goes through
- GitHub sends push events to CI systems like Jenkins or GitHub Actions
- Slack sends event_callback for messages, reactions, and channel activity
- Twilio sends status_callback when an SMS is delivered or fails
- Shopify sends orders/create when a customer places an order

| Aspect | Polling | Webhooks |
|---|---|---|
| Direction | Client pulls from server | Server pushes to client |
| Latency | Depends on poll interval (seconds to minutes) | Near real-time (milliseconds) |
| Server load | Wasted requests when no data | Zero wasted requests |
| Complexity | Simple to implement | Requires public endpoint, retry logic |
| Reliability | Client controls retry | Server must handle retry + delivery guarantees |
| Firewall | Client initiates, no inbound needed | Client must expose a public endpoint |
Webhooks shine when you need real-time event delivery and control over the consumer is limited (third-party integrations). They struggle when clients are behind strict firewalls, cannot maintain a public endpoint, or need ordered exactly-once delivery (which webhooks do not natively guarantee).
Every webhook delivery follows the same lifecycle. Understanding this lifecycle is the foundation of any webhook system design.
- The event is recorded with status pending
- 2xx: mark as delivered, done
- 4xx (client error): the endpoint is misconfigured or rejecting; do not retry
- 5xx (server error): retry with exponential backoff

Each retry waits longer than the last. A typical schedule:
{
  "retry_1": "1 minute",
  "retry_2": "5 minutes",
  "retry_3": "15 minutes",
  "retry_4": "1 hour",
  "retry_5": "6 hours",
  "retry_6": "24 hours"
}
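Total time before DLQ: approximately 31.4 hours (1 + 5 + 15 + 60 + 360 + 1,440 = 1,881 minutes). The schedule above is hand-tuned and grows faster than doubling. A simpler alternative computes each delay as base * 2^(attempt - 1), which with a 1-minute base gives 1, 2, 4, 8, 16, and 32 minutes. Either way, add a small random jitter to prevent the thundering herd problem when many clients fail simultaneously.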
import random

def compute_delay(attempt: int, base_minutes: int = 1) -> float:
    """Return the delay in minutes before the given retry attempt (1-based)."""
    # Jitter of up to 30% of the base spreads out retries that failed together.
    jitter = random.uniform(0, 0.3 * base_minutes)
    return base_minutes * (2 ** (attempt - 1)) + jitter

# Simulate retry schedule
for attempt in range(1, 7):
    delay_min = compute_delay(attempt)
    print(f"Retry {attempt}: wait {delay_min:.1f} minutes")
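The fixed schedule listed earlier can also be kept as a lookup table rather than derived from a formula. The following is a minimal sketch under that assumption; `RETRY_SCHEDULE_MINUTES`, `scheduled_delay`, and the 10% jitter fraction are illustrative names and values, not part of any particular provider's implementation.

```python
import random

# Fixed retry schedule from the table above, in minutes (retry_1 .. retry_6).
RETRY_SCHEDULE_MINUTES = [1, 5, 15, 60, 360, 1440]

def scheduled_delay(attempt: int, jitter_fraction: float = 0.1) -> float:
    """Return the delay in minutes for a 1-based retry attempt.

    Jitter is proportional to the scheduled delay, so late retries
    are spread out as much as early ones.
    """
    if not 1 <= attempt <= len(RETRY_SCHEDULE_MINUTES):
        raise ValueError(f"attempt must be in 1..{len(RETRY_SCHEDULE_MINUTES)}")
    base = RETRY_SCHEDULE_MINUTES[attempt - 1]
    return base + random.uniform(0, jitter_fraction * base)

# Sum the schedule to see how long an event can stay in the retry loop.
total_minutes = sum(RETRY_SCHEDULE_MINUTES)
print(f"Time before DLQ without jitter: {total_minutes} min = {total_minutes / 60:.2f} h")
```

Summing the table this way is also a quick check on the retry window that the idempotency cache TTL has to cover.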
Idempotency means performing an operation multiple times produces the same result as doing it once. In webhooks, this is critical because the at-least-once delivery model means the same event might be delivered twice.
Every event carries a unique idempotency key in the payload:
{
  "id": "evt_3QpLmN7XyZ",
  "idempotency_key": "evt_3QpLmN7XyZ_1716000000",
  "type": "payment_intent.succeeded",
  "data": { ... }
}
The receiver stores seen keys in a cache or database. When a duplicate arrives, the receiver detects the known key and returns a 200 OK without processing the event again.
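seen_keys = redis_cache()  # placeholder for a cache client exposing exists/get/set

def handle_webhook(payload: dict):
    key = payload["idempotency_key"]
    # Duplicate delivery: return the stored result instead of reprocessing.
    if seen_keys.exists(key):
        return {"status": "duplicate", "existing_response": seen_keys.get(key)}
    result = process_event(payload)
    # Keep the key for 48 hours so it outlives the full retry window.
    seen_keys.set(key, result, ttl=172800)
    return result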
Common gotcha: the TTL on the idempotency cache must exceed the total retry window. If retries span about 31.4 hours (the sum of the schedule above), a cache TTL of 48 hours leaves a safe margin.
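To make the TTL behaviour concrete, here is a minimal in-memory sketch of an idempotency store with expiry. `IdempotencyStore` and `handle_once` are hypothetical names standing in for a real cache such as Redis, and the injectable `clock` exists only so expiry can be exercised without waiting.

```python
import time

class IdempotencyStore:
    """In-memory stand-in for a cache with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, stored_result)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: forget the key
            return None
        return result

    def set(self, key, result):
        self._entries[key] = (self._clock() + self.ttl_seconds, result)


def handle_once(store, payload, process):
    """Run process(payload) only the first time an idempotency key is seen.

    Assumes process() returns a non-None result to cache.
    """
    key = payload["idempotency_key"]
    cached = store.get(key)
    if cached is not None:
        return {"status": "duplicate", "existing_response": cached}
    result = process(payload)
    store.set(key, result)
    return result
```

If the TTL were shorter than the retry window, a late retry would arrive after the key had expired and the event would be processed a second time, which is exactly the failure the gotcha describes.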
Without signing, a malicious actor could forge webhook requests and trigger unintended actions in your system. Most major webhook providers sign requests so the receiver can verify authenticity.
The sender computes a signature over the raw request body using a shared secret:
signature = HMAC-SHA256(webhook_secret, request_body)
The signature is sent in a header. Stripe’s format:
Stripe-Signature: t=1716000000,v1=5257a869e7ecebeda32affa62cd4fa7c27f6c7d5d5f6e5c5c5f6a7b8c9d0e1f2
The timestamp t prevents replay attacks. The receiver verifies:
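1. Extract t and v1 from the header
2. Compute HMAC-SHA256(secret, f"{t}.{body}")
3. Compare the result with v1 using constant-time comparison
4. Check that t is within an acceptable time window (e.g., 5 minutes)

import hashlib
import hmac
import time

def verify_signature(
    payload: bytes,
    header: str,
    secret: str,
    tolerance_seconds: int = 300,
) -> bool:
    # Parse "t=...,v1=..." into a dict; a malformed header is rejected.
    try:
        params = dict(part.strip().split("=", 1) for part in header.split(","))
        timestamp = int(params["t"])
        received = params["v1"]
    except (KeyError, ValueError):
        return False
    # Reject timestamps outside the tolerance window to limit replay attacks.
    if abs(time.time() - timestamp) > tolerance_seconds:
        return False
    # Sign the raw bytes directly; the body does not need to be decoded.
    signed_payload = f"{timestamp}.".encode() + payload
    expected = hmac.new(
        secret.encode(), signed_payload, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())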
The sender signs the payload with a shared secret. The receiver recomputes the signature and compares. If they match, the payload is authentic. If not, the request is rejected.
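For completeness, the sender side of the same scheme might look like the sketch below. `sign_payload` is a hypothetical helper, not a provider API; it assumes the `t=...,v1=...` header layout and the `"{t}.{body}"` signed string described above.

```python
import hashlib
import hmac
import time
from typing import Optional

def sign_payload(payload: bytes, secret: str, timestamp: Optional[int] = None) -> str:
    """Build a "t=...,v1=..." header value for the raw request body."""
    if timestamp is None:
        timestamp = int(time.time())
    # The timestamp is part of the signed string, so it cannot be altered
    # without invalidating the signature.
    signed_payload = f"{timestamp}.".encode() + payload
    digest = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"

# Example: sign a body exactly as it will be sent on the wire.
body = b'{"id": "evt_3QpLmN7XyZ", "type": "payment_intent.succeeded"}'
header_value = sign_payload(body, "whsec_example_secret", timestamp=1716000000)
```

The important detail is that both sides work on the exact bytes of the body. Re-serializing the JSON on the receiver before verifying can change whitespace or key order and make a valid signature fail.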
A misbehaving client should not degrade service for others. Each client gets a rate limit defined at registration time (e.g., 100 requests per second). Workers use a token bucket or sliding window counter to throttle delivery.
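from token_bucket import TokenBucket  # placeholder module; consume() returns a bool

client_limits = {
    "client_a": TokenBucket(rate=100, capacity=100),
    "client_b": TokenBucket(rate=10, capacity=10),
}

async def deliver(webhook_url: str, payload: dict):
    # url_to_client, queue, and http_post are application helpers defined elsewhere.
    bucket = client_limits.get(url_to_client(webhook_url))
    # Clients with no registered limit are delivered without throttling.
    if bucket is not None and not bucket.consume():
        # Out of tokens: put the event back and try again in one second.
        await queue.reenqueue(payload, delay=1.0)
        return
    await http_post(webhook_url, payload)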
If a client consistently exceeds their limit, excess events queue up. The admin can adjust the limit or pause delivery entirely.
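The delivery snippet above imports `TokenBucket` from a module that is not shown. One possible implementation is sketched here; the constructor arguments mirror the `rate` and `capacity` used above, while the injectable `clock` parameter is an assumption added to make the refill logic easy to test.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so the first burst is allowed
        self._last_refill = clock()

    def consume(self, tokens: float = 1.0) -> bool:
        """Take `tokens` from the bucket; return False if not enough are available."""
        now = self._clock()
        elapsed = now - self._last_refill
        # Refill lazily based on elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last_refill = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

Refilling lazily inside `consume` avoids a background timer per client. This sketch is not thread-safe; with a pool of worker threads it would need a lock or a shared store such as Redis.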
The delivery service sits between event producers and client endpoints. It handles persistence, queuing, rate-limited delivery, retries, and observability.
A shared queue works for low volumes but creates a head-of-line blocking problem: when one client’s endpoint is slow, deliveries for all other clients wait behind it.
The fix: give each client its own queue (a separate Redis list, SQS queue, or RabbitMQ queue). Workers poll from all queues in a round-robin or priority-based fashion.
import json

class WebhookQueueManager:
    def __init__(self, redis_client):
        self.redis = redis_client

    def enqueue(self, client_id: str, event: dict):
        # One Redis list per client keeps a slow client from blocking others.
        queue_key = f"webhook_queue:{client_id}"
        self.redis.lpush(queue_key, json.dumps(event))

    def dequeue_multi(self, client_ids: list[str], batch_size: int = 10):
        # LPUSH + RPOP gives FIFO order within each client's queue.
        events = []
        for cid in client_ids:
            for _ in range(batch_size):
                event = self.redis.rpop(f"webhook_queue:{cid}")
                if not event:
                    break
                events.append((cid, json.loads(event)))
        return events
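The round-robin polling mentioned above can be illustrated without Redis. The sketch below uses in-memory deques; `InMemoryQueues` and `dequeue_round_robin` are hypothetical names, and it takes one event per client per pass so a client with a large backlog cannot crowd out the others.

```python
from collections import deque

class InMemoryQueues:
    """Per-client FIFO queues with fair, interleaved dequeueing."""

    def __init__(self):
        self._queues = {}  # client_id -> deque of events

    def enqueue(self, client_id, event):
        self._queues.setdefault(client_id, deque()).append(event)

    def dequeue_round_robin(self, max_events):
        """Take at most one event per client per pass, up to max_events in total."""
        taken = []
        while len(taken) < max_events:
            progressed = False
            for client_id, q in self._queues.items():
                if q and len(taken) < max_events:
                    taken.append((client_id, q.popleft()))
                    progressed = True
            if not progressed:
                break  # every queue is empty
        return taken


queues = InMemoryQueues()
for n in range(3):
    queues.enqueue("client_a", {"id": f"a{n}"})
queues.enqueue("client_b", {"id": "b0"})

# client_b's single event is served second, not after all of client_a's backlog.
batch = queues.dequeue_round_robin(max_events=3)
```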
A fixed pool of workers (e.g., 50 threads) continuously polls queues, sends HTTP POSTs, and evaluates responses. Workers mark events as delivered, re-enqueue for retry with delay, or move them to DLQ.
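import asyncio
import httpx

async def worker_loop():
    # queue, client_registry, and db are application objects defined elsewhere.
    # One shared client reuses pooled connections across deliveries.
    async with httpx.AsyncClient(timeout=10) as client:
        while True:
            events = queue.dequeue_multi(client_registry.all_ids())
            if not events:
                await asyncio.sleep(0.1)  # idle: yield instead of spinning
                continue
            for client_id, event in events:
                url = client_registry.get_url(client_id)
                try:
                    resp = await client.post(url, json=event)
                except (httpx.TimeoutException, httpx.NetworkError):
                    # Timeouts and connection failures are retried like a 5xx.
                    queue.reenqueue_with_delay(event, compute_delay(event["attempt"]))
                    continue
                if 200 <= resp.status_code < 300:
                    db.mark_delivered(event["id"])
                elif 400 <= resp.status_code < 500:
                    db.mark_failed(event["id"])  # client error: no retry
                else:
                    queue.reenqueue_with_delay(event, compute_delay(event["attempt"]))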
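The decision a worker makes after each attempt can be isolated into a pure function, which makes the lifecycle rules easy to unit test. This is a sketch with hypothetical names (`Action`, `next_action`, `MAX_RETRIES`); it assumes a status code of `None` represents a timeout or connection error, and that `retries_done` counts the retries already used.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    DELIVERED = "delivered"
    FAILED = "failed"            # permanent failure, no retry
    RETRY = "retry"
    DEAD_LETTER = "dead_letter"

MAX_RETRIES = 6  # matches the six-step schedule described earlier

def next_action(status_code: Optional[int], retries_done: int,
                max_retries: int = MAX_RETRIES) -> Action:
    """Map a delivery outcome to the next step in the webhook lifecycle."""
    if status_code is not None and 200 <= status_code < 300:
        return Action.DELIVERED
    if status_code is not None and 400 <= status_code < 500:
        return Action.FAILED
    # 5xx, timeouts, and anything unexpected (e.g. 3xx) are treated as
    # retryable until the retry budget is spent.
    if retries_done >= max_retries:
        return Action.DEAD_LETTER
    return Action.RETRY
```

Keeping this logic free of I/O means the worker loop only has to translate the returned action into a database update, a delayed re-enqueue, or a move to the DLQ.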
After the maximum number of retries (typically 6), the event moves to a dead letter queue. A DLQ is a separate queue or database table that stores undeliverable events for manual inspection.
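import json
import time

def move_to_dlq(event: dict):
    # `redis` is assumed to be an already-connected client instance.
    dlq_key = f"webhook_dlq:{event['id']}"
    redis.set(dlq_key, json.dumps({
        **event,
        "dlq_reason": "max_retries_exceeded",
        "dlq_timestamp": time.time(),
        # The worker tracks the counter under "attempt".
        "attempts": event.get("attempt", 0),
    }))
    notify_admin(event)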
Events in the DLQ can be inspected manually or replayed.
The admin panel shows every failed event with full delivery history. An operator can click “Replay” to re-enqueue the event with fresh delivery attempts. This is essential for handling transient infrastructure issues on the client side.
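A replay action could be sketched as follows. The DLQ is modelled as a plain dict and the delivery queue as a list purely for illustration; `replay_event` and the `replayed_at` field are hypothetical, and the `dlq_` fields are assumed to follow the DLQ record layout shown above.

```python
import time

def replay_event(dlq: dict, delivery_queue: list, event_id: str) -> dict:
    """Move an event from the DLQ back to the delivery queue with a fresh attempt count."""
    entry = dlq.pop(event_id)  # raises KeyError if the event is not in the DLQ
    # Drop DLQ bookkeeping so the event looks like a normal pending delivery.
    event = {
        k: v for k, v in entry.items()
        if not k.startswith("dlq_") and k != "attempts"
    }
    event["attempt"] = 1
    event["replayed_at"] = time.time()
    delivery_queue.append(event)
    return event
```

Because a replayed event keeps its original `id`, receiver-side deduplication still applies if the earlier delivery had in fact succeeded.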
Webhooks fundamentally provide at-least-once delivery. The event will eventually reach the endpoint or land in the DLQ, but duplicates are possible.
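At-least-once means the sender keeps retrying until it receives a 2xx acknowledgment, so a lost or late acknowledgment results in a repeat delivery.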
Exactly-once is not achievable with HTTP-based webhooks over an unreliable network. The closest you can get is at-least-once delivery combined with deduplication on the receiver:
# Receiver-side deduplication
def handle_webhook(request):
    event_id = request.json["id"]
    with db.transaction():
        if db.events.exists(event_id):
            return {"status": "already_processed"}, 200
        db.events.insert(event_id, request.json)
        process_event(request.json)
    return {"status": "ok"}, 200
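A check-then-insert can still race when two deliveries of the same event arrive concurrently. One way to close that gap is to let a uniqueness constraint do the check. The sketch below uses the standard-library `sqlite3` module as a stand-in for the application database; the `events` table layout and the `handle_event` name are assumptions for illustration.

```python
import json
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The primary key makes a second insert of the same event ID fail.
    conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return conn

def handle_event(conn: sqlite3.Connection, event: dict, process):
    """Record and process an event once; duplicates are acknowledged but skipped."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute(
                "INSERT INTO events (id, body) VALUES (?, ?)",
                (event["id"], json.dumps(event)),
            )
            process(event)
    except sqlite3.IntegrityError:
        # The ID is already recorded, so this is a repeat delivery.
        return {"status": "already_processed"}, 200
    return {"status": "ok"}, 200
```

Because the insert and the processing share one transaction, a failure inside `process` rolls the insert back and a later retry of the same event is handled as new rather than being dropped as a duplicate.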
A production webhook system needs comprehensive observability:
-- Sample query: delivery success rate per client
SELECT
    client_url,
    COUNT(*) AS total_events,
    SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END) AS delivered,
    ROUND(SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS success_rate
FROM webhook_deliveries
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY client_url
ORDER BY success_rate ASC;
A Stripe-style webhook system exemplifies the patterns we have covered. Here is an illustrative architecture end to end:
Event Producer (Payment Service)
|
v
Event Service (validates, assigns ID, signs payload)
|
v
Event Store (PostgreSQL)
|
v
Per-Client Queue (Redis streams or RabbitMQ)
|
v
Delivery Workers (N processes, rate-limited per client)
|
v
Client Endpoint (HTTP POST with Stripe-Signature header)
|
+--> 200 OK: mark delivered, log success
|
+--> 5xx/Timeout: re-enqueue with backoff delay
|
+--> Max retries: move to Dead Letter Queue
|
v
Admin Panel (view, replay, inspect)
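Building a webhook system from scratch requires these components: an event store, per-client queues, rate-limited delivery workers, retries with exponential backoff, a dead letter queue with replay, payload signing, idempotency keys, and observability.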
Before deploying your webhook system, ask yourself:
Building a reliable webhook system is not complicated at the component level, but the edge cases around retries, duplicates, and misbehaving clients make the difference between a system that is trusted and one that is constantly on fire.