Webhook Best Practices: A Developer's Production Guide
Webhooks look deceptively simple. You expose an HTTP endpoint, a provider POSTs an event, you process it. That works until the provider retries a duplicate, your server goes down during a deploy, or a bad actor replays a signed request from two hours ago.
This guide covers the failures that only show up in production. Most webhook documentation stops at "verify the signature." These practices go further.
Acknowledge First, Process Second
Your endpoint should return a 200 OK as quickly as it can. Not after processing the payload. Not after running business logic against the database. As soon as the request is verified and handed off.
Most providers set a response timeout between 5 and 30 seconds. If you exceed it, they mark the delivery failed and retry. Now you're processing the same event twice while the first job is still running.
The correct pattern is verify, enqueue, respond:
// Node.js / Express
const express = require('express');
const app = express();
app.post('/webhooks/verid', express.raw({ type: 'application/json' }), async (req, res) => {
  // 1. Verify signature (fast, in-memory); req.body is a raw Buffer here
  const isValid = verifySignature(req.headers['verid-signature'], req.body, process.env.WEBHOOK_SECRET);
  if (!isValid) return res.status(401).send('Unauthorized');
  try {
    // 2. Enqueue raw payload for background processing (queue is your job queue client)
    await queue.push({ body: req.body.toString('utf8'), headers: req.headers });
  } catch (err) {
    // Enqueue failed: answer 5xx so the provider retries instead of the event being dropped
    return res.status(500).send('Enqueue failed');
  }
  // 3. Acknowledge immediately
  res.status(200).send('OK');
});
Your queue worker handles the actual business logic. If it fails, you retry the worker job, not the entire HTTP round-trip with an external provider.
Verify Every Signature, Every Time
An unsigned or unverified webhook is just an unauthenticated POST from the internet. Anyone can send one.
HMAC-SHA256 is the standard. The provider signs the payload with a shared secret. You recompute the signature on your end and compare. If they match, the payload is authentic and untampered.
Verid uses the same header format as Stripe: Verid-Signature: t=<timestamp>,v1=<signature>. The timestamp is included so you can reject replayed requests.
const crypto = require('crypto');
function verifySignature(header, rawBody, secret) {
  if (!header) return false;
  const parts = Object.fromEntries(
    header.split(',').map(p => p.split('='))
  );
  const timestamp = parseInt(parts.t, 10);
  const signature = parts.v1;
  // Reject malformed headers: a NaN timestamp would otherwise slip past the age check
  if (!Number.isFinite(timestamp) || typeof signature !== 'string') return false;
  // Reject if timestamp is older than 5 minutes
  const age = Math.floor(Date.now() / 1000) - timestamp;
  if (age > 300) return false;
  // Sign the timestamp exactly as received, followed by the raw body
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${parts.t}.${rawBody}`)
    .digest();
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on buffers of different length, so check length first
  if (received.length !== expected.length) return false;
  // Constant-time comparison prevents timing attacks
  return crypto.timingSafeEqual(expected, received);
}
Two things matter here that many guides skip. First, use crypto.timingSafeEqual instead of ===. String comparison short-circuits on the first mismatched character, which leaks timing information an attacker can use. Second, the 5-minute timestamp window closes replay attacks. A valid request captured and resent later will be rejected.
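Use express.raw() or equivalent to get the raw request body before any JSON parsing. Parsing and re-serializing can change whitespace, key order, or number formatting, so an HMAC recomputed over the re-serialized body won't match the one computed over the original bytes.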
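To exercise the verifier in unit tests, it helps to produce a correctly signed header yourself. The following is a minimal sketch, assuming the signed string is `<timestamp>.<raw body>` as in `verifySignature` above. `signPayload` is a hypothetical helper name, not part of any Verid SDK.

```javascript
const crypto = require('crypto');

// Hypothetical test helper: builds a header in the t=<timestamp>,v1=<signature> format.
// Assumes the signed string is "<timestamp>.<raw body>", matching verifySignature above.
function signPayload(rawBody, secret, timestamp = Math.floor(Date.now() / 1000)) {
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}

// Example: sign a small JSON body with a throwaway secret and a fixed timestamp
const exampleHeader = signPayload('{"id":"del_test"}', 'test_secret', 1700000000);
```

Passing the timestamp explicitly keeps tests deterministic and makes it easy to build a stale header for checking the replay window.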
Design for Idempotency from Day One
Most webhook providers retry failed deliveries. Stripe retries for 72 hours. Verid retries 6 times across nearly 4 hours. Others, such as GitHub, do not retry automatically but let you redeliver a failed event manually or through the API. Either way, your handler will sometimes process the same event more than once, especially after restarts or deploy windows.
If your handler is not idempotent, you'll double-charge customers, create duplicate records, or fire notifications twice.
The pattern is to deduplicate on the delivery ID before doing any work:
async function processWebhookJob(payload) {
  const body = JSON.parse(payload.body);
  const deliveryId = body.id; // e.g. "del_01H..."
  // Check if already processed
  const alreadyProcessed = await redis.get(`webhook:processed:${deliveryId}`);
  if (alreadyProcessed) {
    console.log(`Skipping duplicate delivery ${deliveryId}`);
    return;
  }
  // Do the work
  await handleChangeEvent(body);
  // Mark as processed with a TTL longer than the provider's retry window
  // (86400 s = 24 hours, which covers a retry schedule of about 4 hours)
  await redis.set(`webhook:processed:${deliveryId}`, '1', 'EX', 86400);
}
The TTL on the dedup key should exceed the provider's retry window. If they retry for 72 hours, keep the key for at least 72 hours plus some buffer.
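The check-then-set sequence above is not atomic: two workers that pick up the same delivery at nearly the same moment can both pass the check. One way to narrow that gap is to claim the key with a single `SET ... NX` before doing the work. This is a sketch, assuming an ioredis-style client whose `set(key, value, 'EX', seconds, 'NX')` resolves to `'OK'` or `null`. `processOnce` is a hypothetical name.

```javascript
// Sketch: claim the delivery ID atomically, then do the work.
// `redis` is assumed to be an ioredis-style client where
// set(key, value, 'EX', seconds, 'NX') resolves to 'OK' or null.
async function processOnce(redis, deliveryId, handler, ttlSeconds = 86400) {
  const key = `webhook:processed:${deliveryId}`;
  // NX: only set if the key does not exist, so a single worker wins the claim
  const claimed = await redis.set(key, '1', 'EX', ttlSeconds, 'NX');
  if (claimed !== 'OK') return 'duplicate';
  try {
    await handler();
    return 'processed';
  } catch (err) {
    // Release the claim so a later retry can process the event
    await redis.del(key);
    throw err;
  }
}
```

The trade-off is that a worker that crashes between the claim and the end of the handler leaves the key set until the TTL expires, so this pattern suits handlers whose failures surface as thrown errors.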
Build a Retry Strategy That Doesn't Thrash
If you send webhooks outbound, you need a retry strategy that is kind to recipients. Hammering a struggling server every 5 seconds is the wrong approach. Use exponential backoff with jitter.
| Attempt | Base delay | With jitter (±20%) |
|---|---|---|
| 1 | Immediate | 0s |
| 2 | 5 minutes | 4–6 minutes |
| 3 | 15 minutes | 12–18 minutes |
| 4 | 30 minutes | 24–36 minutes |
| 5 | 1 hour | 48–72 minutes |
| 6 | 2 hours | 96–144 minutes |
Jitter prevents thundering herd: if dozens of webhooks fail at the same time, staggered retries avoid hammering the recipient endpoint in synchronized waves.
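The schedule in the table can be sketched as a small function. The base delays are copied from the table above; `retryDelayMs` and the injectable `random` parameter are illustrative names, not Verid's implementation.

```javascript
// Base delays in minutes for attempts 1..6, copied from the table above
const BASE_DELAY_MINUTES = [0, 5, 15, 30, 60, 120];

// Returns the delay in milliseconds before the given attempt (1-indexed),
// scaled by a random factor in [0.8, 1.2). `random` is injectable for tests.
function retryDelayMs(attempt, random = Math.random) {
  const base = BASE_DELAY_MINUTES[attempt - 1];
  if (base === undefined) {
    throw new RangeError(`no retry scheduled for attempt ${attempt}`);
  }
  const factor = 0.8 + random() * 0.4;
  return Math.round(base * 60 * 1000 * factor);
}
```

Because each delivery draws its own random factor, a batch of deliveries that failed together will spread out across the jitter window on the next attempt.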
Verid's built-in delivery system follows this pattern. When you use Verid as a change detection service, the 6-attempt retry schedule is already handled so you don't need to build it yourself.
After exhausting retries, move failed deliveries to a dead-letter queue (DLQ) instead of discarding them silently. The DLQ gives you an audit trail and the ability to replay events after you fix the underlying issue.
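What goes into a DLQ entry matters more than where it is stored. Here is a minimal sketch of a record builder. The field names and the shape of `delivery` are assumptions for illustration, not a Verid schema.

```javascript
// Sketch of a dead-letter record. Field names are illustrative only.
// `delivery` is assumed to look like { id, rawBody, attempts }.
function toDeadLetterRecord(delivery, error, now = new Date()) {
  return {
    deliveryId: delivery.id,
    rawPayload: delivery.rawBody, // keep the exact payload so it can be replayed
    attempts: delivery.attempts,
    lastError: error instanceof Error ? error.message : String(error),
    failedAt: now.toISOString(),
  };
}
```

Storing the raw payload rather than a parsed copy means a replay sends the same bytes the recipient would have received the first time.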
Keep Your Endpoint Fast
Your webhook endpoint has one job: receive the payload, validate it, and hand it off. Everything else is noise.
Common things that make endpoints slow:
- Synchronous database writes before responding
- Calling third-party APIs inline during the request
- Parsing and transforming large payloads before acknowledging
The fix is always a queue. Redis queues (BullMQ), database-backed queues (pg-boss), or managed queues (SQS) all work. The pattern is the same regardless of implementation.
// Using BullMQ (Node.js)
import { Queue } from 'bullmq';
const webhookQueue = new Queue('webhook-events', {
  connection: { host: 'localhost', port: 6379 }
});
// In your endpoint handler
await webhookQueue.add('process', {
  deliveryId: body.id,
  payload: body
}, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 5000 }
});
One thing people miss: even the signature verification step should use the raw body buffer, not the parsed result. Make sure your framework isn't parsing JSON before your middleware runs.
Handle Ordering and Out-of-Sequence Events
Webhook providers generally do not guarantee delivery order. If two events fire close together, retry delays can flip their order. You might receive an updated event before the corresponding created, or process a change payload that refers to a state that was already superseded.
A few rules:
Include a sequence field in your state. If the payload carries a timestamp or sequence number, store it. Before writing, check that the incoming event is newer than what you have.
Treat deletions with care. A deleted event arriving after a created for the same resource ID will break your state. Check whether the resource exists before acting.
For Verid payloads, the fired_at timestamp and the before/after diff give you enough context to detect stale events. If your stored version already shows v19.0.0 and you receive a payload saying it changed from v18.3.1 to v19.0.0, skip it.
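The stale-event check can be sketched as a guard that runs before any write. The shapes here are assumptions: `stored` is your own record of the last applied state, and the event is assumed to carry `fired_at` as an ISO 8601 string alongside `before` and `after` values.

```javascript
// Sketch: decide whether an incoming change event should be applied.
// Assumed shapes (hypothetical):
//   stored = { value, lastFiredAt }  (your own record, or null if none yet)
//   event  = { fired_at, before, after }
function shouldApply(stored, event) {
  if (!stored) return true; // nothing stored yet
  if (stored.value === event.after) return false; // already at the target state
  // Apply only if the event is newer than the last one applied
  return Date.parse(event.fired_at) > Date.parse(stored.lastFiredAt);
}
```

With a stored value of `v19.0.0`, an event reporting a change from `v18.3.1` to `v19.0.0` is skipped by the second check regardless of its timestamp.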
Monitor Delivery, Not Just Your App
Most application monitoring watches your own services. Webhook delivery adds an external dependency: the provider's delivery pipeline. You need visibility into that too.
Track at minimum:
| Metric | Why it matters |
|---|---|
| Delivery success rate | Catch when a specific integration starts failing |
| Time-to-delivery (p50/p95) | Detect latency spikes before they affect SLAs |
| Retry rate | High retry rates indicate endpoint instability |
| DLQ depth | Non-zero depth means events exhausted their retries and are waiting for replay |
| Signature failure rate | Spike here usually means a secret rotation issue |
Instrument your queue worker, not just your HTTP endpoint. The endpoint could return 200 consistently while workers are silently crashing. Tools like Datadog, Grafana, or Sentry all have good primitives. What matters is getting paged when delivery degrades, not just when uptime checks fail.
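Several of the metrics above can be derived from a plain log of delivery records before you wire up a metrics backend. This is a rough sketch. The record shape `{ ok, attempts, latencyMs }` is hypothetical, and the percentile uses the nearest-rank method.

```javascript
// Nearest-rank percentile over an array of numbers
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// records: [{ ok: boolean, attempts: number, latencyMs: number }] (hypothetical shape)
function summarizeDeliveries(records) {
  const total = records.length;
  const succeeded = records.filter(r => r.ok).length;
  const retried = records.filter(r => r.attempts > 1).length;
  // Latency is only meaningful for deliveries that eventually succeeded
  const latencies = records.filter(r => r.ok).map(r => r.latencyMs);
  return {
    successRate: total ? succeeded / total : null,
    retryRate: total ? retried / total : null,
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
  };
}
```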
Security Checklist {#security-checklist}
| Control | Requirement |
|---|---|
| Transport | HTTPS only. Redirect HTTP to HTTPS. No self-signed certs in production. |
| Signature verification | HMAC-SHA256 on every request. Constant-time comparison. |
| Timestamp validation | Reject requests older than 5 minutes. |
| Secret storage | Environment variable or secrets manager. Never hardcode. |
| Secret rotation | Support dual-secret validation during rotation window. |
| IP allowlisting | Restrict to provider's published IP ranges where available. |
| Rate limiting | Apply rate limits on the endpoint even after signature checks. |
| Payload logging | Mask or omit sensitive field values in logs. |
| DLQ monitoring | Alert on non-zero DLQ depth. |
| Schema validation | Reject payloads that don't match your expected structure. |
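Schema validation does not need a library to get started. Below is a dependency-free sketch. The required fields `id` and `fired_at` are assumptions for illustration, so check them against your provider's documented payload before relying on them.

```javascript
// Minimal structural check with no dependencies. The required fields
// (id, fired_at) are assumptions; use your provider's documented schema.
// Returns a list of problems; an empty list means the payload passed.
function validatePayload(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['payload must be a JSON object'];
  }
  const errors = [];
  if (typeof body.id !== 'string' || body.id.length === 0) {
    errors.push('id must be a non-empty string');
  }
  if (typeof body.fired_at !== 'string' || Number.isNaN(Date.parse(body.fired_at))) {
    errors.push('fired_at must be a parseable timestamp');
  }
  return errors;
}
```

Run this in the worker after signature verification, and route payloads that fail to the DLQ rather than dropping them.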
Common Mistakes {#common-mistakes}
Processing before acknowledging. Providers think the delivery failed and retry. You process the same event twice while the first handler is still running.
Using === for HMAC comparison. This leaks timing information. Always use crypto.timingSafeEqual.
Parsing JSON before signature verification. JSON parsing is not round-trip safe. The signature was computed against the raw bytes. Verify first.
Ignoring the DLQ. Failed deliveries represent real business events. Silence is not success.
Storing secrets in committed .env files. Use a secrets manager or CI secret injection.
Returning non-2xx for business logic failures. Return 200 and handle the failure internally. Returning 500 triggers retries for something that won't succeed on retry either.
Skipping the timestamp check. A valid signed request from 6 hours ago is still a replay attack vector.
FAQs
What HTTP status code should a webhook endpoint return?
Return 200 OK (or 202 Accepted) immediately after validating the signature and enqueuing the payload. Reserve non-2xx responses for actual authentication failures (401) or malformed requests (400). Returning 5xx triggers retries; if your processing logic is broken, retries won't fix it and you'll exhaust the provider's retry budget unnecessarily.
How do I handle webhook secret rotation without downtime?
Support two active secrets simultaneously during the rotation window. Try verifying with the new secret first. If that fails, try the old one. Once all in-flight deliveries from before the rotation are past the provider's retry window, deactivate the old secret. Verid's HMAC format includes a version prefix (v1=) which makes extending this to multi-version support straightforward.
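Dual-secret validation can be sketched as trying each active secret in turn. The sketch assumes the signed string is `<timestamp>.<raw body>` as elsewhere in this guide. `matchesSecret` and `verifyWithRotation` are hypothetical names, and the timestamp freshness check is left to the caller.

```javascript
const crypto = require('crypto');

// Constant-time check of one hex signature against one secret.
// Assumes the signed string is "<timestamp>.<raw body>".
function matchesSecret(timestamp, rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return received.length === expected.length && crypto.timingSafeEqual(expected, received);
}

// Accept the request if any currently active secret validates it.
// List the new secret first, then the old one; remove the old secret
// once the provider's retry window has passed.
function verifyWithRotation(timestamp, rawBody, signatureHex, activeSecrets) {
  return activeSecrets.some(secret => matchesSecret(timestamp, rawBody, signatureHex, secret));
}
```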
What is a webhook dead-letter queue and do I need one?
A dead-letter queue (DLQ) holds delivery attempts that failed every retry. Without one, those events are silently discarded. With one, you have an audit trail and the ability to replay them after fixing whatever caused the failures. If you're processing anything consequential (orders, alerts, state changes) you need a DLQ. A simple approach is a database table with the raw payload, error message, and timestamp. A more robust option is a managed queue service with native DLQ support.
How should I test webhook handling locally?
Use a tool like ngrok or Cloudflare Tunnel to expose your local port to the internet. Most providers, including Verid, let you set any HTTPS URL as your webhook destination. Tunnel tools give you a public URL that forwards to localhost, so you can test real payloads against your actual handler code without deploying. Combine this with the provider's "redeliver" feature (available in most dashboards) to replay past events against your updated handler.
