Webhook Best Practices: A Developer's Production Guide

Webhooks look deceptively simple. You expose an HTTP endpoint, a provider POSTs an event, you process it. That works until the provider retries a duplicate, your server goes down during a deploy, or a bad actor replays a signed request from two hours ago.

This guide covers the failures that only show up in production. Most webhook documentation stops at "verify the signature." These practices go further.

Acknowledge First, Process Second

The first thing your endpoint should do, once the signature checks out and the payload is safely enqueued, is return a 200 OK. Not after processing the payload. Not after running business logic against the database. Immediately.

Most providers set a response timeout between 5 and 30 seconds. If you exceed it, they mark the delivery failed and retry. Now you're processing the same event twice while the first job is still running.

The correct pattern is verify, enqueue, respond:

// Node.js / Express
app.post('/webhooks/verid', express.raw({ type: 'application/json' }), async (req, res) => {
  // 1. Verify signature (fast, in-memory). req.body is a raw Buffer here.
  const isValid = verifySignature(req.headers['verid-signature'], req.body, process.env.WEBHOOK_SECRET);
  if (!isValid) return res.status(401).send('Unauthorized');

  try {
    // 2. Enqueue raw payload for background processing
    await queue.push({ body: req.body.toString('utf8'), headers: req.headers });
  } catch (err) {
    // The event could not be persisted: return 5xx so the provider retries the delivery
    return res.status(503).send('Temporarily unavailable');
  }

  // 3. Acknowledge immediately
  res.status(200).send('OK');
});

Your queue worker handles the actual business logic. If it fails, you retry the worker job, not the entire HTTP round-trip with an external provider.

Verify Every Signature, Every Time

An unsigned or unverified webhook is just an unauthenticated POST from the internet. Anyone can send one.

HMAC-SHA256 is the standard. The provider signs the payload with a shared secret. You recompute the signature on your end and compare. If they match, the payload is authentic and untampered.

Verid uses a Stripe-style header format: Verid-Signature: t=<timestamp>,v1=<signature>. The timestamp is included so you can reject replayed requests.

const crypto = require('crypto');

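function verifySignature(header, rawBody, secret) {
  if (!header) return false;

  // Split each "key=value" pair on the first "=" only
  const parts = Object.fromEntries(
    header.split(',').map(p => {
      const i = p.indexOf('=');
      return [p.slice(0, i).trim(), p.slice(i + 1).trim()];
    })
  );

  const timestamp = parseInt(parts.t, 10);
  const signature = parts.v1;

  // A missing or malformed field must fail closed, not slip past the checks below
  if (Number.isNaN(timestamp) || typeof signature !== 'string') return false;

  // Reject if timestamp is older than 5 minutes
  const age = Math.floor(Date.now() / 1000) - timestamp;
  if (age > 300) return false;

  // Sign "<timestamp>.<raw body>" using the exact bytes that were received
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${parts.t}.`)
    .update(rawBody)
    .digest();

  const received = Buffer.from(signature, 'hex');

  // timingSafeEqual throws if the lengths differ, so check that first
  if (received.length !== expected.length) return false;

  // Constant-time comparison prevents timing attacks
  return crypto.timingSafeEqual(expected, received);
}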

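Two things matter here that many guides skip. First, use crypto.timingSafeEqual instead of ===. String comparison short-circuits on the first mismatched character, which leaks timing information an attacker can use. Note that timingSafeEqual throws if the two buffers differ in length, so compare lengths first. Second, the 5-minute timestamp window limits replay attacks. A valid request captured and resent after the window will be rejected; a replay inside the window is caught by deduplicating on the delivery ID.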

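Use express.raw() or an equivalent to get the raw request body before any JSON parsing. The provider signed the exact bytes it sent. Once the body has been parsed and re-serialized, whitespace, key order, or number formatting can differ, and the recomputed HMAC won't match.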
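To make the raw-body point concrete, here is a minimal sketch comparing an HMAC over the original bytes with an HMAC over the same payload after a parse and re-serialize round trip. The secret and the payload are made up for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical secret and payload, for illustration only.
const secret = 'example-secret';
// Note the spaces after the colon and the comma: valid JSON, but not what JSON.stringify emits.
const rawBody = Buffer.from('{"id": "del_123",  "after": "v19.0.0"}');

function hmacHex(data) {
  return crypto.createHmac('sha256', secret).update(data).digest('hex');
}

// What the provider signed: the exact bytes on the wire.
const signedOverRaw = hmacHex(rawBody);

// What you would sign if a JSON middleware had parsed the body first.
const reserialized = JSON.stringify(JSON.parse(rawBody.toString('utf8')));
const signedOverReserialized = hmacHex(reserialized);

console.log(reserialized);                             // {"id":"del_123","after":"v19.0.0"}
console.log(signedOverRaw === signedOverReserialized); // false
```

The data is identical in both cases, but the bytes are not, and the HMAC only cares about bytes.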

Design for Idempotency from Day One

Most webhook providers retry on failure. Stripe retries for up to three days in live mode. GitHub does not redeliver failed deliveries automatically, but lets you redeliver them manually or through its API. Verid makes up to 6 delivery attempts across nearly 4 hours. That means your handler will sometimes process the same event more than once, especially after restarts or deploy windows.

If your handler is not idempotent, you'll double-charge customers, create duplicate records, or fire notifications twice.

The pattern is to deduplicate on the delivery ID before doing any work:

async function processWebhookJob(payload) {
  const body = JSON.parse(payload.body);
  const deliveryId = body.id; // e.g. "del_01H..."
  const key = `webhook:processed:${deliveryId}`;

  // Atomically claim the delivery. NX sets the key only if it does not exist yet,
  // so two workers cannot both pass the check (ioredis-style call).
  // The TTL is 24 hours here; it must exceed the provider's retry window.
  const claimed = await redis.set(key, '1', 'EX', 86400, 'NX');
  if (claimed !== 'OK') {
    console.log(`Skipping duplicate delivery ${deliveryId}`);
    return;
  }

  try {
    // Do the work
    await handleChangeEvent(body);
  } catch (err) {
    // Release the claim so a retry of this job can process the event
    await redis.del(key);
    throw err;
  }
}

The TTL on the dedup key should exceed the provider's retry window. If they retry for 72 hours, keep the key for at least 72 hours plus some buffer.

Build a Retry Strategy That Doesn't Thrash

If you send webhooks outbound, you need a retry strategy that is kind to recipients. Hammering a struggling server every 5 seconds is the wrong approach. Use exponential backoff with jitter.

Attempt	Base delay	With jitter (±20%)
1	Immediate	0s
2	5 minutes	4–6 minutes
3	15 minutes	12–18 minutes
4	30 minutes	24–36 minutes
5	1 hour	48–72 minutes
6	2 hours	96–144 minutes

Jitter prevents a thundering herd: if dozens of webhooks fail at the same time, staggered retries avoid hammering the recipient endpoint in synchronized waves.
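The schedule in the table can be sketched as a small function. This is one possible implementation, not Verid's actual code: the base delays come from the table, and the function name and the injectable random source are assumptions made for testability.

```javascript
// Base delays in minutes for attempts 1..6, taken from the table above.
const BASE_DELAYS_MIN = [0, 5, 15, 30, 60, 120];

// Returns the delay in milliseconds to wait before the given attempt (1-based),
// or null once the retry budget is exhausted.
// `random` is injectable so the jitter can be tested deterministically.
function retryDelayMs(attempt, random = Math.random) {
  if (!Number.isInteger(attempt) || attempt < 1 || attempt > BASE_DELAYS_MIN.length) {
    return null;
  }
  const baseMs = BASE_DELAYS_MIN[attempt - 1] * 60 * 1000;
  // Map random() from [0, 1) to a multiplier in [0.8, 1.2), i.e. plus or minus 20%.
  const jitter = 1 + (random() * 2 - 1) * 0.2;
  return Math.round(baseMs * jitter);
}

console.log(retryDelayMs(1));            // 0 (first attempt is immediate)
console.log(retryDelayMs(2, () => 0));   // 240000 (4 minutes, the lower bound)
console.log(retryDelayMs(2, () => 0.5)); // 300000 (5 minutes, no jitter)
console.log(retryDelayMs(7));            // null (no attempts left)
```

A null return is the signal to stop retrying and hand the delivery to the dead-letter queue.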

Verid's built-in delivery system follows this pattern. When you use Verid as a change detection service, the 6-attempt retry schedule is already handled so you don't need to build it yourself.

After exhausting retries, move failed deliveries to a dead-letter queue (DLQ) instead of discarding them silently. The DLQ gives you an audit trail and the ability to replay events after you fix the underlying issue.
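A minimal sketch of that hand-off, assuming a hypothetical async send function that throws on failure and an in-memory array standing in for the DLQ. In a real system the DLQ would be a database table or a managed queue, and the wait function would sleep for the backoff delay.

```javascript
// In-memory stand-in for a real DLQ (a database table or managed queue in practice).
const deadLetterQueue = [];

// `send` is a hypothetical async function that performs one delivery attempt
// and throws on failure. `wait` pauses between attempts; it is a no-op by
// default so the sketch runs instantly.
async function deliverWithDlq(delivery, send, { maxAttempts = 6, wait = async () => {} } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await send(delivery);
      return { delivered: true, attempts: attempt };
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) await wait(attempt);
    }
  }
  // Retries exhausted: keep everything needed to debug and replay later.
  deadLetterQueue.push({
    deliveryId: delivery.id,
    payload: delivery.payload,
    error: String(lastError && lastError.message),
    attempts: maxAttempts,
    failedAt: new Date().toISOString(),
  });
  return { delivered: false, attempts: maxAttempts };
}
```

Storing the raw payload alongside the last error is what makes a later replay possible without asking the sender to regenerate the event.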

Keep Your Endpoint Fast

Your webhook endpoint has one job: receive the payload, validate it, and hand it off. Everything else is noise.

Common things that make endpoints slow:

Synchronous database writes before responding
Calling third-party APIs inline during the request
Parsing and transforming large payloads before acknowledging

The fix is always a queue. Redis queues (BullMQ), database-backed queues (pg-boss), or managed queues (SQS) all work. The pattern is the same regardless of implementation.

// Using BullMQ (Node.js)
import { Queue } from 'bullmq';

const webhookQueue = new Queue('webhook-events', {
  connection: { host: 'localhost', port: 6379 }
});

// In your endpoint handler, after signature verification.
// `body` is the payload parsed from the verified raw request body.
await webhookQueue.add('process', {
  deliveryId: body.id,
  payload: body
}, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 5000 }
});

One thing people miss: even the signature verification step should use the raw body buffer, not the parsed result. Make sure your framework isn't parsing JSON before your middleware runs.

Handle Ordering and Out-of-Sequence Events

Webhook providers generally do not guarantee delivery order. If two events fire close together, retry delays can flip their order. You might receive an updated event before the corresponding created, or process a change payload that refers to a state that was already superseded.

A few rules:

Include a sequence field in your state. If the payload carries a timestamp or sequence number, store it. Before writing, check that the incoming event is newer than what you have.

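Treat deletions with care. A created or updated event arriving after the deleted event for the same resource ID will resurrect a resource that no longer exists, and a deleted event arriving before the created has nothing to delete. Check whether the resource exists, and whether it was already deleted, before acting.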

For Verid payloads, the fired_at timestamp and the before/after diff give you enough context to detect stale events. If your stored version already shows v19.0.0 and you receive a payload saying it changed from v18.3.1 to v19.0.0, skip it.
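The rules above can be sketched as a single guard in front of your write path. This is an illustration under assumptions: fired_at is taken from the payload as described, while resourceId and type are hypothetical field names, and the Map stands in for your datastore.

```javascript
// In-memory stand-in for your datastore, keyed by resource ID.
const store = new Map();

// Applies an event only if it is newer than what is already stored.
// Assumes each event carries `fired_at` (ISO 8601) and `after`;
// `resourceId` and `type` are hypothetical field names.
function applyEvent(event) {
  const firedAt = Date.parse(event.fired_at);
  if (Number.isNaN(firedAt)) return 'rejected';

  const current = store.get(event.resourceId);
  // Older or equal timestamps are stale (equal also covers exact duplicates).
  if (current && firedAt <= current.firedAt) return 'skipped-stale';

  // A deletion leaves a tombstone so a late "created" cannot resurrect the resource.
  const deleted = event.type === 'deleted';
  store.set(event.resourceId, { firedAt, deleted, value: deleted ? null : event.after });
  return 'applied';
}
```

Keeping a tombstone instead of removing the record is a design choice: it costs a little storage but gives out-of-order events something to be compared against.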

Monitor Delivery, Not Just Your App

Most application monitoring watches your own services. Webhook delivery adds an external dependency: the provider's delivery pipeline. You need visibility into that too.

Track at minimum:

Metric	Why it matters
Delivery success rate	Catch when a specific integration starts failing
Time-to-delivery (p50/p95)	Detect latency spikes before they affect SLAs
Retry rate	High retry rates indicate endpoint instability
DLQ depth	Non-zero depth means events exhausted their retries and are waiting to be replayed
Signature failure rate	Spike here usually means a secret rotation issue

Instrument your queue worker, not just your HTTP endpoint. The endpoint could return 200 consistently while workers are silently crashing. Tools like Datadog, Grafana, or Sentry all have good primitives. What matters is getting paged when delivery degrades, not just when uptime checks fail.
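As a rough sketch of what the worker-side numbers look like, the function below summarizes a window of delivery records into the rates and percentiles from the table. The record shape is an assumption, and in practice you would emit these values to your monitoring tool rather than compute them by hand.

```javascript
// Nearest-rank percentile over a list of numbers; returns null for an empty list.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarizes a window of delivery records.
// Each record is assumed to look like { ok: boolean, attempts: number, latencyMs: number }.
function summarize(records) {
  const total = records.length;
  const succeeded = records.filter((r) => r.ok).length;
  const retried = records.filter((r) => r.attempts > 1).length;
  // Latency is only meaningful for deliveries that eventually succeeded.
  const latencies = records.filter((r) => r.ok).map((r) => r.latencyMs);
  return {
    successRate: total === 0 ? null : succeeded / total,
    retryRate: total === 0 ? null : retried / total,
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
  };
}
```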

Security Checklist

Control	Requirement
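Transport	HTTPS only. Reject plain HTTP instead of redirecting, because the payload has already been sent in cleartext by the time a redirect is issued. No self-signed certs in production.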
Signature verification	HMAC-SHA256 on every request. Constant-time comparison.
Timestamp validation	Reject requests older than 5 minutes.
Secret storage	Environment variable or secrets manager. Never hardcode.
Secret rotation	Support dual-secret validation during rotation window.
IP allowlisting	Restrict to provider's published IP ranges where available.
Rate limiting	Apply rate limits on the endpoint even after signature checks.
Payload logging	Mask or omit sensitive field values in logs.
DLQ monitoring	Alert on non-zero DLQ depth.
Schema validation	Reject payloads that don't match your expected structure.
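For the schema validation row, a hand-rolled check is often enough to start with. The sketch below uses only the field names mentioned in this guide (id, fired_at, before, after); the exact payload schema is an assumption, so adapt it to what your provider documents.

```javascript
// Returns a list of problems; an empty list means the payload looks structurally valid.
// Field names follow the ones mentioned in this guide; the exact schema is an assumption.
function validatePayload(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['payload must be a JSON object'];
  }
  const errors = [];
  if (typeof body.id !== 'string' || body.id.length === 0) {
    errors.push('id must be a non-empty string');
  }
  if (typeof body.fired_at !== 'string' || Number.isNaN(Date.parse(body.fired_at))) {
    errors.push('fired_at must be a parseable timestamp');
  }
  if (!('before' in body)) errors.push('before is required');
  if (!('after' in body)) errors.push('after is required');
  return errors;
}
```

Run this in the worker after signature verification, and route payloads with a non-empty error list to the DLQ rather than dropping them.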

Common Mistakes

Processing before acknowledging. Providers think the delivery failed and retry. You process the same event twice while the first handler is still running.

Using === for HMAC comparison. This leaks timing information. Always use crypto.timingSafeEqual.

Parsing JSON before signature verification. JSON parsing is not round-trip safe. The signature was computed against the raw bytes. Verify first.

Ignoring the DLQ. Failed deliveries represent real business events. Silence is not success.

Storing secrets in committed .env files. Use a secrets manager or CI secret injection.

Returning non-2xx for business logic failures. Return 200 and handle the failure internally. Returning 500 triggers retries for something that won't succeed on retry either.

Skipping the timestamp check. A valid signed request from 6 hours ago is still a replay attack vector.

FAQs

What HTTP status code should a webhook endpoint return?

Return 200 OK (or 202 Accepted) immediately after validating the signature and enqueuing the payload. Reserve non-2xx responses for actual authentication failures (401), malformed requests (400), and the case where you could not enqueue the payload at all (5xx), so the provider retries the delivery. Do not return 5xx for failures in your processing logic: retries won't fix broken logic, and you'll exhaust the provider's retry budget unnecessarily.

How do I handle webhook secret rotation without downtime?

Support two active secrets simultaneously during the rotation window. Try verifying with the new secret first. If that fails, try the old one. Once all in-flight deliveries from before the rotation are past the provider's retry window, deactivate the old secret. Verid's HMAC format includes a version prefix (v1=) which makes extending this to multi-version support straightforward.
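A minimal sketch of dual-secret verification, assuming the signing scheme used earlier in this guide (hex HMAC-SHA256 over "<timestamp>.<body>"). The function names are hypothetical, and the timestamp freshness check is assumed to happen separately.

```javascript
const crypto = require('crypto');

// Computes the hex HMAC-SHA256 of "<timestamp>.<rawBody>".
function sign(secret, timestamp, rawBody) {
  return crypto.createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest('hex');
}

// Returns true if the signature matches under any currently active secret.
// `secrets` is ordered newest first, e.g. [NEW_SECRET, OLD_SECRET].
function verifyWithRotation(timestamp, signatureHex, rawBody, secrets) {
  if (typeof signatureHex !== 'string') return false;
  const received = Buffer.from(signatureHex, 'hex');
  return secrets.some((secret) => {
    const expected = Buffer.from(sign(secret, timestamp, rawBody), 'hex');
    // timingSafeEqual throws on a length mismatch, so check lengths first.
    return expected.length === received.length && crypto.timingSafeEqual(expected, received);
  });
}
```

Once the old secret's retry window has passed, removing it from the list is the whole deactivation step.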

What is a webhook dead-letter queue and do I need one?

A dead-letter queue (DLQ) holds delivery attempts that failed every retry. Without one, those events are silently discarded. With one, you have an audit trail and the ability to replay them after fixing whatever caused the failures. If you're processing anything consequential (orders, alerts, state changes) you need a DLQ. A simple approach is a database table with the raw payload, error message, and timestamp. A more robust option is a managed queue service with native DLQ support.

How should I test webhook handling locally?

Use a tool like ngrok or Cloudflare Tunnel to expose your local port to the internet. Most providers, including Verid, let you set any HTTPS URL as your webhook destination. Tunnel tools give you a public URL that forwards to localhost, so you can test real payloads against your actual handler code without deploying. Combine this with the provider's "redeliver" feature (available in most dashboards) to replay past events against your updated handler.
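If you want to exercise your handler before any real event arrives, you can also sign a test payload yourself. The sketch below assumes the header format and signing scheme used by the verification code in this guide; the secret, the payload, and the localhost:3000 address are placeholders.

```javascript
const crypto = require('crypto');

// Builds a header in the t=<timestamp>,v1=<signature> format,
// signing "<timestamp>.<body>" as the verification code in this guide expects.
function buildSignatureHeader(secret, body, timestamp = Math.floor(Date.now() / 1000)) {
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${body}`)
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}

// Hypothetical test payload and secret.
const body = JSON.stringify({ id: 'del_test_1', fired_at: new Date().toISOString() });
const header = buildSignatureHeader('local-test-secret', body);

// Prints a curl command for a handler assumed to listen on localhost:3000.
console.log(
  `curl -X POST http://localhost:3000/webhooks/verid ` +
  `-H 'Content-Type: application/json' -H 'Verid-Signature: ${header}' -d '${body}'`
);
```

Start your handler with WEBHOOK_SECRET set to the same test secret, then paste the printed command into a terminal. Sending it twice is a quick way to check your deduplication as well.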