Friday, 5:30 PM. I’m leaving the office, phone rings. Client’s CTO: “Our order confirmations haven’t been delivered for three hours. It’s Black Friday. We have a problem.”

It turned out their bounce rate had jumped to 15% (it normally sat around 1%). Someone had added 50,000 old addresses to the database “because it would be a waste not to use them.” Gmail reacted immediately and blocked the entire domain.

If they had had monitoring with alerts in place, they would have known within 15 minutes, not 3 hours. The incident cost them around €50,000 in lost orders.

“Email sent” is not the same as “email delivered”

As a DevOps engineer or SRE, you know this perfectly well. But does your team see what happens to messages after they leave the server?

Most companies learn about deliverability problems from… customers. “I didn’t get my invoice.” “Where’s my confirmation?” “Check your spam folder.” This isn’t monitoring. This is firefighting.

What should we actually track?

Let’s start with the basics. You should know these metrics by heart:

Delivery Rate — percentage of messages accepted by the recipient server. Healthy level is above 98%. If it drops below 95%, something’s wrong.

Bounce Rate — percentage of rejected messages. Should be under 2%. Above 5% is a red alert.

Spam Complaint Rate — percentage of recipients reporting spam. Maximum 0.1%. Above 0.3% and Gmail starts looking at you suspiciously.
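
To make those thresholds concrete, here is a minimal sketch that turns raw event counts into the three rates and grades each one. The SendStats, rates, and health names are my own, and treating the boundary values (exactly 98%, 2%, 0.1%) as healthy is an assumption, so adjust it to your own policy.

```python
from dataclasses import dataclass


@dataclass
class SendStats:
    sent: int        # messages handed to the provider
    delivered: int   # accepted by the recipient server
    bounced: int     # rejected (hard + soft)
    complained: int  # reported as spam


def rates(stats: SendStats) -> dict[str, float]:
    """Return delivery, bounce and complaint rates as percentages of sent."""
    if stats.sent <= 0:
        raise ValueError("no messages sent in this period")
    return {
        "delivery": 100 * stats.delivered / stats.sent,
        "bounce": 100 * stats.bounced / stats.sent,
        "complaint": 100 * stats.complained / stats.sent,
    }


def grade(value: float, healthy: float, critical: float,
          higher_is_better: bool = False) -> str:
    """Map a rate to 'ok', 'warning' or 'critical'."""
    if higher_is_better:
        if value >= healthy:
            return "ok"
        return "critical" if value < critical else "warning"
    if value <= healthy:
        return "ok"
    return "critical" if value > critical else "warning"


def health(stats: SendStats) -> dict[str, str]:
    """Grade each rate against the thresholds from the text."""
    r = rates(stats)
    return {
        "delivery": grade(r["delivery"], 98, 95, higher_is_better=True),
        "bounce": grade(r["bounce"], 2, 5),
        "complaint": grade(r["complaint"], 0.1, 0.3),
    }
```

Fed the numbers from the Black Friday story, for example health(SendStats(sent=1000, delivered=850, bounced=150, complained=0)), both delivery and bounce come back as critical.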

There are a few more metrics for advanced users:

  • Time to Delivery — how many seconds from send to delivery. Target: under 10 seconds for 95% of messages.
  • Inbox Placement Rate — percentage of messages landing in inbox, not spam.
  • DMARC Pass Rate — should be above 99%.

Bounces are not created equal

This distinction matters, and it is often misunderstood.

Hard bounce is permanent rejection. Address doesn’t exist, domain doesn’t exist, mailbox permanently blocked. Code 550. Action? Remove from list immediately. Every subsequent send attempt to this address damages your reputation.

Soft bounce is a temporary problem. Mailbox full (code 452), server temporarily unavailable, message too large. Action? Retry after some time. But if it still doesn’t work after 3-5 attempts — treat it as a hard bounce.

To be clear, SMTP codes work like this:

  • 4xx are temporary errors (you can retry)
  • 5xx are permanent errors (don’t retry)
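
The rule above fits in a few lines of code. This is only a sketch: bounce_action and MAX_SOFT_ATTEMPTS are names I made up, and the cap of 5 is simply the upper end of the 3-5 attempts mentioned earlier.

```python
MAX_SOFT_ATTEMPTS = 5  # assumed cap: upper end of the 3-5 retries suggested above


def bounce_action(smtp_code: int, attempts: int) -> str:
    """Return 'remove', 'retry' or 'ignore' for an SMTP reply code.

    attempts is the number of delivery attempts already made.
    """
    if 500 <= smtp_code <= 599:
        # Permanent failure: suppress the address right away.
        return "remove"
    if 400 <= smtp_code <= 499:
        # Temporary failure: retry until the budget is spent,
        # then treat it like a hard bounce.
        return "retry" if attempts < MAX_SOFT_ATTEMPTS else "remove"
    # 2xx and 3xx replies are not bounces.
    return "ignore"
```

So bounce_action(550, 1) asks for removal, bounce_action(452, 1) asks for a retry, and bounce_action(452, 5) gives up on the address.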

External tools — free and paid

Google Postmaster Tools (postmaster.google.com) — mandatory if you have many Gmail recipients. Shows domain reputation, spam rate, authentication errors. Free, but data appears with ~24h delay.

Microsoft SNDS — same thing for Outlook and Hotmail. A bit less intuitive, but free.

MXToolbox — checks blacklists (100+), DNS, SPF/DKIM/DMARC. Free version for manual checks, paid with automatic monitoring and alerts.
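
If you want a rough idea of what a blacklist check does under the hood, most DNS-based blocklists are queried by reversing the IPv4 octets and appending the list's zone. The sketch below assumes that convention; dnsbl.example.org is a placeholder zone, not a real list, and is_listed needs network access. Some lists return special answer codes for rate-limited or refused queries, so a production check should inspect the returned address rather than trust any answer.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets + blocklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the IP
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the blocklist answers with an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name (NXDOMAIN) normally means the IP is not listed.
        return False
```

For example, dnsbl_query_name("192.0.2.10", "dnsbl.example.org") yields 10.2.0.192.dnsbl.example.org.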

Building your own dashboard

Okay, here we get into technical specifics.

The architecture is simple:

MailingAPI → Webhooks → Event Processor → TimescaleDB → Grafana
                              ↓
                         Alert Manager → Slack/PagerDuty

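A webhook handler in Python (3.10+ for the match statement) might look like this:

from fastapi import FastAPI, Request
from prometheus_client import Counter

app = FastAPI()

# One counter, labelled by event type and sending domain, feeds every chart.
email_events = Counter(
    'email_events_total',
    'Total email events',
    ['event_type', 'domain']
)


# Placeholder helpers: replace the bodies with your own list store and alerting.
async def remove_from_list(recipient: str) -> None: ...
async def add_to_suppression_list(recipient: str) -> None: ...
async def alert_immediately(message: str) -> None: ...
async def alert_if_spike(event: dict) -> None: ...


@app.post("/webhooks/mailingapi")
async def handle_webhook(request: Request):
    event = await request.json()

    # Count every event so rates can be computed later in PromQL.
    email_events.labels(
        event_type=event["type"],
        domain=event["from_domain"]
    ).inc()

    match event["type"]:
        case "bounced":
            # Hard bounces are permanent: stop sending to the address.
            if event["bounce_type"] == "hard":
                await remove_from_list(event["recipient"])
            await alert_if_spike(event)

        case "complained":
            # Spam complaint: always react immediately
            await alert_immediately(f"Spam complaint: {event['recipient']}")
            await add_to_suppression_list(event["recipient"])

    return {"status": "ok"}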

The key is the alert_if_spike function — it checks if bounce rate in the last 15 minutes exceeded the threshold. If so, it sends an alert.
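
One way to back alert_if_spike is a sliding window kept in memory. The sketch below is an assumption about how you might do it, not the only option: BounceSpikeDetector is a hypothetical name, and the min_events guard is my addition so that one bounce out of two messages does not page anyone.

```python
import time
from collections import deque


class BounceSpikeDetector:
    """Tracks delivery outcomes in a sliding time window."""

    def __init__(self, window_seconds=15 * 60, threshold=0.10, min_events=50):
        self.window_seconds = window_seconds
        self.threshold = threshold    # 0.10 means a 10% bounce rate
        self.min_events = min_events  # assumed guard against tiny samples
        self._events = deque()        # (timestamp, is_bounce), oldest first

    def _evict(self, now):
        """Drop events that have fallen out of the window."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def record(self, is_bounce, now=None):
        """Register one delivered (False) or bounced (True) message."""
        now = time.time() if now is None else now
        self._events.append((now, is_bounce))
        self._evict(now)

    def bounce_rate(self, now=None):
        """Fraction of bounces among events still inside the window."""
        now = time.time() if now is None else now
        self._evict(now)
        if not self._events:
            return 0.0
        bounces = sum(1 for _, is_bounce in self._events if is_bounce)
        return bounces / len(self._events)

    def is_spiking(self, now=None):
        """True when there is enough traffic and the rate exceeds the threshold."""
        now = time.time() if now is None else now
        rate = self.bounce_rate(now)
        return len(self._events) >= self.min_events and rate > self.threshold
```

For the rate to mean anything, the handler has to call record() for delivered events as well as bounced ones. With several workers, each process would see only its own share of traffic, so a shared store or the Prometheus counter is the safer source of truth.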

In Grafana, the PromQL query for delivery rate looks like this:

sum(rate(email_events_total{event_type="delivered"}[1h]))
/
sum(rate(email_events_total{event_type=~"delivered|bounced"}[1h]))
* 100

What to alert immediately vs. track in trends

There are things that require immediate reaction (read: wake you up at night):

  • Bounce rate >10% for 15 minutes — something went very wrong
  • Any spam complaint — someone marked your email as spam
  • Blacklist detection — you need to start delisting
  • Delivery rate <90% — check your infrastructure

And there are things worth observing in trends (daily Slack report):

  • Average bounce rate — has it been rising for several days in a row?
  • Open rate — has it suddenly dropped by 15% or more?
  • Unsubscribe rate — is there a spike?
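
The trend questions above are easy to automate for a daily report. A small sketch, with rising_streak and relative_drop as hypothetical helper names and one value per day as the assumed input:

```python
def rising_streak(daily_values: list[float]) -> int:
    """Number of consecutive day-over-day increases ending at the latest value."""
    streak = 0
    for i in range(len(daily_values) - 1, 0, -1):
        if daily_values[i] > daily_values[i - 1]:
            streak += 1
        else:
            break
    return streak


def relative_drop(previous: float, current: float) -> float:
    """Fractional drop from previous to current (0.15 means 15% lower)."""
    if previous <= 0:
        return 0.0
    return max(0.0, (previous - current) / previous)
```

A report could then flag the bounce rate when rising_streak(last_week) reaches 3, or the open rate when relative_drop(yesterday, today) is 0.15 or more. Both cutoffs are examples rather than recommendations.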

How we do it at MailingAPI

Honestly? We built all this so you don’t have to.

The dashboard shows delivery rate, bounce rate and complaints in real-time. Charts per domain. Details of every message.

Webhooks send events for every status: sent, delivered, bounced, complained, opened, clicked. With automatic retry and signature verification.
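
On your side, signature verification usually means recomputing an HMAC over the raw request body and comparing it in constant time. The sketch below assumes an HMAC-SHA256 hex digest with a shared secret. That scheme is an assumption for illustration, so check the webhook documentation for the real algorithm and header name before relying on it.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(body: bytes, received_signature: str, secret: bytes) -> bool:
    """Compare the received signature with our own in constant time."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

Verify against the raw bytes of the request, before any JSON parsing, because re-serialising the payload can change whitespace and break the comparison.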

Alerts? Email when bounce rate exceeds threshold. Notification on blacklist hit. Info about authentication errors.

You can also export data via API:

curl -H "Authorization: Bearer $API_KEY" \
  "https://api.mailingapi.com/v1/stats?domain=example.com&period=7d"

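The same call from Python, using only the standard library, could be sketched like this. The URL and parameters come from the curl example; build_stats_request and fetch_stats are hypothetical helper names, and treating the response as JSON is an assumption.

```python
import json
import urllib.parse
import urllib.request


def build_stats_request(api_key: str, domain: str, period: str = "7d"):
    """Prepare the GET request shown in the curl example."""
    query = urllib.parse.urlencode({"domain": domain, "period": period})
    return urllib.request.Request(
        f"https://api.mailingapi.com/v1/stats?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_stats(api_key: str, domain: str, period: str = "7d"):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_stats_request(api_key, domain, period)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```
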
My monitoring checklist

Daily (automated):

  • Delivery rate >98%
  • Bounce rate <2%
  • Zero spam complaints
  • Zero new blacklist hits

Weekly (review):

  • Bounce rate trend stable
  • No anomalies in open rate
  • Check reputation in Postmaster Tools

Monthly (audit):

  • Review suppression list
  • Verify DNS
  • End-to-end test via mail-tester.com

Final thoughts

Deliverability monitoring isn’t a luxury. It’s a necessity. The difference between “you’ll know in 15 minutes” and “you’ll learn from a customer after 3 hours” is often the difference between a small problem and a catastrophe.

If you don’t want to build all this yourself — create a free account and start monitoring deliverability in minutes.