Monitoring

CT Log Monitoring at Scale: Drinking from the Firehose Without Drowning

The practical engineering challenges of monitoring Certificate Transparency logs when you have hundreds of domains and millions of log entries per day.

CertGuard Team · 8 min read

You set up CT monitoring. Now you're getting 4,000 alerts a day.

Every guide about Certificate Transparency monitoring ends the same way: "subscribe to CT logs for your domains and get notified about new certificates." Great advice. Terrible at scale.

If you manage five domains, sure, crt.sh works fine. Check it once a week, maybe set up a Google Alert-style notification. But the moment you're responsible for 50, 200, or a thousand domains across multiple business units, you hit a wall that nobody warns you about. The volume of legitimate certificate issuance alone will bury you.

Renewal cycles overlap. Dev teams spin up staging environments with their own certs. Marketing launches microsites. Acquisitions bring in domains you didn't even know existed. And every single one of those generates CT log entries that your monitoring dutifully flags.

The actual data volume nobody talks about

CT logs collectively contain somewhere around 8 billion entries. Google's Argon log alone handles millions of submissions weekly. When you're querying for your domains, you're not searching a small database. You're pattern-matching against a distributed, append-only data structure that grows by hundreds of thousands of entries every hour.

Most teams start with the crt.sh API. Quick and dirty:

curl -s "https://crt.sh/?q=%.example.com&output=json" | \
  jq '.[].common_name' | sort -u

Works great until you hit rate limits. Or until the response takes 45 seconds because your domain has 12,000 historical entries. Or until crt.sh goes down for maintenance, which happens more often than you'd expect.

The real problem isn't getting the data. It's processing it fast enough to matter.

Building a pipeline that actually scales

Forget polling crt.sh. For serious monitoring, you need to consume CT log entries directly. Google maintains a list of trusted logs, and each log exposes a simple REST API defined in RFC 6962. The key endpoints are get-sth (signed tree head, tells you the current log size) and get-entries (fetches actual certificate data).

Here's a basic Go worker that tails a CT log:

func tailLog(logURL string, startIndex int64) {
    for {
        sth, err := getSTH(logURL)
        if err != nil || sth.TreeSize <= startIndex {
            // nothing new (or the log is unreachable); wait and retry
            time.Sleep(30 * time.Second)
            continue
        }
        // fetch in batches of 256 (start and end indexes are inclusive)
        for i := startIndex; i < sth.TreeSize; i += 256 {
            end := min(i+255, sth.TreeSize-1)
            entries, err := getEntries(logURL, i, end)
            if err != nil {
                break // retry this range on the next sweep
            }
            for _, entry := range entries {
                cert := parseLeafInput(entry)
                if matchesDomains(cert, watchList) {
                    notify(cert)
                }
            }
            startIndex = end + 1 // checkpoint per batch, not per sweep
        }
    }
}

Simple enough. But you need to run this against every active CT log. As of early 2026, that's around 30 logs. Each with different performance characteristics, different batch size limits, different uptime guarantees.

One team I worked with tried running this as a single process. Worked fine for a month. Then one log started responding slowly, the goroutine backed up, memory usage spiked, and the whole thing OOM-killed at 2 AM. Classic.
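The usual fix is to isolate each log behind its own worker with its own backoff, so one slow or failing log can't starve the rest. A minimal TypeScript sketch of that supervisor shape, where `LogClient` and its methods are illustrative stand-ins for a real RFC 6962 client, not an actual library:

```typescript
// Sketch: one isolated worker per CT log with per-log backoff.
// LogClient is an assumed interface, not a real client library.

interface LogClient {
  url: string;
  getSTH(): Promise<{ treeSize: number }>;
  getEntries(start: number, end: number): Promise<unknown[]>;
}

// One catch-up sweep; returns the next index to resume from, so
// progress is checkpointed per batch even if a later call throws.
async function sweepOnce(
  log: LogClient,
  index: number,
  onEntry: (e: unknown) => void,
  batch = 256,
): Promise<number> {
  const sth = await log.getSTH();
  while (index < sth.treeSize) {
    const end = Math.min(index + batch - 1, sth.treeSize - 1);
    for (const e of await log.getEntries(index, end)) onEntry(e);
    index = end + 1;
  }
  return index;
}

// Supervisor: each log loops independently; an error only widens
// that log's backoff instead of taking the whole process down.
function supervise(logs: LogClient[], onEntry: (e: unknown) => void): void {
  for (const log of logs) {
    let index = 0;
    let backoff = 1_000;
    const loop = async () => {
      try {
        index = await sweepOnce(log, index, onEntry);
        backoff = 1_000; // healthy sweep resets the backoff
      } catch {
        backoff = Math.min(backoff * 2, 60_000); // exponential, capped
      }
      setTimeout(loop, backoff);
    };
    void loop();
  }
}
```

In a real deployment you'd also persist each log's resume index, so a restart doesn't re-scan from zero.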

Filtering signal from noise

Raw CT log matching gives you every certificate ever issued for your domains. That includes legitimate renewals, which for Let's Encrypt's 90-day certificates typically happen every 60 days. If you have 200 domains with wildcard and bare domain certs, you're looking at roughly 1,500 expected renewals per quarter. Every one of those triggers an alert if you're doing naive matching.

You need a baseline. Build a certificate inventory first, then compare new CT entries against it:

interface CertBaseline {
  domain: string
  expectedCA: string        // "Let's Encrypt" | "DigiCert" | etc
  expectedKeyType: string   // "RSA-2048" | "ECDSA-P256"
  certLifetimeDays: number  // e.g. 90 for Let's Encrypt
  renewalWindowDays: number // renewal expected this many days before expiry
  lastSeenAt: Date          // when we last observed a cert for this baseline
  lastSeenSerial: string
}

function isExpected(entry: CTEntry, baselines: CertBaseline[]): boolean {
  const match = baselines.find(b =>
    domainMatches(entry.cn, b.domain) &&
    entry.issuer === b.expectedCA
  )
  if (!match) return false  // no baseline match: unknown CA or domain, flag it

  // expected only if we're inside the renewal window
  const daysSinceLastSeen = daysBetween(match.lastSeenAt, now())
  return daysSinceLastSeen >= (match.certLifetimeDays - match.renewalWindowDays)
}

This cuts noise by about 85% in most environments. The remaining 15% is where it gets interesting: new subdomains, CA switches, dev environments using production domain names, that sort of thing.

The subdomain explosion problem

Wildcard monitoring sounds simple: watch *.example.com and you're covered. Except matching CT entries doesn't work the way you might think. A certificate for *.example.com shows up as exactly that string in the CN or SAN field, and the wildcard covers only a single label. A certificate for api.staging.internal.example.com is a completely separate entry.
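The single-label rule is easy to get wrong, so it's worth making concrete. A minimal TypeScript sketch of wildcard matching as it applies to names in CT entries (the function name is illustrative):

```typescript
// A "*." wildcard covers exactly one DNS label, matching standard
// certificate wildcard semantics. Deeper subdomains don't match.
function wildcardMatches(name: string, pattern: string): boolean {
  if (!pattern.startsWith("*.")) return name === pattern;
  const suffix = pattern.slice(2); // "example.com"
  if (!name.endsWith("." + suffix)) return false;
  const prefix = name.slice(0, name.length - suffix.length - 1);
  return !prefix.includes("."); // "*" spans exactly one label
}
```

So `api.example.com` matches `*.example.com`, but `api.staging.internal.example.com` does not, and neither does the bare `example.com`.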

And teams create subdomains constantly. Every PR preview deployment, every feature branch environment, every A/B test variant. One fintech company I consulted for had over 6,000 unique subdomains, most of them ephemeral. Their CT monitoring was flagging 200+ "unknown" certificates per day. All legitimate. All useless noise.

The fix wasn't better filtering. It was integrating CT monitoring with their deployment pipeline. When a new environment spun up, it registered its expected certificate fingerprint with the monitoring system. When it tore down, the registration expired. Suddenly their false positive rate dropped to single digits.
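That registration flow can be sketched as a small registry keyed by domain, with entries that expire alongside the environment. This is an illustrative in-memory version (all names are assumptions; a real system would back it with a shared store):

```typescript
// Sketch: deploy pipelines register the fingerprint of the cert an
// environment expects; CT hits against live registrations are
// suppressed, everything else is flagged.

interface Registration {
  fingerprint: string; // hex SHA-256 of the expected leaf certificate
  expiresAt: number;   // epoch ms; registration dies with the environment
}

class ExpectedCertRegistry {
  private byDomain = new Map<string, Registration[]>();

  register(domain: string, fingerprint: string, ttlMs: number, now = Date.now()): void {
    const regs = this.byDomain.get(domain) ?? [];
    regs.push({ fingerprint, expiresAt: now + ttlMs });
    this.byDomain.set(domain, regs);
  }

  // True when this CT entry matches a still-live registration.
  isExpected(domain: string, fingerprint: string, now = Date.now()): boolean {
    const live = (this.byDomain.get(domain) ?? []).filter((r) => r.expiresAt > now);
    this.byDomain.set(domain, live); // prune expired registrations as we go
    return live.some((r) => r.fingerprint === fingerprint);
  }
}
```

The TTL is the important part: ephemeral environments clean up after themselves even when the teardown hook never fires.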

Latency matters more than you think

CT logs operate under a Maximum Merge Delay (MMD), typically 24 hours. That means a certificate can be issued and used in the wild for up to 24 hours before it appears in the log that promised to include it. In practice, most logs incorporate entries within minutes, but the spec allows a full day.

So your monitoring pipeline can be perfect, and you'll still have a blind spot. An attacker who compromises a CA or uses a domain validation vulnerability has a window where they can use the fraudulent cert before your monitoring catches it. This is why CT monitoring should be one layer of defense, not the only one.

CAA records, DANE/TLSA records, and browser-level SCT enforcement all help close that gap. But none of them are a complete solution on their own either.
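Of those, CAA records (RFC 8659) are the cheapest to deploy: a few DNS records declaring which CAs are allowed to issue for the domain. An illustrative zone fragment, with the CA and contact address as placeholders:

```
example.com.  IN  CAA  0 issue "letsencrypt.org"
example.com.  IN  CAA  0 issuewild "letsencrypt.org"
example.com.  IN  CAA  0 iodef "mailto:security@example.com"
```

Compliant CAs must check CAA at issuance time, so this blocks mis-issuance at a different point in the pipeline than CT monitoring catches it.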

Storage and retention

If you're running your own CT monitoring infrastructure, you need to think about storage. A single year of CT entries for a large organization can easily hit 50GB of structured data. That's not a lot by modern standards, but it adds up, and you need it queryable.

PostgreSQL with proper indexing handles this fine up to about 100 million rows. Beyond that, you're looking at partitioning by date or moving to something like ClickHouse. One pattern that works well: keep the last 90 days in hot storage (Postgres), archive older entries to compressed Parquet files in object storage, query with DuckDB when you need historical analysis.

-- partition by month for manageable table sizes
CREATE TABLE ct_entries (
    id BIGSERIAL,
    logged_at TIMESTAMPTZ NOT NULL,
    domain TEXT NOT NULL,
    issuer TEXT NOT NULL,
    serial_hex TEXT NOT NULL,
    not_before TIMESTAMPTZ,
    not_after TIMESTAMPTZ,
    log_name TEXT NOT NULL,
    entry_index BIGINT NOT NULL
) PARTITION BY RANGE (logged_at);

-- each month gets its own partition; create these ahead of time
-- (by hand, cron, or a tool like pg_partman)
CREATE TABLE ct_entries_2026_01 PARTITION OF ct_entries
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- indexes on the parent cascade to every partition
CREATE INDEX idx_ct_domain ON ct_entries (domain);
CREATE INDEX idx_ct_logged ON ct_entries (logged_at DESC);
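For the cold tier, DuckDB can query the archived Parquet files in place. An illustrative query, assuming a bucket path and column names matching the schema above:

```
-- issuance history for one domain across the Parquet archive
SELECT logged_at, issuer, serial_hex
FROM read_parquet('s3://ct-archive/ct_entries_*.parquet')
WHERE domain = 'example.com'
ORDER BY logged_at DESC;
```

Same SQL dialect, no ETL back into Postgres for the occasional historical question.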

What your alerting should actually look like

After filtering out expected renewals and known deployments, categorize what's left into severity tiers. Not every unexpected certificate is an emergency.

Critical (page someone): Certificate issued by an unexpected CA for a production domain. This could mean CA compromise or a domain validation attack. You want to know about this in minutes, not hours.

High (Slack notification, investigate same day): Certificate for a domain you own but issued to an unknown organization field. Could be a legitimate partner using your subdomain, could be something worse.

Medium (daily digest): New subdomain you haven't seen before getting a certificate. Usually just a dev team doing dev team things, but worth reviewing.

Low (weekly report): Expected renewals that happened outside the normal window, key type changes, minor CA switches within your approved list.

The biggest mistake teams make is treating everything as critical. That lasts about two weeks before someone adds a blanket ignore rule and the whole system becomes decoration.
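The tiering above is mostly a decision table, which makes it easy to keep honest in code. A minimal TypeScript sketch, with all field names as illustrative assumptions about what your enrichment step produces:

```typescript
// Sketch: map an enriched CT entry to one of the four severity tiers.
// Order matters: the scariest condition wins.

type Severity = "critical" | "high" | "medium" | "low";

interface TriagedEntry {
  domain: string;
  issuerApproved: boolean;    // CA is on the approved list for this domain
  isProduction: boolean;
  orgFieldKnown: boolean;     // subject organization matches a known party
  subdomainSeenBefore: boolean;
}

function classify(e: TriagedEntry): Severity {
  if (!e.issuerApproved && e.isProduction) return "critical"; // page someone
  if (!e.orgFieldKnown) return "high";                        // same-day investigation
  if (!e.subdomainSeenBefore) return "medium";                // daily digest
  return "low";                                               // weekly report
}
```

Each severity then routes to its own channel (pager, Slack, digest, report), so relaxing one tier never silences another.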

Tools worth knowing about

Certstream gives you a real-time firehose of CT log entries via WebSocket. Great for prototyping, terrible for production because you're dependent on their infrastructure. They go down; your monitoring goes down.

Google's Certificate Transparency project on GitHub has reference implementations for log clients in Go. Battle-tested, but not exactly plug-and-play. Expect to spend a week wrapping it in your own alerting logic.

Facebook's ct-monitor was a solid option, but it hasn't been updated in years. Still works if you pin the dependencies, but you're on your own for bug fixes.

Or just use a managed service. Honestly, for most organizations, this is the right call. The engineering effort of running your own CT monitoring pipeline is significant, and unless you have specific compliance requirements that mandate self-hosted infrastructure, the ROI on building it yourself is questionable.

Where this is heading

Chrome is pushing for shorter certificate lifetimes, and Apple has already driven through a 45-day maximum validity for public certificates starting in 2027. That means more renewals, more CT log entries, and more noise for your monitoring to handle. The teams that invested in proper filtering and baseline management will barely notice. Everyone else is going to have a rough year.

Start with a solid certificate inventory. Layer CT monitoring on top. Build your baselines before you build your alerts. And for the love of uptime, don't page your on-call engineer every time Let's Encrypt renews a staging cert.