
You Probably Don't Know How Many Certificates You Have

Most teams undercount their SSL certificates by 40% or more. Here's how to build a real inventory before the next 3 AM expiration wakes you up.

CertGuard Team · 7 min read

The spreadsheet is lying to you

Every organization I've worked with has a certificate inventory. Usually it's a Google Sheet, sometimes a Confluence page, occasionally a proper CMDB entry. And every single one of them is wrong.

Not slightly wrong. Dramatically wrong.

A fintech company I consulted for last year was confident they had 47 certificates in production. We ran a full discovery scan across their infrastructure and found 113. The "missing" 66 included load balancer certs that predated the current team, certificates on legacy services nobody wanted to touch, and a handful deployed by a contractor who left in 2022 without documenting anything. Three of those unknown certificates expired within the next two weeks.

Why manual tracking always drifts

Certificates multiply in ways that humans are terrible at tracking. A developer spins up a staging environment, grabs a Let's Encrypt cert, forgets about it. Someone configures a new subdomain for a marketing campaign that runs for six weeks. An integration partner requires mutual TLS, so a cert gets generated and installed on a single service. These all happen outside whatever process your team defined for "how we manage certificates."

The fundamental problem is that issuing a certificate is easy. Tracking it requires discipline across every person who touches infrastructure. That math never works out.

Start with network discovery, not documentation

Forget what your docs say you have. Scan for what's actually there.

# Scan your entire subnet for TLS services
nmap -sV --script ssl-cert -p 443,8443,8080,9443 10.0.0.0/16 -oX cert-scan.xml

# Quick parse to extract subjects and expiry dates
xmlstarlet sel -t -m "//script[@id='ssl-cert']" \
  -v "../../../address/@addr" -o " | " \
  -v "elem[@key='subject']/elem[@key='commonName']" -o " | " \
  -v "elem[@key='validity']/elem[@key='notAfter']" -n \
  cert-scan.xml

This catches everything listening on common TLS ports. But it misses services on non-standard ports, certificates used only for client auth, and anything behind a firewall you can't reach from your scanning host. So this is step one, not the whole answer.
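For those non-standard ports, a direct probe can pull the certificate from any host:port pair you can reach. A sketch using Node's built-in tls module (the helper name and timeout are my own):

```typescript
import * as tls from "tls";

// Hypothetical probe: fetch the leaf certificate from an arbitrary
// host and port, regardless of whether it's a "standard" TLS port.
// rejectUnauthorized is false because we want to inventory bad certs too.
function probeCert(host: string, port: number): Promise<tls.PeerCertificate> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      { host, port, servername: host, rejectUnauthorized: false },
      () => {
        resolve(socket.getPeerCertificate());
        socket.end();
      }
    );
    socket.on("error", reject);
    socket.setTimeout(5000, () => {
      socket.destroy();
      reject(new Error(`timeout connecting to ${host}:${port}`));
    });
  });
}
```

Point this at every open port your port scan finds, not just the TLS-looking ones, and you'll catch the service someone stood up on 4443 because 443 was taken.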

For cloud environments, you need to query the source directly:

# AWS - pull every cert from ACM and IAM
aws acm list-certificates --query 'CertificateSummaryList[*].[DomainName,Status,NotAfter]' --output table
aws iam list-server-certificates --query 'ServerCertificateMetadataList[*].[ServerCertificateName,Expiration]' --output table

# GCP
gcloud certificate-manager certificates list --format="table(name,san_dns_names,expire_time)"

# Azure
az network application-gateway ssl-cert list --resource-group your-rg --gateway-name your-gw

Run these across every account. Every region. Including the ones "nobody uses anymore" because those are exactly where forgotten certs live.

Certificate Transparency logs are your backup

CT logs record every publicly trusted certificate issued for your domains. This is incredibly useful for catching certificates you didn't know about, whether issued by your team, a third party, or potentially an attacker.

# Query crt.sh for all certificates ever issued for your domain
curl -s "https://crt.sh/?q=%25.yourdomain.com&output=json" \
  | jq -r '.[] | select(.not_after | strptime("%Y-%m-%dT%H:%M:%S") | mktime > now) | [.common_name, .not_after, .issuer_name] | @tsv' \
  | sort -t$'\t' -k2

I've seen this surface certificates issued by CDN providers, email services, and SaaS platforms that the infrastructure team had zero visibility into. One company discovered their marketing team had independently set up Cloudflare for a campaign microsite, complete with its own certificate, pointing to an S3 bucket with customer data. Good times.

Build the inventory programmatically

Once you know what exists, you need a system that stays current without relying on humans remembering to update a spreadsheet. The approach that actually works combines three data sources: active scanning on a schedule, cloud API queries, and CT log monitoring.

// Simplified cert inventory aggregator
interface CertRecord {
  fingerprint: string;
  subject: string;
  sans: string[];
  issuer: string;
  notBefore: Date;
  notAfter: Date;
  source: 'scan' | 'cloud-api' | 'ct-log';
  lastSeen: Date;
  hosts: string[];
}

async function reconcileInventory(
  scanResults: CertRecord[],
  cloudResults: CertRecord[],
  ctResults: CertRecord[]
): Promise<CertRecord[]> {
  const inventory = new Map<string, CertRecord>();

  // Fingerprint is the dedup key
  for (const cert of [...scanResults, ...cloudResults, ...ctResults]) {
    const existing = inventory.get(cert.fingerprint);
    if (existing) {
      // Merge hosts and keep most recent lastSeen
      existing.hosts = [...new Set([...existing.hosts, ...cert.hosts])];
      existing.lastSeen = cert.lastSeen > existing.lastSeen
        ? cert.lastSeen
        : existing.lastSeen;
    } else {
      inventory.set(cert.fingerprint, cert);
    }
  }

  return Array.from(inventory.values());
}

The fingerprint-based dedup is critical. You'll see the same certificate from multiple sources: a wildcard cert deployed on three servers shows up as three scan results, but it's one certificate. Without dedup, your inventory inflates and becomes useless for capacity planning.
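The dedup key itself is straightforward: the SHA-256 digest of the certificate's DER encoding, which is what most tools report as the fingerprint. A minimal sketch using Node's built-in crypto module:

```typescript
import { createHash } from "crypto";

// SHA-256 over the certificate's DER bytes — the same value
// `openssl x509 -fingerprint -sha256` prints (minus the colons).
function fingerprint(derBytes: Buffer): string {
  return createHash("sha256").update(derBytes).digest("hex");
}
```

Two sources reporting the same fingerprint are, by construction, reporting the same certificate, so the merge in `reconcileInventory` is safe.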

The 90-day problem nobody talks about

Let's Encrypt pushed the industry toward shorter certificate lifetimes. 90 days seemed radical when they started. Now the CA/Browser Forum has approved a phase-down to 47-day certificates, with browser vendors pushing hard for shorter lifetimes still.

Shorter lifetimes are genuinely better for security. But they brutally expose inventory gaps. When your certificates lasted a year, you had months to notice a forgotten cert before it expired. With 90-day certs, that window shrinks to weeks. With 47-day certs, you basically need automation or you're dead.
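The arithmetic is stark. Assume some fixed lead time to notice a forgotten cert and complete a renewal; the slack left over shrinks fast (a toy calculation, the 30-day lead time is illustrative):

```typescript
// Days of slack between issuance and the point where
// renewal must already be underway.
function noticeWindowDays(lifetimeDays: number, renewalLeadDays: number): number {
  return Math.max(0, lifetimeDays - renewalLeadDays);
}

// With a 30-day lead time:
//   365-day cert -> 335 days of slack
//    90-day cert ->  60 days
//    47-day cert ->  17 days
```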

The teams that struggle most are the ones with a mix. Some certs are auto-renewed through ACME. Others are commercial certs renewed annually through a manual process involving a procurement ticket, a CSR generation that someone does from memory, and an email chain with the CA that takes three business days. When you have both workflows, things fall through the gap between them constantly.

Alerting that actually works

Setting up "alert me 30 days before expiry" sounds reasonable until you realize that 30 days from now, you'll have 15 other alerts competing for attention and you'll snooze it. Twice. Then it's 3 days out and you're scrambling.

What works better is tiered alerting with escalation:

# Alert tiers (adjust to your org's response time)
60 days: Ticket created automatically, assigned to cert owner
30 days: Slack notification to the team channel
14 days: PagerDuty alert, low urgency
7 days:  PagerDuty alert, high urgency, page the on-call
1 day:   Everything. All channels. Wake people up.
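The same tiers as code, so the thresholds live in version control instead of someone's memory (a sketch; the action names are my own):

```typescript
type AlertAction =
  | "ticket"        // auto-created, assigned to cert owner
  | "slack"         // team channel notification
  | "page-low"      // PagerDuty, low urgency
  | "page-high"     // PagerDuty, high urgency, page on-call
  | "all-channels"  // everything, wake people up
  | "none";

// Map days-until-expiry to the escalation tier above.
function alertAction(daysLeft: number): AlertAction {
  if (daysLeft <= 1) return "all-channels";
  if (daysLeft <= 7) return "page-high";
  if (daysLeft <= 14) return "page-low";
  if (daysLeft <= 30) return "slack";
  if (daysLeft <= 60) return "ticket";
  return "none";
}
```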

But here's what most teams miss. The alert needs to include enough context to act on it. "Certificate for *.example.com expires in 14 days" is not actionable. You need: which hosts it's on, what service uses it, who owns that service, whether it's auto-renewed or manual, and if manual, the documented renewal procedure. Without that context, the person getting paged at 2 AM has to figure all of it out while half asleep.
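One way to enforce that context is to make the alert renderer demand it. A sketch, with hypothetical field names, that flags an undocumented manual renewal loudly instead of hiding it:

```typescript
interface CertAlertContext {
  commonName: string;
  daysLeft: number;
  hosts: string[];       // where the cert is deployed
  service: string;       // what uses it
  owner: string;         // a specific person, not a team alias
  autoRenewed: boolean;
  runbookUrl?: string;   // renewal procedure, required if manual
}

// Render an alert with everything the 2 AM responder needs.
function renderAlert(ctx: CertAlertContext): string {
  const renewal = ctx.autoRenewed
    ? "auto-renewed (investigate why renewal has not happened)"
    : `manual — runbook: ${ctx.runbookUrl ?? "NOT DOCUMENTED"}`;
  return [
    `Certificate ${ctx.commonName} expires in ${ctx.daysLeft} days`,
    `Hosts: ${ctx.hosts.join(", ")}`,
    `Service: ${ctx.service} (owner: ${ctx.owner})`,
    `Renewal: ${renewal}`,
  ].join("\n");
}
```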

Ownership is the hard part

Technical discovery is solved. Seriously, between network scanning, cloud APIs, and CT logs, finding certificates is not the challenge. The real problem is organizational.

Who owns each certificate? Not "the platform team" as a generic answer. Which specific person is responsible for renewing it? What happens when that person leaves? Is the renewal process documented somewhere other than their head?

Every certificate in your inventory needs an owner field. And you need a process for what happens when that owner changes roles or leaves the company. This sounds like boring organizational stuff because it is boring organizational stuff. It's also the reason certificates expire unexpectedly about 80% of the time.
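A minimal shape for that owner field, plus the offboarding check that should run whenever someone leaves (field names and the helper are my own):

```typescript
interface CertOwnership {
  fingerprint: string;
  owner: string;                       // a specific person, not "platform-team"
  renewal: "acme-auto" | "manual";
  runbookUrl?: string;                 // documented procedure for manual certs
}

// Find certificates orphaned by a departure — wire this into offboarding
// so ownership gets reassigned before anything expires.
function orphanedCerts(
  inventory: CertOwnership[],
  departedUsers: string[]
): CertOwnership[] {
  const gone = new Set(departedUsers);
  return inventory.filter((cert) => gone.has(cert.owner));
}
```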

The remaining 20% is automation failures that nobody noticed because the monitoring for the monitoring wasn't set up. Yeah.

Stop treating certificates as an infrastructure problem

Most organizations dump certificate management on the platform or infrastructure team. That works when you have 20 certificates and everyone knows where they are. It falls apart at 200 certificates spread across multiple cloud accounts, CDN providers, SaaS integrations, and that one physical server in a closet that runs the legacy billing system.

Certificate lifecycle management is a supply chain problem. You have suppliers (CAs), inventory (your certs), distribution points (every server, load balancer, and service that terminates TLS), and consumers (your users and integrations). Treat it like supply chain management and suddenly the need for automated tracking, clear ownership, and proactive replenishment makes a lot more sense than a shared spreadsheet.

Start with discovery. Build automated inventory. Assign owners. Set up tiered alerts with context. And stop trusting the spreadsheet.