Wildcard Certificates: Why DNS-01 Validation Keeps Breaking

DNS-01 validation looks simple on paper. Then you hit propagation delays, CAA records, and provider API rate limits. Here's what actually breaks in production.

CertGuard Team · 7 min read

The DNS-01 Validation Tax

You want a wildcard cert. Let's Encrypt says "sure, just prove you control the domain via DNS-01". Sounds reasonable.

Then your automation fails at 2 AM because Cloudflare's API returned a 429. Or the TXT record propagated to 7 of their 8 validation servers but not the 8th. Or your CAA record has a typo that nobody noticed for three months because your regular certs use HTTP-01.

DNS-01 is the only game in town for wildcards. But it's got sharp edges that don't show up in the happy-path tutorials.

Propagation Is a Lie

DNS propagation isn't real. What you're actually dealing with is:

  • Authoritative server updates: Your DNS provider needs to write the TXT record to all their nameservers. Route53 says "60 seconds max" but I've seen 90+ in us-west-2 during an outage.
  • Let's Encrypt's query path: They don't go through public resolvers like Google or Cloudflare DNS. Their own recursive resolvers ask your authoritative nameservers directly and cache very little, so your TTL is rarely the problem. One authoritative server that's still serving the old zone is.
  • Multiple validation attempts: Let's Encrypt checks from several network vantage points. One sees the record, another doesn't. If too many of them come back empty, the challenge fails, the authorization goes invalid, and you start over with a fresh challenge.

The "wait 2 minutes after creating the TXT record" advice works 95% of the time. The other 5% is your cert expiring during a holiday weekend.

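    const dns = require('node:dns/promises');
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    // Bad: trust the API response
    await dnsProvider.createTXTRecord('_acme-challenge', token);
    await sleep(120000); // 2 minutes, should be enough right?
    await acmeClient.completeChallenge();

    // Better: poll until you can actually resolve it
    await dnsProvider.createTXTRecord('_acme-challenge', token);

    let resolved = false;
    for (let i = 0; i < 30; i++) {
      try {
        // resolveTxt returns each record as an array of chunks, so join them
        const records = await dns.resolveTxt(`_acme-challenge.${domain}`);
        if (records.some((chunks) => chunks.join('') === token)) {
          resolved = true;
          break;
        }
      } catch (err) {
        // ENOTFOUND / ENODATA just mean "not there yet"; anything else is a real error
        if (err.code !== 'ENOTFOUND' && err.code !== 'ENODATA') throw err;
      }
      await sleep(10000); // 10 sec between checks
    }

    if (!resolved) {
      throw new Error('TXT record never propagated');
    }

    // Extra buffer for the rest of the authoritative servers to catch up
    await sleep(30000);
    await acmeClient.completeChallenge();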

Yeah it's slower. But it doesn't fail silently when your DNS provider is having a bad day.
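
If you want to go one step further, skip the local resolver and ask each authoritative nameserver yourself, since those are the servers Let's Encrypt ends up talking to. Here's a rough sketch using Node's built-in dns module. The function names (staleNameservers, listNameserverIps, queryTxtAt) are made up for this post, it only looks at IPv4 addresses, and the lookups are injectable so you can test it without touching the network.

```javascript
const { Resolver, resolveNs, resolve4 } = require('node:dns/promises');

// resolveTxt() returns each record as an array of chunks; join them back up.
function txtValues(records) {
  return records.map((chunks) => chunks.join(''));
}

// Default lookup: every IPv4 address behind the zone's NS records.
async function listNameserverIps(zone) {
  const hosts = await resolveNs(zone);
  const ips = await Promise.all(hosts.map((host) => resolve4(host)));
  return ips.flat();
}

// Default lookup: ask one specific server for the TXT record.
async function queryTxtAt(serverIp, name) {
  const resolver = new Resolver();
  resolver.setServers([serverIp]);
  try {
    return txtValues(await resolver.resolveTxt(name));
  } catch (err) {
    // "No such name" and "no TXT here" both mean the value is not served yet.
    if (err.code === 'ENOTFOUND' || err.code === 'ENODATA') return [];
    throw err;
  }
}

// Returns the nameserver IPs that do NOT serve the expected value yet.
async function staleNameservers(zone, name, expected, lookups = {}) {
  const listIps = lookups.listNameserverIps ?? listNameserverIps;
  const query = lookups.queryTxtAt ?? queryTxtAt;
  const ips = await listIps(zone);
  const results = await Promise.all(
    ips.map(async (ip) => ({ ip, values: await query(ip, name) }))
  );
  return results.filter((r) => !r.values.includes(expected)).map((r) => r.ip);
}
```

An empty array means every nameserver already serves the value. Anything else is the list of servers you're still waiting on, which also makes for a much better error message than "never propagated".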

CAA Records: The Silent Killer

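CAA records tell CAs whether they're allowed to issue certificates for your domain. If you have them, they apply to wildcards too. But wildcards get their own tag, and that's where it goes wrong.

For a *.example.com request, Let's Encrypt looks for CAA records at:

  • example.com (the wildcard label is stripped first)
  • then each parent up the tree, stopping at the first name that has any CAA records

Within that record set, wildcard names follow a precedence rule. If there's no issuewild record, the issue records cover wildcards as well. If there's even one issuewild record, the issue records are ignored for wildcards and only issuewild counts.

Most teams set CAA on the apex domain and forget about it. Works fine for regular certs. Then wildcard issuance fails with a cryptic "CAA record prevents issuance" error, because somebody once added an issuewild for a different CA, or issuewild ";" to ban wildcards outright.

    ; This allows Let's Encrypt for example.com, www.example.com, and *.example.com
    example.com. CAA 0 issue "letsencrypt.org"

    ; This still allows regular certs but blocks wildcards from every CA,
    ; because issuewild overrides issue for wildcard names
    example.com. CAA 0 issue "letsencrypt.org"
    example.com. CAA 0 issuewild ";"

    ; If you keep an issuewild policy, it has to name your CA:
    example.com. CAA 0 issue "letsencrypt.org"
    example.com. CAA 0 issuewild "letsencrypt.org"

The issuewild tag is specifically for wildcard certs. You don't need it if issue already names your CA, but once any issuewild record exists it's the only thing consulted for wildcards.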

I've debugged this exact scenario four times. Every time the team had CAA records from years ago that predated their wildcard usage. Nobody remembered they existed until the cert renewal cronjob started failing.
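
Here's how the issue / issuewild decision works as I read RFC 8659: issuewild, when present, replaces issue for wildcard names, and a record set with neither tag doesn't restrict anything. The sketch below takes records in the shape Node's dns.promises.resolveCaa() returns. The helper names are hypothetical, and it ignores the critical flag and the climb up the DNS tree, so treat it as an illustration and not a full CAA checker.

```javascript
// "letsencrypt.org; validationmethods=dns-01" -> "letsencrypt.org"
// A bare ";" (issue nothing) becomes the empty string, which matches no CA.
function caDomain(value) {
  return value.split(';')[0].trim().toLowerCase();
}

// records: e.g. [{ critical: 0, issue: 'letsencrypt.org' }]
// ca: the CA's CAA identifier, e.g. 'letsencrypt.org'
function caaAllows(records, ca, { wildcard }) {
  const issue = records.filter((r) => typeof r.issue === 'string');
  const issuewild = records.filter((r) => typeof r.issuewild === 'string');

  // For wildcard names, any issuewild record takes over from issue entirely.
  const relevant =
    wildcard && issuewild.length > 0
      ? issuewild.map((r) => r.issuewild)
      : issue.map((r) => r.issue);

  // Nothing that restricts issuance (empty set, or only iodef and friends).
  if (relevant.length === 0) return true;

  return relevant.some((value) => caDomain(value) === ca);
}
```

Run it against whatever resolveCaa() gives you for the base domain with wildcard set to true. If it says no while the non-wildcard answer is yes, you've found a stray issuewild record.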

DNS Provider API Rate Limits

Cloudflare: 1200 requests per 5 minutes per user, shared across all of that user's API tokens. Route53: 5 requests per second per account. Azure DNS: 500 write operations per minute per zone.

Sounds like a lot until you're running cert-manager in a Kubernetes cluster with 40 Ingresses that all renew on the same day. Each renewal creates and deletes a TXT record. Two API calls per cert. 80 calls in a burst.

Cloudflare doesn't care. Route53 throttles as soon as you go over five requests in a second, and it reports that as an HTTP 400 with a Throttling error code, not a 429.

    // Naive: blast all renewals at once
    await Promise.all(domains.map(d => renewCert(d)));

    // Reality: stagger them or you'll hit rate limits
    for (const domain of domains) {
      try {
        await renewCert(domain);
      } catch (err) {
        if (err.code === 'RATE_LIMITED') {
          console.log(`Rate limited, waiting 60s...`);
          await sleep(60000);
          await renewCert(domain); // retry once
        } else {
          throw err;
        }
      }
      await sleep(5000); // 5 sec between renewals
    }

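Or use a client that handles retries for you. Just don't confuse this with propagation settings: certbot's --dns-cloudflare-propagation-seconds only controls how long it waits after creating the record, it does nothing about rate limits.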
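
A fixed 60-second wait and a single retry is the bare minimum. A slightly sturdier version is exponential backoff with jitter, sketched below. withBackoff is a hypothetical helper, and the 'RATE_LIMITED' error code is the same placeholder used above; map it to whatever your DNS provider's SDK actually throws.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs task(), retrying with exponentially growing, randomized delays.
async function withBackoff(task, options = {}) {
  const {
    retries = 5, // retries after the first attempt
    baseMs = 1000, // upper bound of the first delay
    isRetryable = (err) => err.code === 'RATE_LIMITED', // placeholder code
    wait = sleep, // injectable so tests do not really sleep
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      // Full jitter: a random delay in [0, baseMs * 2^attempt).
      await wait(Math.random() * baseMs * 2 ** attempt);
    }
  }
}
```

In the loop above that becomes await withBackoff(() => renewCert(domain)). The jitter matters more than it looks: without it, 40 renewals that got throttled together all retry together.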

Subdomain Wildcards and CNAME Hell

Want a cert for *.api.example.com? The ACME challenge goes to _acme-challenge.api.example.com.

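But what if api.example.com is a CNAME to lb-prod-us-east.cloudprovider.com? The CNAME itself isn't the blocker: _acme-challenge.api.example.com is a separate name, so it can carry a TXT record even though api.example.com can't hold anything next to its CNAME. More often the real problem is that the challenge name lives in a zone your automation can't write to. It's delegated to another team or provider, the DNS host has no API your ACME client speaks, or nobody wants renewal credentials with write access to production DNS.

Your options:

  1. Give the ACME client write access to the zone that holds api.example.com: Not great if that zone belongs to someone else or the provider has no usable API.
  2. Use a CNAME for the challenge subdomain: Point _acme-challenge.api.example.com to _acme-challenge.example.com where you can manage TXT records. Let's Encrypt follows CNAMEs for challenges. Your ACME client has to be told to write the TXT record at the target name.
  3. Use a dedicated validation zone: Same CNAME idea, but the target is a small zone that exists only for challenges, so the credentials can't touch anything else. Some ACME clients support this. Messy but works.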

The CNAME trick is underused. I've seen teams rewrite their entire DNS setup to avoid it when a single CNAME would've solved it.

    ; Instead of writing TXT records at _acme-challenge.api.example.com
    ; (in a zone your automation can't touch), alias it once by hand:
    _acme-challenge.api.example.com. CNAME _acme-challenge.example.com.

    ; Now all challenges for *.api.example.com land in the parent zone
    ; where you can actually manage TXT records
    _acme-challenge.example.com. TXT "challenge-token-here"
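
If you script this yourself, the automation needs to work out where the TXT record really has to go. A minimal sketch, assuming Node's dns module: challengeName and challengeTarget are names I made up, and the CNAME lookup is injectable so the logic can be tested offline.

```javascript
const dns = require('node:dns/promises');

// '*.api.example.com' and 'api.example.com' share one challenge name.
function challengeName(identifier) {
  const base = identifier.startsWith('*.') ? identifier.slice(2) : identifier;
  return `_acme-challenge.${base}`;
}

// Follows CNAMEs from the challenge name to the name that must hold the TXT record.
async function challengeTarget(
  name,
  resolveCname = (n) => dns.resolveCname(n),
  maxHops = 5
) {
  let current = name;
  for (let hop = 0; hop < maxHops; hop++) {
    let targets;
    try {
      targets = await resolveCname(current);
    } catch (err) {
      // No CNAME at this name: this is where the TXT record belongs.
      if (err.code === 'ENODATA' || err.code === 'ENOTFOUND') return current;
      throw err;
    }
    if (targets.length === 0) return current;
    current = targets[0];
  }
  throw new Error(`CNAME chain longer than ${maxHops} hops starting at ${name}`);
}
```

With the zone file above, challengeTarget(challengeName('*.api.example.com')) should come back as _acme-challenge.example.com, and that's the name your DNS provider call needs to write to.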

Debugging: What Let's Encrypt Actually Sees

When validation fails, the error message is usually useless. "DNS problem: NXDOMAIN looking up TXT for _acme-challenge.example.com" could mean:

  • The record doesn't exist (you forgot to create it)
  • It exists but hasn't propagated yet
  • It propagated to your resolver but not theirs
  • Your DNS zone isn't configured correctly
  • There's a DNSSEC validation failure

Test what they see:

    # Query public resolvers for an outside view (Let's Encrypt runs its own resolvers, not these)
    dig @8.8.8.8 _acme-challenge.example.com TXT
    dig @1.1.1.1 _acme-challenge.example.com TXT

    # Check authoritative servers directly
    dig @ns1.example.com _acme-challenge.example.com TXT

    # Trace the full resolution path
    dig +trace _acme-challenge.example.com TXT

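If dig @8.8.8.8 shows the record but validation still fails, query every authoritative nameserver individually, because one of them is probably still serving the old zone. If the record doesn't show up anywhere, your authoritative servers aren't serving it at all.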
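
If your renewal script is already in Node, you can get a similar first-pass diagnosis without shelling out to dig. This sketch maps the error codes Node's dns module throws onto the causes listed above. explainDnsError and checkTxt are hypothetical helpers, and the explanations are my reading of what each code usually means, not anything Let's Encrypt reports.

```javascript
const dns = require('node:dns/promises');

// Translate Node's DNS error codes into something a human can act on.
function explainDnsError(err) {
  switch (err.code) {
    case 'ENOTFOUND':
      return 'NXDOMAIN: the name does not exist on the server that answered';
    case 'ENODATA':
      return 'The name exists but has no TXT record';
    case 'ESERVFAIL':
      return 'SERVFAIL: often a broken delegation or a DNSSEC validation failure';
    case 'ETIMEOUT':
      return 'Timeout: the nameserver is unreachable or dropping queries';
    case 'EREFUSED':
      return 'REFUSED: the server will not answer for this zone';
    default:
      return `Unrecognized DNS error: ${err.code}`;
  }
}

// Looks up the TXT record and returns either the values or a diagnosis.
async function checkTxt(name, resolveTxt = (n) => dns.resolveTxt(n)) {
  try {
    const records = await resolveTxt(name);
    return { ok: true, values: records.map((chunks) => chunks.join('')) };
  } catch (err) {
    return { ok: false, reason: explainDnsError(err) };
  }
}
```

Log the result of checkTxt('_acme-challenge.example.com') right before you ask the CA to validate. When the 2 AM failure happens, you'll at least know which of the five cases you were in.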

Multi-Region DNS and Anycast Shenanigans

Some DNS providers use anycast. Your API request to create a TXT record hits a server in Frankfurt. Let's Encrypt's validation query from Virginia hits a server in Ashburn. If replication between those isn't instant, validation fails.

Cloudflare's pretty good about this. Most requests replicate in under 10 seconds. Smaller providers? I've seen 2+ minutes.

There's no fix except waiting. But you can detect it:

    // Query from multiple locations
    // Use a service like DNS Checker or run dig from different VPS regions
    // If some locations see the record and others don't, it's replication lag

    // Or just add a longer safety buffer
    await dnsProvider.createTXTRecord(...);
    await sleep(180000); // 3 minutes for slow providers

Feels wasteful when Cloudflare replicates in 10 seconds. But the alternative is random failures during renewals when you're nowhere near your terminal.

The Actual Checklist

Before you blame Let's Encrypt for your failed wildcard renewal:

  1. Check CAA records. If any issuewild record exists, it has to list your CA. If none exists, the issue records have to.
  2. Verify the TXT record exists and contains the right token (copy-paste errors happen).
  3. Query public resolvers (8.8.8.8, 1.1.1.1) and each authoritative nameserver, not just your local resolver.
  4. If it's a subdomain wildcard, make sure the challenge name lives in a zone you can actually write to, or CNAME it to one that is.
  5. Check your DNS provider's status page. Outages happen.
  6. Look at rate limits. Did you just try to renew 50 certs at once?
  7. Wait at least 2-3 minutes between creating the record and requesting validation. DNS is eventually consistent, emphasis on eventually.
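
On item 2: if you ever compare the TXT value by hand, keep in mind that what goes into DNS is not the raw challenge token. Per RFC 8555 section 8.4 it's the base64url-encoded SHA-256 digest of the key authorization, which is the token joined to your account key's thumbprint with a dot. Most ACME clients compute this for you. The sketch below (dns01TxtValue is a hypothetical helper) is only for checking their work.

```javascript
const { createHash } = require('node:crypto');

// token: the challenge token from the ACME server
// accountThumbprint: base64url JWK thumbprint of your ACME account key
function dns01TxtValue(token, accountThumbprint) {
  const keyAuthorization = `${token}.${accountThumbprint}`;
  return createHash('sha256')
    .update(keyAuthorization)
    .digest('base64')
    // Convert standard base64 to unpadded base64url.
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}
```

The result is always 43 characters of letters, digits, dashes, and underscores. If the record in DNS is a different length or has padding on the end, something mangled it on the way in.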

DNS-01 isn't hard. But it's got more moving parts than HTTP-01, and every part is a potential failure point. Wildcard certs are convenient. The automation to keep them valid is not.