In December 2018, an expired certificate in Ericsson’s certificate management software caused 32 million UK mobile customers to lose 4G and SMS service for hours, with similar outages across 11 countries. The remediation cost exceeded one billion dollars. In 2017, investigators of the Equifax breach discovered that a monitoring device had been running with an expired SSL certificate for 19 months, creating a blind spot that helped hide 265 instances of unauthorized data access. Equifax had 324 expired certificates across its infrastructure.
SSL certificate outages are among the most common and most preventable categories of production incident. According to Keyfactor’s 2024 PKI and Digital Trust Report, organizations experience an average of three certificate-caused outages every two years. The average time to identify the root cause is 2.6 hours; remediation takes another 2.7 hours. At typical enterprise downtime costs, that is an expensive, avoidable problem.
Most explanations of certificate outages focus exclusively on certificate expiry. Expiry is one failure mode among five. Organizations that monitor only expiry are protected against only one of the five ways SSL certificates cause outages. This guide covers all five failure modes, how each presents, how to detect each, and the monitoring strategy that addresses all of them.
The Five SSL Certificate Outage Failure Modes
| Failure mode | What causes it | Who gets the error | Expiry monitoring alone prevents it? |
| --- | --- | --- | --- |
| Certificate expiry | Certificate passes its Not After date | All visitors and clients | Yes |
| Renewed but not deployed | Certificate was renewed at origin but old cert still served at CDN, load balancer, or edge | All visitors and clients (may appear to operators as unexpectedly expired) | No |
| Chain misconfiguration | Intermediate CA certificate missing, wrong, or outdated in the server bundle | Clients that cannot recover a missing intermediate (curl, many mobile and API clients); all clients if the bundled intermediate is wrong or expired | No |
| Intermediate CA rotation | CA rotates or deprecates an intermediate; servers still presenting old chain | Clients that fail when chain contains expired intermediate | No |
| CA issuance outage | CA infrastructure is unavailable; automated renewal (ACME) cannot complete; certificates cannot be issued or renewed | No one immediately; all visitors and clients of any site whose certificate expires before a renewal succeeds | No |
Failure Mode 1: Certificate Expiry
Certificate expiry is the most common and most visible failure. When a certificate passes its Not After date, browsers display full-page security warnings that block most visitors from proceeding. The site does not go down in the infrastructure sense: the server is running, the application is responding, DNS resolves correctly. But from the visitor’s perspective, the site is inaccessible.
The suddenness is a characteristic feature. A certificate that was valid at 11:59 PM is invalid at midnight. There is no degraded state, no partial warning. The transition is instantaneous and affects every client simultaneously.
Why it happens despite being entirely predictable: subdomain sprawl means certificates exist on servers and subdomains that no central inventory tracks. Renewal automation runs correctly for months until a configuration change, credential rotation, or infrastructure change causes it to silently fail. Alerts go to email addresses or Slack channels that are no longer monitored. Calendar reminders are dismissed as someone else’s responsibility. Let’s Encrypt and similar automated CAs produce 90-day certificates that require more frequent renewal operations, multiplying the number of opportunities for silent renewal failures.
The most dangerous variant of certificate expiry is the silent renewal failure. Automated renewal appears to be working: the job runs, the log shows no errors, the monitoring system reports the certificate as healthy. In reality the renewal has been failing for 60 days, and the monitoring is reporting stale state: the result of the last successful renewal rather than the current one. This pattern appeared in a documented August 2025 incident where certbot had been hitting Let’s Encrypt rate limits since July while the monitoring system still reported the previous successful renewal. The certificate expired before anyone noticed the renewal was failing.
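One way to catch a silent renewal failure is to judge the certificate that is actually served by its age, rather than trusting the renewal job’s status. The sketch below is a minimal illustration, not a definitive implementation. The function names, the two-thirds renewal point, and the 3-day grace period are assumptions, and `served_cert_epochs` assumes OpenSSL and GNU date are available.

```shell
#!/bin/sh
# renewal_overdue NOT_BEFORE_EPOCH NOT_AFTER_EPOCH NOW_EPOCH [GRACE_DAYS]
# Succeeds (exit 0) when the certificate is older than two-thirds of its
# lifetime plus a grace period, meaning a renewal should already have
# replaced it. The two-thirds point and 3-day default grace are assumptions.
renewal_overdue() {
    not_before=$1
    not_after=$2
    now=$3
    grace_days=${4:-3}
    lifetime=$(( not_after - not_before ))
    renew_at=$(( not_before + lifetime * 2 / 3 + grace_days * 86400 ))
    [ "$now" -gt "$renew_at" ]
}

# served_cert_epochs HOST: print "notBefore_epoch notAfter_epoch" for the
# certificate a client actually receives. Requires openssl and GNU date.
served_cert_epochs() {
    dates=$(echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null \
        | openssl x509 -noout -startdate -enddate)
    start=$(printf '%s\n' "$dates" | sed -n 's/^notBefore=//p')
    end=$(printf '%s\n' "$dates" | sed -n 's/^notAfter=//p')
    [ -n "$start" ] && [ -n "$end" ] || return 1
    echo "$(date -d "$start" +%s) $(date -d "$end" +%s)"
}

# Example use (hypothetical host):
#   set -- $(served_cert_epochs yourdomain.com)
#   renewal_overdue "$1" "$2" "$(date +%s)" && echo "renewal appears stalled"
```

Because the check reads the served certificate’s own dates, it keeps working even when the renewal job and its logs report success.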
Failure Mode 2: Renewed but Not Deployed
This failure mode catches infrastructure teams that believe they are covered because they have automated renewal. The certificate was renewed. The renewal job succeeded. The new certificate is on the origin server. Visitors still see certificate errors.
The gap: TLS termination does not always happen on the origin server. CDNs (Cloudflare, Fastly, Akamai, CloudFront) terminate TLS at their edge nodes and hold their own copy of the certificate. Load balancers (AWS ALB, Azure Application Gateway, HAProxy, Nginx in front of application servers) terminate TLS and hold their own certificate copy. When the certificate is renewed on the origin server, these termination points continue serving the old certificate until they are explicitly updated.
The failure presents identically to expiry: visitors see certificate expiry errors. The operator renews the certificate and checks the origin server, sees a valid certificate, and is confused. The certificate check on the origin server returns the new certificate. The external check of what visitors actually receive returns the old, expired one.
Monitoring the origin server’s certificate is not sufficient when TLS is terminated at a CDN or load balancer. The SSL Labs test (ssllabs.com/ssltest) and external monitoring tools check what clients actually receive, which is the CDN or load balancer’s certificate. An internal check of the origin server will not detect a stale certificate at the edge. External monitoring is mandatory in CDN-fronted architectures.
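The origin-versus-edge gap can be checked directly by comparing the fingerprint of the certificate file on the origin with the fingerprint of the certificate that clients receive. The following is a sketch under stated assumptions: the function names are hypothetical, the file path in the usage comment assumes a Certbot-style layout, and `served_fp` needs to run from outside your network to see what visitors see.

```shell
#!/bin/sh
# normalize_fp FINGERPRINT: reduce openssl's "sha256 Fingerprint=AA:BB:..."
# output (or a bare hex string) to lowercase hex without separators.
normalize_fp() {
    printf '%s' "$1" | sed 's/^.*=//' | tr -d ':\n ' | tr 'A-F' 'a-f'
}

# same_cert FP1 FP2: succeeds when both fingerprints name the same certificate.
same_cert() {
    [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}

# served_fp HOST: SHA-256 fingerprint of the certificate clients receive.
served_fp() {
    echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# file_fp PATH: SHA-256 fingerprint of a certificate file on the origin.
file_fp() {
    openssl x509 -in "$1" -noout -fingerprint -sha256
}

# Example use (hypothetical host and path):
#   same_cert "$(served_fp yourdomain.com)" \
#             "$(file_fp /etc/letsencrypt/live/yourdomain.com/cert.pem)" \
#       || echo "edge is serving a different certificate than the origin holds"
```

A mismatch right after a renewal is the signature of this failure mode: the renewal succeeded, but the termination point was never updated.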
Failure Mode 3: Chain Misconfiguration
A certificate’s chain must include the leaf certificate and the intermediate CA certificate that signed it. Most publicly trusted certificates require the intermediate to be explicitly included in the server’s certificate bundle because the intermediate is not in browser trust stores directly. When the intermediate is missing, some clients fail validation.
The inconsistency is the diagnostic challenge: clients implement different fallback behaviors. Chrome, Edge, and Safari typically attempt to download missing intermediates using the Authority Information Access (AIA) extension URL in the certificate. If the AIA fetch succeeds, the browser builds the chain without the server providing the intermediate. Firefox does not fetch via AIA, but it often succeeds anyway because it preloads and caches known intermediates. This means the site works for most users in most browsers, suggesting the configuration is fine. curl built against OpenSSL does not fetch missing intermediates and fails. Mobile applications that do not implement AIA fetching fail. Java applications using the default JSSE configuration, which does not follow AIA, fail. The site appears healthy in browser-based monitoring while breaking for API clients.
| # Check if the server sends the complete chain:
$ echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com -showcerts 2>/dev/null | grep -c 'BEGIN CERTIFICATE'
# Should return 2 or 3. If 1: only the leaf cert is being served.
# Missing intermediate will cause failures for strict clients.
# See exactly what certificates are in the chain.
# openssl x509 reads only the first certificate, so convert the whole chain to PKCS#7 first:
$ echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com -showcerts 2>/dev/null | sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' | openssl crl2pkcs7 -nocrl -certfile /dev/stdin | openssl pkcs7 -print_certs -text -noout | grep -E 'Subject:|Issuer:|Not After'
# Test the chain with SSL Labs: ssllabs.com/ssltest
# Look for 'Chain issues: Incomplete' in the results. |
Failure Mode 4: Intermediate CA Rotation
Certificate Authorities periodically rotate their intermediate certificates. Intermediate certificates have their own expiry dates, and CAs replace them before they expire. When an intermediate is rotated, all leaf certificates signed by the old intermediate are now chained through an intermediate that is approaching or past its expiry.
Servers that bundle a static intermediate certificate file (copied at certificate install time and never updated) continue serving the old, expiring intermediate even after the CA has published a new one. As the old intermediate approaches its expiry, chain validation begins failing for strict clients, then for all clients once the intermediate expires.
This failure mode affects certificates that are still within their own validity period. The leaf certificate has not expired. The operators check the leaf certificate, see it is valid, and do not understand why clients are failing. The problem is in the intermediate bundled with the server configuration.
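Because the leaf looks healthy in this failure mode, a useful check prints the expiry of every certificate the server presents, not just the first. The sketch below is one way to do that, with hypothetical function names. It splits the presented chain into separate files and asks OpenSSL for each certificate’s subject and end date.

```shell
#!/bin/sh
# split_chain DIR: read PEM text on stdin and write each certificate to
# DIR/cert-1.pem, DIR/cert-2.pem, ... in the order received (leaf first).
# Text outside the BEGIN/END markers is ignored.
split_chain() {
    awk -v dir="$1" '
        /-----BEGIN CERTIFICATE-----/ { n++; inside = 1 }
        inside { print > (dir "/cert-" n ".pem") }
        /-----END CERTIFICATE-----/   { inside = 0 }
    '
}

# chain_expiry HOST: print subject and notAfter for every certificate the
# server presents, so an aging intermediate is visible next to a fresh leaf.
chain_expiry() {
    tmp=$(mktemp -d) || return 1
    echo | openssl s_client -connect "$1:443" -servername "$1" -showcerts 2>/dev/null \
        | split_chain "$tmp"
    for f in "$tmp"/cert-*.pem; do
        [ -e "$f" ] || continue
        openssl x509 -in "$f" -noout -subject -enddate
    done
    rm -rf "$tmp"
}

# Example use (hypothetical host):
#   chain_expiry yourdomain.com
```

Feeding each `notAfter` value into the same alerting used for leaf certificates extends expiry monitoring to the bundled intermediate.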
The Let’s Encrypt DST Root CA X3 expiry in September 2021 is the most prominent example of a chain certificate aging out beneath valid leaf certificates, although the certificate that expired was a root rather than an intermediate. Let’s Encrypt’s default chain at the time ran from the R3 intermediate to ISRG Root X1, with ISRG Root X1 presented as a certificate cross-signed by DST Root CA X3. When DST Root CA X3 expired, clients that followed the expired cross-signed path rather than stopping at the ISRG Root X1 in their own trust store began failing. Modern browsers handled the multi-path situation correctly; older OpenSSL-based clients and some non-browser applications failed. The fix was to update the certificate bundle to serve the chain through ISRG Root X1 only.
The CA/B Forum-approved reduction of maximum certificate validity to 47 days by March 2029 reduces the window of exposure for intermediate CA rotation issues: when leaf certificates renew frequently, the bundled intermediate is also refreshed with each renewal, keeping it current. At 90-day validity (current Let’s Encrypt), certificates renew often enough that stale intermediate bundles are replaced regularly. At multi-year validity, the intermediate bundle can become very stale before anyone notices.
Failure Mode 5: CA Issuance Outage
Certificate Authorities occasionally experience infrastructure problems that prevent certificate issuance. When this happens, automated renewal systems (Certbot, ACME clients, hosting provider AutoSSL) cannot obtain new certificates. Certificates that are due for renewal during the outage window cannot be renewed.
An important distinction: a CA issuance outage does not break currently-valid certificates. The TLS handshake between a visitor’s browser and a web server does not contact the CA. The certificate is embedded in the server’s TLS configuration. An ongoing CA outage has no effect on sites serving valid, non-expired certificates. What it breaks is the ability to renew certificates that are due to expire.
On May 8, 2026, Let’s Encrypt experienced a 2.5-hour issuance halt (18:37 to approximately 21:05 UTC) caused by cross-signed intermediates being issued without required Extended Key Usage fields. The postmortem confirmed that no currently-valid end-entity certificates were affected. Sites that could not renew during that 2.5-hour window simply retried after the service was restored. For sites with plenty of remaining validity at the time of the outage, there was no practical impact.
The risk scenario for CA issuance outages: certificates with very short remaining validity at the time of the outage. An ACME client that retries renewal every few hours will succeed after the outage resolves. An ACME client that only runs once daily at midnight, combined with a CA outage during that window, combined with a certificate expiring the next morning, can result in an expiry.
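The difference between a once-daily job and a retrying one can be sketched as a small wrapper that re-runs the renewal command with a growing delay. This is an illustration under assumptions, not a recommendation for any particular client. `retry_with_backoff` is a hypothetical helper, and it relies on the renewal command exiting non-zero on failure, which should be verified for the ACME client in use.

```shell
#!/bin/sh
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay after each failure. Returns 0 on success, otherwise the exit
# status of the last failed attempt.
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            return "$status"
        fi
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Example use: six attempts with waits of 30 min, 1 h, 2 h, 4 h, and 8 h
# between them, which spans a multi-hour issuance halt:
#   retry_with_backoff 6 1800 certbot renew --quiet
```

The wrapper only helps if renewal starts with enough validity left. It complements an early renewal window rather than replacing one.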
The transition to shorter certificate validity periods (47 days by 2029, with some CAs offering 7-day short-lived certificates now) increases the frequency of renewal operations, which increases the exposure to CA issuance outages and renewal automation failures. ACME Renewal Information (ARI), an ACME extension standardized by the IETF as RFC 9773 and implemented by Let’s Encrypt and others, addresses this by letting the CA publish a suggested renewal window for each certificate. Clients that poll it can be told to renew early when needed, which distributes renewal load and reduces the impact of CA outages on certificate lifecycle management.
High-Profile SSL Certificate Outages and What They Teach
Real incidents illustrate how each failure mode presents in production and what the organizational impact looks like:
| Organization | Year | Failure mode | Impact | Cost / consequence |
| --- | --- | --- | --- | --- |
| Ericsson | 2018 | Certificate expiry in certificate management software used to verify network software licenses | 32 million UK customers lost 4G and SMS. Similar outages in 11 countries. | Over $1 billion in remediation including legal settlements |
| Equifax | 2017 | Expired certificate on internal network monitoring device | Monitor offline for 19 months; 265 unauthorized data access events went undetected during the 2017 breach | $575 million FTC settlement; total costs exceed $1.4 billion |
| Microsoft WinGet CDN | 2023 | SSL/TLS certificate expiry on CDN endpoint | Users unable to install or upgrade software via Windows Package Manager | Service disruption affecting developers globally |
| LinkedIn subdomain | 2021 | Country subdomain SSL certificate expiry | Desktop users receiving SSL connection errors; login disruptions | Revenue and reputation impact from major platform disruption |
| US Government sites (80) | 2019 (during shutdown) | Mass certificate expiry during government shutdown when renewal staff were furloughed | Dozens of federal .gov sites inaccessible | Reputational damage; security risks during extended outage |
| Spotify / Megaphone | 2022 | Expired certificate on podcast CDN | Content distribution failures; listener engagement disrupted | Podcast platform degradation for hours |
Monitoring Strategy: Covering All Five Failure Modes
Expiry-only monitoring is the most common approach and the most incomplete. A complete monitoring strategy addresses all five failure modes:
What to monitor
- Primary domain and www: The obvious ones. Both must be monitored because separate certificates may be installed for each.
- All subdomains with public traffic: Marketing subdomains, campaign landing pages, API endpoints, partner portals, developer portals. Certificate Transparency logs are the most reliable way to discover subdomains you may not know exist: crt.sh allows searching by domain name (for example, %.yourdomain.com) or by organization name to find certificates issued for your domains.
- CDN and load balancer termination points: Monitor what clients actually receive, not what the origin server holds. External monitoring tools connect from outside your network and see the same certificate visitors see.
- Internal-facing endpoints: VPN portals, internal dashboards, admin interfaces, monitoring systems. Equifax’s expired monitoring certificate is the canonical example of why internal certificates matter as much as external ones.
- API endpoints: APIs used by mobile applications and partner integrations. Many mobile and server-side HTTP clients do not implement the AIA fallback that browsers use, so chain misconfigurations that browsers handle silently cause API failures.
- Certificate chain completeness: Not just the expiry date of the leaf certificate but whether the intermediate is present and unexpired. SSL Labs and openssl s_client both reveal chain issues.
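The subdomain discovery step in the list above can be sketched from the command line. The example assumes `curl` and `jq` are installed and that crt.sh’s JSON output (with its `name_value` field) is reachable. The function names are hypothetical, and the result is a starting inventory to review, not an authoritative list.

```shell
#!/bin/sh
# normalize_names: read hostnames on stdin (one per line), lowercase them,
# drop a leading "*." wildcard label, and print a sorted, de-duplicated list.
normalize_names() {
    tr 'A-Z' 'a-z' | sed 's/^\*\.//' | grep -v '^$' | sort -u
}

# ct_hostnames DOMAIN: list hostnames seen in Certificate Transparency logs
# for DOMAIN and its subdomains, as reported by crt.sh.
ct_hostnames() {
    curl -s "https://crt.sh/?q=%25.$1&output=json" \
        | jq -r '.[].name_value' \
        | normalize_names
}

# Example use (hypothetical domain):
#   ct_hostnames yourdomain.com > monitored-hosts.txt
```

Comparing this output against the current monitoring inventory highlights hosts that hold certificates but have no expiry or chain checks.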
When to alert
Effective alert thresholds give teams time to respond before an outage while avoiding alert fatigue from overly early warnings. A tiered approach at multiple thresholds provides both early warning and urgency escalation:
- 60 days before expiry: informational alert to the team responsible for certificate renewal. Allows ample time for planned renewal.
- 30 days before expiry: action required. Renewal should be scheduled and assigned.
- 14 days before expiry: escalation. If renewal has not happened, escalate to management.
- 7 days before expiry: urgent. Renewal must happen immediately; escalate to on-call.
- 1 day before expiry: emergency. Wake up whoever needs to wake up.
- Chain validation failure: immediate alert regardless of expiry date. A chain issue is affecting clients now.
- Renewal automation failure: immediate alert when automated renewal jobs fail. The certificate has not expired yet, but the system that prevents expiry is broken.
| # Quick external certificate check from the command line:
$ echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com 2>/dev/null | openssl x509 -noout -enddate
# Returns: notAfter=Oct 15 12:00:00 2026 GMT
# Check how many days until expiry (requires GNU date):
$ end=$(echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
$ echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
# Returns number of days until expiry
# Check certificate chain completeness (count returned certs):
$ echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com -showcerts 2>/dev/null | grep -c 'BEGIN CERTIFICATE'
# Should be 2 or 3. If 1: intermediate is missing. |
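The tiered thresholds above map naturally onto a small function that turns days remaining into an alert level. This is a minimal sketch: the tier names are hypothetical labels for the levels in the list, and `days_remaining` assumes OpenSSL and GNU date.

```shell
#!/bin/sh
# alert_tier DAYS_REMAINING: map days until expiry to an alert level,
# following the 60/30/14/7/1-day tiers.
alert_tier() {
    days=$1
    if   [ "$days" -le 1 ];  then echo emergency
    elif [ "$days" -le 7 ];  then echo urgent
    elif [ "$days" -le 14 ]; then echo escalation
    elif [ "$days" -le 30 ]; then echo action
    elif [ "$days" -le 60 ]; then echo info
    else echo ok
    fi
}

# days_remaining HOST: whole days until the served certificate expires.
# Requires openssl and GNU date; fails if no certificate could be read.
days_remaining() {
    end=$(echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null \
        | openssl x509 -noout -enddate | cut -d= -f2)
    [ -n "$end" ] || return 1
    echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Example use (hypothetical host):
#   alert_tier "$(days_remaining yourdomain.com)"
```

An already expired certificate yields a negative day count, which falls into the emergency tier.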
The Changing Landscape: Shorter Validity and What It Means for Outage Prevention
Certificate validity is getting shorter. Let’s Encrypt has always issued 90-day certificates. The CA/B Forum approved a ballot to reduce the maximum public certificate validity in stages: 200 days from March 2026, 100 days from March 2027, and 47 days from March 2029. Some CAs already offer 7-day short-lived certificates as an option.
The rationale for shorter validity is sound: shorter certificates reduce the window of harm from a compromised certificate, make revocation less critical, and keep automation honest. An organization whose automation handles 90-day certificates must complete a renewal at least four times per year per certificate, and about six times if it renews at day 60 as most ACME clients do. At 47 days, the minimum is nearly eight times per year. At 7 days, it is fifty-two. Silent automation failures surface faster because the failure produces an expired certificate sooner.
The operational consequence for outage prevention: automated renewal is not optional at shorter validity periods. Manual renewal processes that were marginally feasible with 1-year or 2-year certificates become impossible at 47 days and absurd at 7 days. Organizations that have not yet automated certificate renewal for all of their certificates will face increasing operational pressure as validity shortens. ACME is the standard protocol for automated renewal and is supported by all major public CAs.
The monitoring implication: a fixed number of days before expiry becomes less useful as a trigger threshold at shorter validity periods. A 14-day alert on a 90-day certificate is meaningful (the certificate is 84% through its life). A 14-day alert on a 47-day certificate triggers when the certificate is only 70% through its life, which is much earlier in relative terms. Thresholds expressed as a percentage of the certificate’s lifetime may be more appropriate than fixed day counts for short-lived certificate environments.
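The arithmetic behind these percentages is simple enough to sketch, along with the reverse calculation a percentage-based threshold needs. The function names and the 85% figure in the usage comment are hypothetical; the division is integer division, so results are truncated to whole numbers.

```shell
#!/bin/sh
# pct_elapsed VALIDITY_DAYS DAYS_REMAINING: whole percent of the
# certificate's lifetime that has already been used.
pct_elapsed() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# alert_days_remaining VALIDITY_DAYS PCT: days remaining at which an alert
# configured to fire once PCT percent of the lifetime has elapsed triggers.
alert_days_remaining() {
    echo $(( $1 - $1 * $2 / 100 ))
}

# Example use:
#   pct_elapsed 90 14            # 84
#   pct_elapsed 47 14            # 70
#   alert_days_remaining 90 85   # 14
#   alert_days_remaining 47 85   # 8
```

With a single percentage setting, the alert scales with the certificate: about 14 days out on a 90-day certificate and about 8 days out on a 47-day one.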
Immediate Response When an SSL Certificate Outage Is Active
When a certificate outage is identified while it is actively affecting visitors, the response must be fast. The sequence depends on the failure mode:
| Situation | First action | Interim if fix takes time | Permanent fix |
| --- | --- | --- | --- |
| Certificate expired; you have access to renew it | Renew certificate immediately; deploy to all TLS termination points | None; renew and deploy is the fastest path | Automated renewal with pre-expiry monitoring |
| Certificate expired; you cannot renew immediately (CA outage, team unavailable) | Check whether a backup or previous certificate that is still valid can be temporarily installed; reverting to HTTP is an option only if the site has never sent HSTS, because browsers that have cached an HSTS policy will refuse plain HTTP | Communicate status; most users cannot proceed past browser warning | Restore renewal capability; automated renewal |
| Chain misconfiguration causing failures for some clients | Download current intermediate bundle from CA; update server configuration to include it | For browser-only traffic: may be tolerable short-term as browsers use AIA fallback | Update server config with complete chain; verify with SSL Labs |
| Renewed but not deployed at CDN/load balancer | Manually update certificate at CDN or load balancer with the newly renewed certificate | Origin server is fine; only edge certificates need updating | Integrate edge certificate update into renewal automation |
After resolving any SSL certificate outage, run the SSL Labs test on the affected domain before declaring the incident resolved. SSL Labs checks the complete chain, certificate validity, HSTS configuration, and TLS protocol support from an external perspective. A passing SSL Labs result confirms what clients actually receive is correct, not just what the internal server reports.
Frequently Asked Questions
What is an SSL certificate outage?
An SSL certificate outage is any production incident caused by a problem with an SSL/TLS certificate that prevents clients from establishing trusted connections to a server. Certificate expiry is the most common cause, but outages also occur from missing or outdated intermediate certificates in the chain, certificates renewed at the origin server but not deployed to CDN or load balancer termination points, intermediate CA rotation by the Certificate Authority, and CA infrastructure outages that prevent certificate renewal. Each failure mode requires different prevention and remediation approaches.
How much does an SSL certificate outage cost?
Keyfactor’s 2024 PKI and Digital Trust Report found that organizations experience an average of three certificate-caused outages every two years, with an average of 2.6 hours to identify the root cause and 2.7 hours to remediate. At common enterprise downtime cost estimates of several thousand dollars per minute for revenue-generating services, a single outage of 5 hours can cost millions of dollars in direct revenue loss before accounting for customer trust damage, SLA penalties, and incident response labor. High-profile cases are more severe: the Ericsson 2018 certificate outage cost over one billion dollars in total remediation.
Can a CA outage cause an SSL certificate outage on my site?
A CA infrastructure outage prevents new certificate issuance and renewal but does not affect currently valid certificates. Your site will continue serving HTTPS normally during a CA outage as long as your certificate has not yet expired. The risk arises when your certificate is due to expire during the outage window and your automated renewal cannot complete. The practical mitigation is to begin renewal well before expiry: ACME clients such as Certbot by default attempt to renew 90-day Let’s Encrypt certificates starting 30 days before expiry, giving a 30-day window of retry opportunities around any CA outage.
Why does my certificate show as valid on the server but clients get errors?
The most common cause is a TLS termination point that is not the origin server. CDNs, load balancers, and reverse proxies hold their own copy of the certificate and serve it to clients. If the certificate was renewed on the origin server but not updated at the CDN or load balancer, the origin shows a valid certificate while clients receive the old, expired one. Check what clients actually receive using an external tool (SSL Labs, or openssl s_client from an external machine) rather than checking the origin server’s certificate store directly.
Why does an expired certificate affect the entire site instantly?
Certificate expiry has a hard boundary: the certificate has a Not After field specifying exactly when it becomes invalid. At that moment, every browser and strict TLS client simultaneously receives a trust error when presented with the certificate. The server continues running normally; the HTTP application continues responding normally. But every new connection fails at the TLS handshake before any HTTP content is exchanged. This is why the outage appears instantaneous and total: all clients everywhere are affected simultaneously from the same moment.
