Why the Cloudflare outage disrupted major sites on Nov 18

The Cloudflare outage on Nov 18 disrupted major services globally; the postmortem blames a malformed Bot Management file and confirms full restoration by 17:06 UTC.

Deden Sembada · 19 Nov 2025

Cloudflare said its network suffered a major outage on November 18, 2025, when a malformed Bot Management feature file exceeded internal software limits, disrupting traffic for millions of websites worldwide and affecting services such as ChatGPT, Spotify, and YouTube until engineers reverted the file to restore service. The 'cloudflare down' incident was not an attack: it resulted from a permissions change that produced an oversized feature list in the company's bot controls, and Cloudflare said services were fully restored by 17:06 UTC after a rollback. Experts linked the episode to dependency risks among major network providers, noting its worldwide impact.

What happened and why

Investigators traced the outage to a recent database permission change that allowed an unusually large feature file to be created inside Cloudflare's Bot Management system, which then exceeded internal software limits and caused widespread traffic disruption.

Cloudflare emphasized the root cause was operational change rather than a cyberattack, and engineers mitigated the issue by reverting the feature file; the company reported full restoration by 17:06 UTC on November 18.
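The postmortem describes the mechanism only at a high level: a generated file grew past an internal limit and the consumer failed hard rather than falling back. The sketch below is illustrative Python, not Cloudflare's code; the cap, the file format, and every function name are assumptions.

```python
# Hypothetical sketch (not Cloudflare's actual code): how a hard cap on the
# number of entries in a generated feature file can turn an oversized file
# into a hard failure instead of a graceful fallback.

MAX_FEATURES = 200  # assumed internal limit on bot-scoring features


class FeatureFileError(Exception):
    """Raised when a generated feature file violates the loader's limits."""


def load_feature_file(lines: list[str]) -> list[str]:
    """Parse a newline-delimited feature file, enforcing the hard cap."""
    features = [line.strip() for line in lines if line.strip()]
    if len(features) > MAX_FEATURES:
        # An oversized file generated after the permission change would trip
        # this check and take the module down with it.
        raise FeatureFileError(
            f"{len(features)} features exceeds the limit of {MAX_FEATURES}"
        )
    return features


def load_with_fallback(lines: list[str], last_good: list[str]) -> list[str]:
    """Safer pattern: keep serving the last known-good file instead of failing."""
    try:
        return load_feature_file(lines)
    except FeatureFileError:
        return last_good
```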

  • Scale: impacted about one in five websites globally
  • Duration: roughly three hours for major effects
  • Cause: oversized Bot Management feature file after permission change

Service | Impact | Notes
ChatGPT | Unreachable | AI chatbot access dropped for many users
Spotify | Streaming interruptions | Playback and login errors reported
YouTube | Partial outage | Embedded videos and uploads affected
Bet365 | Site offline | Gaming and betting services disrupted
League of Legends | Service lag | Matchmaking and web portals unusable

The postmortem immediately ruled out an external attack as the trigger.

Scope and services affected

Because Cloudflare routes traffic and provides CDN, DNS, and security services for millions of domains, the outage had outsized reach: estimates showed roughly 20% of websites relied on Cloudflare routing at the time, and several high-traffic platforms including gaming networks, streaming services, and news sites experienced partial or total downtime.

Public monitoring tools and status pages showed spikes in errors; Downdetector reports surged, and major customers such as ChatGPT and Spotify reported intermittent failures, illustrating how an outage at a single infrastructure provider can cascade across entire ecosystems.

  • Content platforms: video and audio services suffered playback issues
  • Applications: API-driven apps reported timeouts and auth failures
  • E-commerce: checkout flows and DNS resolution errors impeded sales
  • Gaming: matchmaking and login services showed latency or disconnects
  • News and publishing: site availability and comment systems went down

Analysts warned that concentration of cloud services raises systemic risk; teams should design multi-provider fallbacks and resilient DNS strategies.
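To make that advice concrete, here is a minimal sketch of provider ordering: check a list of endpoints and use the first one that answers its health check. The hostnames are placeholders, and a production setup would usually express the same logic as health-checked DNS failover records or a load-balancer pool rather than application code.

```python
# Minimal multi-provider fallback sketch; both endpoints are hypothetical.
import urllib.request

PROVIDERS = [
    "https://cdn-primary.example.com/health",
    "https://cdn-backup.example.net/health",
]


def first_healthy(endpoints: list[str], timeout: float = 2.0) -> str | None:
    """Return the first endpoint that answers its health check, else None."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:
            continue  # provider unreachable or erroring; try the next one
    return None


if __name__ == "__main__":
    print(first_healthy(PROVIDERS) or "no healthy provider")
```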

Timeline and response

Detection and triage began quickly after error spikes appeared in monitoring systems; engineers identified abnormal Bot Management behavior, traced it to a permissions change, and executed a rollback that progressively restored connectivity across regions.

Time | Event
13:00 UTC | Errors spike in global monitoring
13:20 UTC | Investigators link issue to Bot Management file
14:05 UTC | Rollback initiated for feature file
15:30 UTC | Partial recovery in some regions
17:06 UTC | Full service restoration announced

Cloudflare's status updates and postmortem emphasized there was no evidence of malicious traffic or DNS compromise, and that changes to database permissions inadvertently allowed an oversized feature file to propagate; internal safeguards were adjusted and additional validation checks deployed to prevent recurrence.
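Cloudflare has not published the exact checks it added, so the following is only a generic sketch of a pre-propagation validation gate: reject a generated artifact that is too large, has too many entries, or contains duplicates before it is pushed fleet-wide. The limits and helper name are assumptions.

```python
# Generic pre-deploy validation gate for a generated configuration artifact.
# All limits are illustrative, not Cloudflare's real thresholds.

MAX_BYTES = 1_000_000   # assumed cap on artifact size
MAX_ENTRIES = 200       # assumed cap on feature entries


def validate_artifact(payload: bytes) -> list[str]:
    """Return a list of violations; an empty list means safe to propagate."""
    problems: list[str] = []
    if len(payload) > MAX_BYTES:
        problems.append(f"artifact is {len(payload)} bytes (max {MAX_BYTES})")

    entries = [line for line in payload.decode().splitlines() if line.strip()]
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries (max {MAX_ENTRIES})")
    if len(set(entries)) != len(entries):
        problems.append("duplicate entries detected")
    return problems
```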

  • Rolled back offending file and monitored traffic recovery
  • Validated configuration and permission controls across clusters
  • Updated incident response runbooks and added automated checks

External partners were notified and advised to reset DNS caches to reduce residual impact.

Lessons and next steps

The incident prompted rapid reactions from SRE teams and platform owners who prioritized incident review, customer communication, and contingency planning, with many citing the Cloudflare postmortem as a reminder that even sophisticated providers can suffer operational failures.

Security and infrastructure leads recommended multi-DNS setups, redundant CDNs, health-checking, and automated failover. Developers were urged to build graceful degradation for API calls and to prepare clear communication templates for real-time service degradation.
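A minimal sketch of that graceful-degradation advice, assuming a hypothetical JSON API and an in-process cache of the last good response: the call gets a short timeout, and a network failure serves stale-but-usable data instead of an error page.

```python
# Graceful degradation for an outbound API call: short timeout plus a cached
# last-good response. The endpoint and cache shape are hypothetical.
import json
import urllib.request

_cache: dict[str, dict] = {}   # last successful response per URL


def fetch_with_fallback(url: str, timeout: float = 2.0) -> dict:
    """Try the live API briefly; on failure, serve the cached copy marked stale."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
            _cache[url] = data                     # refresh the fallback copy
            return data
    except (OSError, ValueError):                  # unreachable provider or bad payload
        if url in _cache:
            return {**_cache[url], "stale": True}  # degraded but usable
        raise                                      # nothing cached: surface the error
```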

  • Design multi-provider topology
  • Implement DNS TTL and cache management
  • Automate rollback and configuration audits (see the sketch below)
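The rollback item can be as simple as refusing to let a change outlive an error spike. This sketch assumes hypothetical deploy() and error_rate() helpers supplied by the caller; the threshold and observation window are illustrative.

```python
# Automated rollback loop: deploy, watch an error metric, revert on a spike.
import time

ERROR_THRESHOLD = 0.05    # assumed: revert above a 5% error rate
OBSERVATION_SECS = 300    # watch the new config for five minutes
CHECK_INTERVAL = 30


def deploy_with_rollback(new_version: str, last_good: str, deploy, error_rate) -> str:
    """Deploy new_version; revert to last_good if errors spike. Returns the live version."""
    deploy(new_version)
    deadline = time.monotonic() + OBSERVATION_SECS
    while time.monotonic() < deadline:
        if error_rate() > ERROR_THRESHOLD:
            deploy(last_good)         # automatic revert, no human in the loop
            return last_good
        time.sleep(CHECK_INTERVAL)
    return new_version                # change held steady; keep it
```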

One analyst noted that the 'cloudflare down' event exposes systemic concentration risk and urged enterprises to run regular chaos tests targeting third-party dependencies and to budget for provider redundancy.
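In practice, a chaos test against a third-party dependency can be as lightweight as a unit test that forces the provider client to fail and asserts the application degrades rather than crashes. Everything named below (fetch_profile, call_provider) is hypothetical application code, not any vendor's SDK.

```python
# Chaos-style test: simulate the provider being down and require degradation.
import unittest
from unittest.mock import patch


def call_provider(user_id: str) -> dict:          # stand-in for a real client
    return {"user_id": user_id, "source": "provider", "degraded": False}


def fetch_profile(user_id: str) -> dict:
    """App code under test: call the provider, degrade to a stub on failure."""
    try:
        return call_provider(user_id)
    except ConnectionError:
        return {"user_id": user_id, "source": "cache", "degraded": True}


class ThirdPartyOutageTest(unittest.TestCase):
    def test_degrades_when_provider_is_down(self):
        # Chaos injection: every provider call fails as if the edge were unreachable.
        with patch(f"{__name__}.call_provider", side_effect=ConnectionError):
            result = fetch_profile("u123")
        self.assertTrue(result["degraded"])        # app stays up, just degraded


if __name__ == "__main__":
    unittest.main()
```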

Boards and CTOs will likely press vendors for tighter safeguards, more transparent change controls, and proof of isolation testing; observability data and incident drills will become standard parts of vendor evaluation over the next year.

Services were restored, and Cloudflare published a detailed postmortem attributing the outage to an operational permission change that created an oversized Bot Management feature file, not to a cyberattack. Teams across the internet will now focus on resilience measures, including multi-provider DNS and CDN strategies, stricter configuration governance, and routine chaos engineering to simulate third-party failures and validate rollback paths. For operators tracking 'cloudflare down' risks, actionable steps include reducing blast radius, lowering DNS TTLs, automating failover, and documenting communication plans for customers and stakeholders. Expect vendor Q&A sessions and regulatory scrutiny in the coming months.