
Cloudflare in trouble

The PrimeTime · 4 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

A React useEffect dependency array can cause runaway behavior when it includes a newly created object, because referential comparison fails every render.

Briefing

Cloudflare’s outage wasn’t triggered by an exotic cyberattack; it stemmed from a classic React mistake that accidentally triggered an infinite loop of dashboard API calls, reportedly ballooning into tens of thousands of requests. Given that Cloudflare has successfully mitigated massive DDoS traffic in the past, the irony was hard to miss: the company that built its reputation on stopping threats was brought low by a front-end deployment bug.

The failure mechanism traced back to a React hook, specifically useEffect, which is meant to run a function when certain dependencies change. On initial render, the dashboard needs to fetch data once. But the code included a freshly created params object in the dependency array. Because that object is not referentially identical between renders (it’s a new object in memory each time), React treated the dependency as “changed” repeatedly. That shallow comparison behavior caused the effect to re-run continuously, generating a runaway cycle of backend fetches. The transcript describes this as an infinite loop visible on refresh, with backend calls stacking up—reportedly reaching about 25,000 API calls just from sitting on the dashboard.
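The mechanism can be demonstrated without React at all: React compares each dependency with `Object.is`, and two object literals are never `Object.is`-equal even when their contents match. A minimal sketch of that comparison logic, simulating the buggy pattern (the `params` shape and render count are illustrative, not from Cloudflare's actual code):

```typescript
// React treats a dependency as "changed" when Object.is returns false.
// Two object literals are distinct references, so they always "change".
function depsChanged(prev: unknown[], next: unknown[]): boolean {
  return prev.length !== next.length || prev.some((d, i) => !Object.is(d, next[i]));
}

// Simulate three renders that each build a fresh params object,
// mirroring the buggy pattern: useEffect(fetchDashboard, [{ page: 1 }]).
let effectRuns = 0;
let prevDeps: unknown[] | null = null;
for (let render = 0; render < 3; render++) {
  const params = { page: 1 }; // new object in memory every render
  const deps = [params];
  if (prevDeps === null || depsChanged(prevDeps, deps)) {
    effectRuns++; // in the real dashboard, this fired another API call
  }
  prevDeps = deps;
}
console.log(effectRuns); // 3 — the effect fires on every render
```

In the real app the loop is worse than this fixed-count simulation suggests: each fetch updates state, which triggers another render, which creates another fresh object, and so on without bound.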

Once the problematic code was identified and removed, Cloudflare attempted to restore stability by clearing logins so users would re-authenticate and reload into a healthy state. That fix, however, created a second-order problem: a thundering herd. When many users refreshed and re-logged at the same time, the API demand spiked in synchronized bursts rather than spreading naturally over time. The result was another surge that risked overwhelming systems again, even as the underlying issue was being corrected.

The larger controversy wasn’t only the bug itself, but how it reached production and why it wasn’t caught earlier. The transcript points to missing safeguards such as rate limiting, failure recovery, and—most importantly—release controls like slow rollouts or canary testing. With millions of users depending on the service, basic monitoring should have flagged abnormal request volumes quickly. A “simple canary” or staged deployment could have revealed that requests were running out of control before exposing the entire user base.

Cloudflare later referenced its Argo service, which can support automatic rollbacks when detection thresholds are crossed. Still, the episode highlights a broader lesson for high-scale systems: front-end dependency bugs can become infrastructure incidents if backend protections and deployment discipline aren’t strong enough to absorb sudden, synchronized traffic patterns. The takeaway is blunt—this wasn’t an advanced hacking event, but a production oops that turned into an operational crisis, and it raises uncomfortable questions about testing, throttling, and rollout strategy.

Cornell Notes

Cloudflare’s dashboard outage traced back to a React useEffect dependency bug: a newly created params object was placed in the dependency array, so React treated it as changed on every render. That referential inequality caused the effect to re-run continuously, triggering an infinite loop of backend API calls—reportedly around 25,000 requests just from loading the dashboard. After fixing the code, Cloudflare cleared logins to reset sessions, but that triggered a thundering herd as many users re-authenticated and refreshed simultaneously. The incident underscores that even “small” front-end mistakes can become large-scale outages without rate limiting, resilient APIs, and staged rollouts/canary checks.

How does a React useEffect dependency array turn into an infinite loop?

useEffect runs its function on initial load and then again whenever dependencies change. If the dependency list includes an object created during render (like a params object), that object will have a different memory reference each render. Because React uses referential/shallow comparison for dependencies, the dependency appears “different” every time, so the effect keeps firing and repeatedly triggers fetch calls.
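The standard fixes are to depend on primitives (which `Object.is` compares by value) or to stabilize the object's reference with `useMemo`. A sketch of the primitive-dependency case, using the same comparison React performs (names are illustrative):

```typescript
// With primitive dependencies, Object.is sees no change between renders,
// so the effect runs only on the first render.
function depsChanged(prev: unknown[], next: unknown[]): boolean {
  return prev.length !== next.length || prev.some((d, i) => !Object.is(d, next[i]));
}

let effectRuns = 0;
let prevDeps: unknown[] | null = null;
for (let render = 0; render < 3; render++) {
  const page = 1; // primitive: identical value every render
  const deps = [page]; // e.g. useEffect(fetch, [page]) instead of [{ page }]
  if (prevDeps === null || depsChanged(prevDeps, deps)) {
    effectRuns++;
  }
  prevDeps = deps;
}
console.log(effectRuns); // 1 — later renders see unchanged dependencies
```

If the effect genuinely needs an object, `const params = useMemo(() => ({ page }), [page])` keeps the reference stable until `page` itself changes.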

Why did the bug translate into thousands of API calls instead of just a UI glitch?

Each time the effect re-ran, it kicked off backend fetches for the dashboard data. With the dependency condition perpetually “changed,” the system generated a continuous stream of requests. The transcript describes this as backend calls stacking up on refresh, reaching roughly 25,000 API calls from simply being on the dashboard.

What is a thundering herd, and how did it appear after the initial fix?

A thundering herd happens when many clients react to the same event at the same time, causing synchronized traffic spikes. After the code fix, clearing logins forced users to re-authenticate and reload. That synchronized re-login meant many API calls arrived in the same window rather than being naturally staggered.
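A common client-side mitigation, not described in the transcript but standard practice, is exponential backoff with full jitter: each client waits a random slice of an exponentially growing window, so retries spread out instead of arriving in lockstep. A minimal sketch (default values are assumptions):

```typescript
// Exponential backoff with full jitter: the retry window doubles per
// attempt (capped), and each client picks a uniformly random delay
// inside it, de-synchronizing the herd.
function jitteredDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const windowMs = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * windowMs;
}
```

With this in place, a million clients retrying after the same event land across the whole window rather than in one synchronized burst.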

What deployment and backend safeguards were missing or insufficient?

The transcript highlights the absence (or inadequacy) of rate limiting and recovery mechanisms in the API, plus weak release discipline. It argues that a slow rollout, canary testing, or automated rollback thresholds should have detected abnormal request volume before the entire user base was affected.
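To make the rate-limiting point concrete, here is a minimal token-bucket limiter of the kind a backend could place in front of a per-user endpoint. This is a generic sketch, not Cloudflare's implementation; the rate and burst numbers are arbitrary:

```typescript
// Token bucket: refills `ratePerSec` tokens per second up to `burst`;
// a request is admitted only if a whole token is available.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(private ratePerSec: number, private burst: number, now = 0) {
    this.tokens = burst;
    this.last = now;
  }

  allow(now: number): boolean {
    // Refill proportionally to elapsed time, capped at the burst size.
    this.tokens = Math.min(this.burst, this.tokens + (now - this.last) * this.ratePerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A runaway client looping at hundreds of requests per second would exhaust its burst almost immediately and be rejected thereafter, turning an infinite loop into a bounded trickle.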

How does automatic rollback via Argo relate to preventing this kind of incident?

Cloudflare referenced Argo as a mechanism that can automatically roll back when detection triggers fire. In theory, if request rates or error patterns spike beyond expected thresholds, Argo can revert the change before it spreads broadly—reducing the blast radius of a runaway client-side loop.
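The general shape of such a detection trigger can be sketched generically (this is an illustration of threshold-based rollback, not Argo's actual API): compare the canary's error rate against the baseline fleet and revert when it diverges past a margin.

```typescript
// Generic canary health check: roll back when the canary's error rate
// exceeds the baseline's by more than `marginPct` percentage points.
interface WindowStats {
  requests: number;
  errors: number;
}

function shouldRollback(baseline: WindowStats, canary: WindowStats, marginPct = 2): boolean {
  if (canary.requests === 0) return false; // not enough signal yet
  const baseRate = baseline.requests > 0 ? baseline.errors / baseline.requests : 0;
  const canaryRate = canary.errors / canary.requests;
  return (canaryRate - baseRate) * 100 > marginPct;
}
```

The same pattern applies to request volume: a canary whose per-session request rate is orders of magnitude above baseline (as a useEffect loop would produce) trips the threshold before the change reaches everyone.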

Review Questions

  1. What specific property of JavaScript objects makes them risky to place directly into a React useEffect dependency array?
  2. Describe the sequence of events that led to both the initial request storm and the later thundering herd.
  3. Which safeguards—rate limiting, canaries, slow rollouts, or automatic rollback—would most likely have limited the incident’s impact, and why?

Key Points

  1. A React useEffect dependency array can cause runaway behavior when it includes a newly created object, because referential comparison fails every render.
  2. The dashboard incident reportedly escalated to about 25,000 API calls from repeated effect execution triggered by the dependency bug.
  3. Fixing the front-end issue by clearing logins can unintentionally create a thundering herd when many users re-authenticate simultaneously.
  4. High-scale systems need backend protections like rate limiting and robust failure recovery to prevent client-side loops from becoming infrastructure outages.
  5. Staged releases, canary testing, and automated rollback thresholds are critical to catch abnormal request volumes before full deployment.
  6. Even non-malicious “production oops” can become a major incident when monitoring and rollout discipline lag behind code changes.

Highlights

Cloudflare’s outage was linked to a React useEffect loop caused by including a freshly created params object in the dependency array.
The referential nature of objects meant the dependency looked “changed” every render, repeatedly triggering dashboard fetches.
Clearing logins to recover stability produced a thundering herd as users re-logged and refreshed in sync.
The incident raised questions about why rate limiting, canaries, and slow rollouts didn’t catch the abnormal traffic earlier.
Argo was cited as a tool for automatic rollback when detection thresholds are met.
