Cloudflare's Billing Engine Stalls: Hidden ClickHouse Bottleneck Discovered and Patched

Last updated: 2026-05-19 13:04:17 · Education & Careers

Billing Aggregation Slowed After Migration

Cloudflare's daily billing aggregation jobs, which process hundreds of millions of dollars in usage revenue, slowed sharply after the company migrated its ClickHouse database infrastructure. According to company engineers, the delays threatened to complicate invoice reconciliation.

Source: blog.cloudflare.com

All the usual performance indicators (I/O, memory, rows scanned, parts read) appeared normal. The slowdown came from a bottleneck inside ClickHouse's internal query execution, not from external resource constraints.

Three Patches Rescued the Pipeline

Cloudflare's engineering team identified the root cause and deployed three targeted patches to restore performance. The fixes addressed a previously undocumented behavior in ClickHouse's aggregation engine that emerged under heavy concurrent workloads.

“We had to dig into the source code and add instrumentation to find what wasn't being reported,” a Cloudflare engineer told reporters. “The bottleneck was invisible to our standard monitoring tools.”

Background: A Petabyte-Scale Analytics Platform

Cloudflare relies heavily on ClickHouse, an open-source OLAP database, to store over a hundred petabytes of data across dozens of clusters. In early 2022, it launched Ready-Analytics, a system that simplifies onboarding by allowing teams to stream data into a single massive table rather than designing custom schemas.

Datasets are distinguished by a namespace and share a standard schema: a set of float and string fields, a timestamp, and an indexID. The primary key (namespace, indexID, timestamp) keeps each namespace's rows sorted together, which is critical for query speed.
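To make the effect of that key concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how ClickHouse stores data: the rows, the namespace names, and the helper `namespace_slice` are all hypothetical. It shows that once rows are sorted by (namespace, indexID, timestamp), everything belonging to one namespace sits in a single contiguous run that a binary search can locate.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows shaped as (namespace, index_id, timestamp).
# The values are invented for illustration.
rows = [
    ("dns", "zone-b", 1700000300),
    ("waf", "zone-a", 1700000100),
    ("dns", "zone-a", 1700000200),
    ("waf", "zone-a", 1700000050),
    ("dns", "zone-a", 1700000100),
]

# Tuple comparison orders by namespace first, then index_id, then timestamp,
# mirroring a sort key of (namespace, indexID, timestamp).
sorted_rows = sorted(rows)


def namespace_slice(data, namespace):
    """Return the contiguous run of rows for one namespace in sorted data."""
    keys = [row[0] for row in data]
    lo = bisect_left(keys, namespace)
    hi = bisect_right(keys, namespace)
    return data[lo:hi]


dns_rows = namespace_slice(sorted_rows, "dns")
for row in dns_rows:
    print(row)
```

A query for a single namespace only needs to touch that run, which is the intuition behind putting the namespace first in the key.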

By December 2024, Ready-Analytics had grown to over 2 PiB of data, ingesting millions of rows per second. However, its original retention policy proved limiting: a rigid 31-day partition drop applied to every namespace.

The Problem: One-Size-Fits-All Retention

Because Cloudflare had been using ClickHouse before native TTL features existed, it built a custom retention system that dropped daily partitions older than 31 days. This kept teams that needed longer retention, for example to meet legal or contractual obligations, from using Ready-Analytics.
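A fixed-window policy of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Cloudflare's actual retention job: the function name `partitions_to_drop` and the example dates are hypothetical, and only the 31-day window comes from the article. The sketch selects which daily partitions fall outside the window; actually dropping them would be a separate step against the database.

```python
from datetime import date, timedelta

RETENTION_DAYS = 31  # the fixed window described in the article


def partitions_to_drop(partition_days, today, retention_days=RETENTION_DAYS):
    """Return the daily partitions that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(day for day in partition_days if day < cutoff)


# Hypothetical example: five consecutive daily partitions.
existing = [date(2024, 11, 28) + timedelta(days=i) for i in range(5)]
expired = partitions_to_drop(existing, today=date(2024, 12, 31))
print(expired)
```

Because the rule looks only at the partition date, every namespace stored in that partition is removed together, which is why a single window applied to all tenants.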

“The per-namespace retention flexibility was the missing piece,” explained a Cloudflare product manager. “Without it, many teams had to fall back to a far more complex onboarding process.”

What This Means

The hidden bottleneck underscores the complexity of maintaining massive, multi-tenant analytics platforms. The three patches not only restored billing pipeline speed but also improved ClickHouse's behavior under similar future conditions.

Cloudflare has since implemented a new per-namespace retention system, allowing teams to set custom data lifetimes. This opens Ready-Analytics to use cases previously excluded, reducing friction for internal customers and strengthening Cloudflare's ability to scale its billing and fraud detection systems.
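The article does not describe how the per-namespace system works internally, so the following Python sketch is purely illustrative. It assumes a lookup table of retention overrides with the earlier 31-day window as the default; the names `RETENTION_OVERRIDES`, `retention_days`, and `is_expired`, and the namespaces and values in the table, are all hypothetical.

```python
from datetime import date, timedelta

DEFAULT_RETENTION_DAYS = 31  # the original one-size-fits-all window

# Hypothetical overrides; real namespaces and lifetimes are not given in the article.
RETENTION_OVERRIDES = {
    "billing": 400,
    "debug_logs": 7,
}


def retention_days(namespace):
    """Look up a namespace's data lifetime, falling back to the default."""
    return RETENTION_OVERRIDES.get(namespace, DEFAULT_RETENTION_DAYS)


def is_expired(namespace, row_day, today):
    """True when a row's day is older than its namespace's retention window."""
    cutoff = today - timedelta(days=retention_days(namespace))
    return row_day < cutoff


today = date(2025, 1, 31)
row_day = date(2024, 12, 1)  # 61 days before `today`
for ns in ("billing", "debug_logs", "unlisted"):
    print(ns, is_expired(ns, row_day, today))
```

The design point is that expiry becomes a function of both the namespace and the row's age, so data of the same age can be kept for one tenant and removed for another.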
