
How I parsed billions of rows for every user in 2 seconds

Theo - t3.gg
5 min read

Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

Convex handled app workflows well, but it wasn’t suitable for per-user analytical scans across massive event tables.

Briefing

A wrapped-style analytics feature that once took 10–20 minutes per user was brought down to under 10 seconds—and in some cases to a few hundred milliseconds—by changing how usage statistics are computed and cached. The core problem wasn’t the UI or even the analytics logic; it was the data plumbing. Convex powered the app backend, but it wasn’t built for scanning and aggregating billions of analytics rows per request. PostHog was better suited for analytics, yet its ClickHouse-based query model and rate limits made repeated, heavy queries too slow and too error-prone when multiplied across thousands of users.

The initial architecture ran a Convex workflow that queued a “compute wrapped” job and then executed multiple PostHog queries to assemble the stats shown in the wrapped experience: model usage counts, feature usage (edits, branches, retries), and time breakdowns (by day, weekday, and hour). Several steps required full scans and grouping—especially percentiles, which had to compare each user’s total usage against the distribution across all users. Early timing showed the first query step completing in hundreds of milliseconds, but model usage and feature usage ballooned into tens of seconds, and percentile calculations reached roughly 30 seconds. Worse, PostHog rate limits (with concurrency and per-minute/hour caps) turned the situation into a cascading failure: the system attempted about 10 queries per user, and the queue quickly accumulated thousands of long-running generations.
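
The percentile step’s cost is easiest to see in miniature. The sketch below uses Python’s `sqlite3` standing in for the ClickHouse-backed event table; the `events` schema and the `usage_percentile` helper are illustrative, not the app’s actual code. Every call re-scans and re-groups the full table just to rank one user:

```python
import sqlite3

# Illustrative stand-in for the raw event table; schema and names are
# hypothetical, not the app's real PostHog setup.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT, event TEXT)")
db.executemany(
    "INSERT INTO events VALUES (?, 'generation')",
    [("u1",)] * 5 + [("u2",)] * 2 + [("u3",)] * 9,
)

def usage_percentile(user_id: str) -> float:
    # Naive on-the-fly percentile: every request re-groups the whole
    # event table just to rank one user against the distribution.
    (pct,) = db.execute(
        """
        WITH totals AS (
            SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id
        )
        SELECT 100.0 * SUM(n <= (SELECT n FROM totals WHERE user_id = ?))
               / COUNT(*)
        FROM totals
        """,
        (user_id,),
    ).fetchone()
    return pct
```

With billions of rows instead of sixteen, the `GROUP BY` inside the CTE is the full-table scan the article describes, repeated for every user’s wrapped generation.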

The turnaround came from two tactics: reducing query pressure and shifting expensive computation into precomputed caches. First, the team added operational controls (feature flags, higher parallelism limits within reason, more logging, and fixing a default PostHog date limit that capped results at 100 days). Then they rewrote queries to reduce the number of separate requests—combining logic into fewer SQL statements using unions to stay under rate limits.
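
The consolidation idea can be sketched as follows, again with SQLite standing in for PostHog’s SQL layer and with hypothetical event names. Three formerly separate count queries travel as one labeled `UNION ALL`, costing one request against the rate limit instead of three:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT, event TEXT)")
db.executemany("INSERT INTO events VALUES ('u1', ?)",
               [("edit",), ("edit",), ("branch",), ("retry",)])

# One round trip instead of three: each UNION ALL branch was previously
# its own request, labeled so the caller can split the rows back apart.
rows = db.execute("""
    SELECT 'edits' AS stat, COUNT(*) FROM events
        WHERE user_id = 'u1' AND event = 'edit'
    UNION ALL
    SELECT 'branches', COUNT(*) FROM events
        WHERE user_id = 'u1' AND event = 'branch'
    UNION ALL
    SELECT 'retries', COUNT(*) FROM events
        WHERE user_id = 'u1' AND event = 'retry'
""").fetchall()
stats = dict(rows)
```

As the article notes, this does not necessarily make the SQL itself faster; the win is fewer requests counted against concurrency and per-minute caps.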

The biggest leap, however, came from materialized views inside PostHog. Instead of recomputing “usage by user” and “model usage by user” from raw event tables every time, the system precomputed subqueries into cached tables (effectively stored results that could be queried like normal tables). Once those materialized views existed, the wrapped queries became dramatically simpler: percentile calculations could be expressed as counts over cached per-user aggregates rather than repeated full-table scans. That change collapsed end-to-end runtimes from 6+ minutes to under a second for key steps, and production generations began completing in roughly 10 seconds on average.
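
A minimal sketch of the precompute-then-query pattern, assuming an illustrative schema. SQLite has no materialized views, so a plain table filled once from the aggregate stands in for one:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT)")
db.executemany("INSERT INTO events VALUES (?)",
               [("u1",)] * 5 + [("u2",)] * 2 + [("u3",)] * 9)

# "Materialize" the expensive subquery once. The aggregate over raw
# events runs a single time; results live in a small cached table.
db.execute("""
    CREATE TABLE usage_by_user AS
    SELECT user_id, COUNT(*) AS total FROM events GROUP BY user_id
""")

def percentile(user_id: str) -> float:
    # Per-request work is now a count over the small cached table,
    # never a re-scan of the raw events.
    (pct,) = db.execute("""
        SELECT 100.0 * SUM(total <= (SELECT total FROM usage_by_user
                                     WHERE user_id = ?)) / COUNT(*)
        FROM usage_by_user
    """, (user_id,)).fetchone()
    return pct
```

The per-request query now touches one row per user rather than one row per event, which is the collapse from minutes to sub-second steps the article describes.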

From there, further refinements squeezed the remaining overhead: collapsing multiple actions into fewer Convex workflow steps, using PostHog Endpoints (a beta feature) to expose dashboard-defined queries over HTTP for better caching behavior, and experimenting with “single-shot” query shapes that compute more at once. After these iterations, reruns sometimes landed under 200 milliseconds due to warm caching.

The practical takeaway is that analytics at wrapped scale demands precomputation and careful respect for query limits. The performance win wasn’t magic SQL—it was architectural alignment: compute once, cache aggressively, and design queries to avoid repeated scans of massive event histories.

Cornell Notes

The wrapped feature’s runtime collapsed from 10–20 minutes (and sometimes hours under load) to under 10 seconds by changing how analytics are computed. The original approach issued many PostHog queries per user, including full-table scans and percentile calculations that triggered PostHog rate limits and timeouts. The breakthrough was building PostHog materialized views—cached per-user aggregates like “usage by user” and “model usage by user”—so later wrapped requests could query small precomputed tables instead of scanning billions of rows. Additional gains came from query consolidation (fewer requests via unions), action/workflow restructuring in Convex, and using PostHog Endpoints to benefit from endpoint caching. The result: near-instant wrapped generation for many users and far fewer production failures.

Why did the first implementation take so long even though it used an analytics database?

It still ran many heavy queries per user. Several steps required full scans of AI generation events and repeated grouping (model usage, feature usage, and especially percentiles). Percentiles compared each user’s totals against the distribution across all users, which is expensive when computed on the fly. On top of that, PostHog’s ClickHouse-based analytics setup wasn’t designed for millisecond app-style reads, and PostHog rate limits (concurrency plus per-minute/per-hour query caps) caused timeouts and cascading slowdowns when thousands of queued generations hit the system.

What specific PostHog rate-limit problem made the situation worse?

The implementation ran about 10 queries per user, with concurrency capped at three workflow runners. PostHog’s rate limits were hit quickly: roughly 240 queries per minute, 2,400 per hour, and three concurrently executing queries. Once those limits were exceeded, some queries timed out (one percentile step failed after about 141 seconds), and the system had to retry work—keeping the queue clogged.
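
Back-of-envelope arithmetic shows why the queue backed up. The caps are the figures above; the backlog of 5,000 users is an illustrative number, since the source only says “thousands”:

```python
# Back-of-envelope math from the stated limits. 5,000 queued users is
# an illustrative figure, not from the video.
QUERIES_PER_USER = 10
PER_MINUTE_CAP = 240
PER_HOUR_CAP = 2400

users_per_minute = PER_MINUTE_CAP // QUERIES_PER_USER   # burst ceiling
users_per_hour = PER_HOUR_CAP // QUERIES_PER_USER       # sustained ceiling
hours_for_5000_users = 5000 / users_per_hour

print(users_per_minute, users_per_hour, round(hours_for_5000_users, 1))
# → 24 240 20.8
```

At 10 queries per user, the hourly cap alone limits throughput to 240 users per hour, so a few thousand queued users implies many hours of backlog before any timeouts or retries are counted.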

How did materialized views change the performance profile?

Materialized views precomputed expensive subqueries into cached tables. Instead of recomputing “usage by user” and “model usage by user” from raw events each time, the system stored per-user aggregates (filtered to the relevant timestamp range) as cached results (e.g., ~107,000 rows for matching user IDs). Wrapped queries then became simple selects from these cached tables, turning percentile calculations into count-based comparisons over pre-aggregated data rather than repeated full-table scans.

What did query consolidation (unioning) accomplish?

It reduced the number of separate PostHog queries hitting rate limits. Even where unioning didn’t make the SQL itself faster, it lowered the request count—so the system stayed under concurrency and rate caps. That reduced errors and improved reliability, which in turn improved effective throughput.

Why did using PostHog Endpoints help after materialized views?

Endpoints exposed the same dashboard-defined queries over HTTP. The system called those endpoints from Convex, which enabled additional caching behavior at the endpoint layer. That reduced runtime further (e.g., a rerun dropped to a few seconds, and later reruns sometimes hit ~185 ms or ~157 ms due to warm caching). It also shifted the work into a pattern that could be cached more effectively than repeated direct query execution.
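
The warm-cache effect can be illustrated generically. This is a client-side memoization sketch, not the PostHog Endpoints API, and `wrapped_stats` is a made-up name:

```python
from functools import lru_cache

calls = 0  # counts how many times the "expensive" query actually runs

@lru_cache(maxsize=None)
def wrapped_stats(user_id: str):
    # Stand-in for an expensive analytics query; name is hypothetical.
    global calls
    calls += 1
    return ("stats-for", user_id)

wrapped_stats("u1")
wrapped_stats("u1")  # second call is served from the warm cache
print(calls)  # → 1
```

PostHog Endpoints apply this idea server-side: the same dashboard-defined query with the same parameters can be answered from a cached result instead of re-executing, which is why reruns dropped into the hundreds-of-milliseconds range.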

Why did the team avoid “live” time windows for the cached analytics?

Live windows complicate caching because new events arriving after the wrapped job starts can make cached aggregates stale. The team treated the parsed data as effectively static by choosing a fixed reference window (around the ship date). That preserved the optimization benefits of materialized views and prevented cache-busting edge cases like “what if the user sends more messages after the last generation?”

Review Questions

  1. What makes percentile calculations particularly expensive in this wrapped analytics setup, and how did the materialized-view approach address that cost?
  2. How do PostHog rate limits interact with a workflow that issues many queries per user, and what strategies reduced the impact?
  3. Describe the sequence of optimizations from “many full scans” to “cached aggregates.” Which change produced the largest runtime drop?

Key Points

  1. Convex handled app workflows well, but it wasn’t suitable for per-user analytical scans across massive event tables.

  2. On-the-fly percentiles and other grouped aggregations triggered full-table scans and pushed runtimes into tens of seconds per step.

  3. PostHog rate limits (concurrency plus per-minute/per-hour caps) turned heavy per-user query patterns into timeouts and queue backlogs.

  4. Materialized views were the main breakthrough: precompute per-user aggregates once, then compute wrapped stats by querying cached tables instead of raw events.

  5. Reducing the number of PostHog requests (e.g., unioning logic into fewer queries) improved reliability by staying under rate limits.

  6. Further speedups came from restructuring Convex workflow actions and using PostHog Endpoints to benefit from endpoint-level caching.

  7. Treating the analytics window as effectively static made caching practical and avoided cache-staleness problems from new incoming events.

Highlights

The largest runtime collapse came from replacing repeated full-table scans with PostHog materialized views that cached per-user aggregates.
Percentile steps were the performance cliff: they required comparing each user’s totals against the full distribution, which is costly when computed live.
PostHog rate limits (and timeouts) weren’t just a nuisance—they amplified queue depth until generations took hours.
Endpoint-based execution plus warm caching pushed reruns into sub-200-millisecond territory in some cases.
The final system made wrapped generation feel instant by aligning query design with analytics caching rather than app-style querying.

Topics

  • Convex Workflows
  • PostHog Analytics
  • Materialized Views
  • SQL Query Optimization
  • Rate Limits
