How cloud providers overcharge you (and how to fix it)

TL;DR

Audit AWS CloudWatch log group retention settings because “never expire” can create permanent monthly storage charges.


Briefing

Cloud bills balloon quietly through defaults, usage-based pricing, and “invisible” resources, so the fastest way to cut costs is to audit specific traps and change a handful of settings. The core message is that cloud spending rarely comes from one obvious purchase; it accumulates through log retention that never expires, bandwidth charges triggered by seemingly small assets, per-operation database pricing, and leftover infrastructure that keeps running after a project is “deleted.”

The first and most persistent cost leak on AWS comes from log storage defaults. When you create new resources, such as a Lambda function, an RDS instance, or an ECS cluster, AWS can automatically set up CloudWatch log groups for them, and the retention policy on those log groups defaults to “never expire.” Stored log data therefore accrues indefinitely. AWS charges twice: once to ingest logs and again, every month, to store them by the gigabyte. Over time, that turns into ongoing “rent” for old text logs from years-old activity. The practical fix is to set retention to a short window such as 5–7 days (or even 1 day). For teams that dislike the CloudWatch experience, the transcript also suggests pulling logs into a self-hosted Grafana setup (e.g., on a VPS) and then reducing CloudWatch retention to near-term storage.
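A rough Python model makes the “rent” effect concrete. It assumes a constant daily log volume, 30-day months, storage billed on the volume held at the end of each month, and placeholder prices: INGEST_PER_GB and STORAGE_PER_GB_MONTH are assumptions for illustration, not quoted AWS rates, so check current CloudWatch pricing for your region.

```python
from typing import Optional

# Placeholder prices: assumptions for illustration, not quoted AWS rates.
INGEST_PER_GB = 0.50          # assumed $ per GB ingested
STORAGE_PER_GB_MONTH = 0.03   # assumed $ per GB-month stored

def stored_gb(gb_per_day: float, days_elapsed: int, retention_days: Optional[int]) -> float:
    """GB held in storage after `days_elapsed` days; None models "never expire"."""
    kept_days = days_elapsed if retention_days is None else min(days_elapsed, retention_days)
    return gb_per_day * kept_days

def monthly_bill(gb_per_day: float, month: int, retention_days: Optional[int]) -> float:
    """Approximate bill for month `month` (1-based), assuming 30-day months and
    storage billed on the volume held at the end of that month."""
    ingest = gb_per_day * 30 * INGEST_PER_GB
    storage = stored_gb(gb_per_day, month * 30, retention_days) * STORAGE_PER_GB_MONTH
    return ingest + storage
```

With 2 GB of logs per day, after three years a never-expire group holds 2,160 GB while a 7-day policy holds 14 GB. The ingestion charge is identical in both cases, so shortening retention only shrinks the storage line, but that is the line that keeps growing.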

Next comes the bandwidth trap, driven by egress fees. Assuming each visitor downloads the full file once, a small 10 MB looping hero video becomes 10 GB of transferred data for 1,000 visitors and 1 TB for 100,000 visitors. On platforms like Vercel, exceeding the included bandwidth can trigger incremental charges (the transcript cites $0.15 per extra gigabyte), turning a marketing asset into a recurring margin killer. The recommended workaround is to host the video on YouTube or Vimeo and embed it, effectively shifting bandwidth costs to those platforms. For a more “native” HTML video experience, the transcript recommends hosting heavy assets on Cloudflare R2, which charges for storage (and per-request operations) but nothing for egress bandwidth.
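The visitor arithmetic above, as a small sketch. It assumes no browser or CDN cache hits and decimal units (1 GB = 1,000 MB); the $0.15/GB figure comes from the transcript, and `included_gb` is a hypothetical stand-in for whatever allowance your plan includes.

```python
# Back-of-the-envelope egress estimate for a single static asset.
# Assumes every visitor downloads the full file once and 1 GB = 1,000 MB.

def egress_gb(asset_mb: float, visitors: int) -> float:
    """Total data transferred, in GB, for `visitors` full downloads."""
    return asset_mb * visitors / 1_000

def overage_cost(total_gb: float, included_gb: float, price_per_gb: float) -> float:
    """Charge only for transfer beyond the plan's included allowance."""
    return max(0.0, total_gb - included_gb) * price_per_gb

if __name__ == "__main__":
    for visitors in (1_000, 100_000):
        gb = egress_gb(10, visitors)
        # included_gb=0 is a worst-case assumption; substitute your plan's allowance.
        print(f"{visitors:>7} visitors -> {gb:,.0f} GB, "
              f"overage ${overage_cost(gb, 0, 0.15):,.2f}")
```

At $0.15/GB, every terabyte beyond the allowance adds about $150.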

A third trap targets image optimization costs. Framework image components (the transcript mentions Next.js) can trigger paid transformation limits on platforms like Vercel, with quotas described as low enough to be exhausted quickly on image-heavy sites. The suggested fix is to stop relying on the platform’s on-the-fly image component and instead serve pre-generated WEBP images using the standard HTML <img> tag. The argument is that modern network speeds make micro-optimizing load time less critical than avoiding recurring vendor processing fees; batch-converting media to WEBP at build time is presented as a way to capture most of the benefits without ongoing charges.
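A quick estimate of how fast an image-heavy site can reach a quota. The counting rule is an assumption for illustration (each unique combination of source image, width, and format counts once), not Vercel’s documented billing logic, and the 1,000 default mirrors the transcript’s rough figure.

```python
# Rough count of on-the-fly image transformations a site can trigger.
# ASSUMPTION: each unique (source image, width, format) combination counts
# as one transformation. This is not a documented billing rule.

def transformations(images: int, widths_per_image: int, formats: int = 1) -> int:
    return images * widths_per_image * formats

def exceeds_quota(images: int, widths_per_image: int,
                  formats: int = 1, quota: int = 1_000) -> bool:
    """True once the estimated count goes past the (assumed) quota."""
    return transformations(images, widths_per_image, formats) > quota
```

Under that assumption, 250 product photos served at four responsive widths already reach 1,000 transformations, and adding a second output format doubles the count.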

Other traps include choosing regions with local pricing premiums, where servers in places like São Paulo or Cape Town can cost up to 50% more than the same setup in the US, and the “NoSQL hype” pattern of using serverless databases like DynamoDB, which charge per operation and can become expensive when product requirements shift. The final, and very common, issue is the zombie resource trap: deleting a server doesn’t necessarily delete dependent services like load balancers. The transcript recommends searching across all regions and resource types with the AWS Resource Groups Tag Editor to find and terminate leftovers, and it suggests infrastructure as code (AWS CDK) to keep cleanup reliable.

The takeaway is blunt: audit logs, check regions, eliminate zombie resources, and replace “default” managed conveniences with cost-aware alternatives—before the bill quietly compounds into thousands of dollars a year.

Cornell Notes

Cloud spending often grows through small, recurring charges rather than one obvious purchase. On AWS, log groups created by new resources can default to “never expire,” causing monthly storage fees to accumulate forever; shortening retention (e.g., 1–7 days) is a direct fix. Bandwidth costs can spike when large assets like hero videos are served from a platform with egress fees; embedding from YouTube/Vimeo or hosting assets on Cloudflare R2 can prevent that. For image-heavy sites, platform image optimization quotas can be exhausted quickly; serving pre-converted WEBP files via standard <img> avoids recurring transformation charges. Finally, deleting compute doesn’t always delete dependencies, so zombie resources like load balancers can keep charging until explicitly removed.

Why do CloudWatch log groups become a long-term cost leak on AWS?

New AWS resources (e.g., Lambda, RDS, ECS) can automatically create CloudWatch log groups. The default retention policy for those log groups is “never expire,” so stored logs accumulate indefinitely. AWS charges both for ingesting logs and for storing them monthly by the gigabyte, meaning old text logs from years ago keep generating storage fees. The fix is to change retention to a short window (the transcript suggests 5–7 days, or even 1 day). A more aggressive approach is to pull logs into a self-hosted Grafana setup (e.g., on a VPS) and then reduce CloudWatch retention so AWS stops storing everything forever.
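To apply the fix across an account instead of clicking through the console, here is a minimal sketch using the boto3 CloudWatch Logs operations `describe_log_groups` (paginated) and `put_retention_policy`. The client is passed in so the function can be dry-run or tested; `cap_log_retention` is a hypothetical helper name, and the 7-day default follows the transcript’s suggestion.

```python
# Sketch: cap retention on every CloudWatch log group set to "never expire".
# `logs_client` is expected to be a boto3 "logs" client (or a test double).

def cap_log_retention(logs_client, retention_days: int = 7, dry_run: bool = True) -> list[str]:
    """Return names of log groups without a retention policy; set one unless dry_run."""
    changed = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page["logGroups"]:
            # A missing "retentionInDays" key means the group never expires.
            if "retentionInDays" in group:
                continue
            name = group["logGroupName"]
            changed.append(name)
            if not dry_run:
                logs_client.put_retention_policy(
                    logGroupName=name, retentionInDays=retention_days)
    return changed
```

For example, `cap_log_retention(boto3.client("logs", region_name="us-east-1"), dry_run=False)` applies the policy in one region; log groups are regional, so repeat it per region. AWS accepts only specific retention values (1, 3, 5, 7, 14, 30 days, and longer presets).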

How can a small landing-page video turn into a large recurring bill?

A 10 MB looping background video can scale into large data transfer volumes. With 1,000 visitors, it becomes about 10 GB of egress; with 100,000 visitors, it reaches roughly 1 TB. Platforms like Vercel may include only a limited amount of bandwidth on certain plans; once exceeded, they charge per extra gigabyte (the transcript cites $0.15/GB). The recommended mitigation is to host the video on YouTube or Vimeo and embed it, shifting bandwidth costs to those services. If a native HTML video look is required, hosting the asset on Cloudflare R2 is presented as a better option because it charges for storage but $0 for egress bandwidth.
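If the video must be a native HTML <video> element, here is a minimal upload sketch to Cloudflare R2 through its S3-compatible API, using boto3’s `upload_file`. The account ID, bucket, and key are hypothetical placeholders, `upload_asset` is an illustrative helper name, and the long Cache-Control value is an arbitrary choice for a file that rarely changes.

```python
import mimetypes

def r2_endpoint(account_id: str) -> str:
    """S3-compatible endpoint for an R2 account (account_id is a placeholder)."""
    return f"https://{account_id}.r2.cloudflarestorage.com"

def upload_asset(s3_client, path: str, bucket: str, key: str) -> None:
    """Upload one file with a guessed Content-Type so browsers can play it inline."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    s3_client.upload_file(
        path, bucket, key,
        ExtraArgs={"ContentType": content_type,
                   "CacheControl": "public, max-age=31536000"},
    )
```

Cloudflare’s R2 documentation shows creating the client with `boto3.client("s3", endpoint_url=r2_endpoint(ACCOUNT_ID), aws_access_key_id=..., aws_secret_access_key=..., region_name="auto")` using R2 API token credentials. Serving the file publicly also requires enabling public access for the bucket, for example through a custom domain.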

What’s the “optimization trap” tied to image components, and what’s the proposed workaround?

Using a framework image component (the transcript mentions Next.js) can trigger platform-billed image transformations. On Vercel, transformation limits are described as relatively low (on the order of 1,000 transformations), so image-heavy sites can exhaust the quota quickly and start paying premium prices just to serve images. The proposed workaround is to stop using the platform’s image component and instead serve standard HTML <img> tags with pre-generated WEBP images at a reasonable one-size-fits-most resolution. The transcript argues that on modern 5G and fiber connections, the user-perceived benefit of dynamic resizing and compression matters less than avoiding recurring transformation fees; it suggests batch conversion at build time.
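The build-time conversion can be sketched in Python with Pillow (an assumed dependency; any batch converter would do). The 1600 px bound and quality 80 are arbitrary “one-size-fits-most” choices, and `plan_conversions` and `convert` are hypothetical helper names.

```python
from pathlib import Path

SOURCE_EXTS = {".jpg", ".jpeg", ".png"}

def plan_conversions(sources, src_root: Path, out_root: Path) -> list[tuple[Path, Path]]:
    """Pair each source image with its .webp target, skipping up-to-date outputs."""
    pairs = []
    for src in sources:
        if src.suffix.lower() not in SOURCE_EXTS:
            continue
        dst = (out_root / src.relative_to(src_root)).with_suffix(".webp")
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # converted on an earlier build and still newer than the source
        pairs.append((src, dst))
    return pairs

def convert(pairs, max_side: int = 1600, quality: int = 80) -> None:
    from PIL import Image  # imported here so planning works without Pillow installed
    for src, dst in pairs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # thumbnail() only shrinks, keeping the aspect ratio within max_side.
            img.thumbnail((max_side, max_side))
            img.save(dst, "WEBP", quality=quality)
```

A build step might set `root = Path("public/images")` and call `convert(plan_conversions(sorted(root.rglob("*")), root, Path("public/webp")))`, then reference each output with a plain tag such as <img src="/webp/hero.webp" alt="Product hero" loading="lazy">.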

Why can region choice raise costs even when the infrastructure is identical?

Cloud pricing is described as local rather than global: the same server configuration can cost different amounts in different regions because of taxes, electricity costs, and infrastructure challenges. The transcript claims servers in places like São Paulo or Cape Town can cost up to 50% more than the same setup in the US. For most SaaS workloads, where latency differences are negligible, it recommends US-East as the default region to reduce monthly bills. It notes that strict data-processing or contractual requirements might require EU regions, though it also questions whether AWS can guarantee strict regional data residency.
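Choosing a region by price under a residency constraint, as a small sketch. The price table is hypothetical, with values picked only to mirror the transcript’s “up to 50% more” claim; real figures come from the provider’s pricing pages. The keys are AWS region codes for N. Virginia (us-east-1), Frankfurt (eu-central-1), São Paulo (sa-east-1), and Cape Town (af-south-1).

```python
# HYPOTHETICAL monthly cost of one identical setup per region (not AWS quotes).
HYPOTHETICAL_MONTHLY_COST = {
    "us-east-1": 100.0,
    "eu-central-1": 115.0,
    "sa-east-1": 150.0,
    "af-south-1": 140.0,
}

def cheapest_region(prices: dict[str, float],
                    allowed_prefixes: tuple[str, ...] = ("",)) -> str:
    """Cheapest region whose code starts with one of the allowed prefixes."""
    candidates = {r: p for r, p in prices.items() if r.startswith(allowed_prefixes)}
    if not candidates:
        raise ValueError("no region satisfies the constraint")
    return min(candidates, key=candidates.get)

def premium_vs(prices: dict[str, float], region: str, baseline: str = "us-east-1") -> float:
    """Fractional premium of `region` over `baseline` (0.5 means 50% more)."""
    return prices[region] / prices[baseline] - 1
```

Passing `("eu-",)` as the allowed prefixes models an EU-only data-processing requirement; without a constraint, the cheapest entry wins.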

What makes serverless NoSQL databases risky for cost control when product plans change?

Serverless databases like DynamoDB charge per operation: reads, writes, queries, and scans. That model is financially viable only when data access patterns are perfectly optimized and stable. The transcript highlights that product teams often pivot: new features, a changed UI, and different query needs can make previously optimized access patterns inefficient. When that happens, the system may fall back on costly scans and queries, driving up spend and even pressuring teams to avoid features to control costs. The proposed fix is to use “boring” SQL databases like Postgres or MySQL, which are billed as a flat monthly cost for the server they run on and are more forgiving when query patterns evolve.
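Why a pivot toward scans hurts, in a simplified model of DynamoDB on-demand read units. It relies on the sizing rule of one read unit per 4 KB for a strongly consistent read (half that for eventually consistent reads) and on the fact that Query and Scan are charged for every item read, even items a filter expression later discards. Per-page rounding is ignored, and the price used in the note below is a placeholder.

```python
import math

def read_units(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Read units for one Query/Scan reading `bytes_read` bytes in total (simplified)."""
    units = math.ceil(bytes_read / 4096)   # total size rounded up to 4 KB blocks
    return units if strongly_consistent else units / 2

def monthly_read_cost(units_per_request: float, requests_per_month: int,
                      price_per_million_units: float) -> float:
    """Cost of repeating the same read pattern all month at an assumed unit price."""
    return units_per_request * requests_per_month / 1_000_000 * price_per_million_units
```

With 1 KB items, a query returning 10 items costs 1.5 read units, while a scan over a 1,000,000-item table costs 125,000. At a placeholder $0.25 per million units, running that scan 1,000 times a month comes to about $31, and the figure grows with table size rather than with the number of results users actually see.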

How do “zombie resources” keep charging after a project is deleted, and how should they be found?

Cloud providers decouple services, so deleting one component doesn’t always remove its dependent resources. The transcript’s example: deleting a server instance may leave its load balancer behind, no longer attached to anything. That load balancer keeps routing traffic to nowhere while still charging (the transcript cites about $18/month). To find leftovers reliably, it recommends the AWS Resource Groups Tag Editor search: set regions to all regions and resource types to all supported types, then run a search to list everything still alive. Then manually terminate items like orphaned load balancers or unused volumes. For ongoing projects, infrastructure as code (AWS CDK) is suggested to keep dependencies organized and cleanup repeatable.
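A scripted complement to the Tag Editor search, as a sketch using boto3 calls (`describe_load_balancers`, `describe_target_groups`, `describe_target_health`, `describe_volumes`). The orphan test is a heuristic assumed for illustration: an ALB or NLB counts as orphaned when none of its target groups has a registered target. Classic Load Balancers are not covered, and the functions only report; deletion stays a manual decision, as in the transcript.

```python
def orphaned_load_balancers(elbv2) -> list[str]:
    """Names of ELBv2 load balancers whose target groups have no registered targets."""
    orphans = []
    for page in elbv2.get_paginator("describe_load_balancers").paginate():
        for lb in page["LoadBalancers"]:
            groups = elbv2.describe_target_groups(
                LoadBalancerArn=lb["LoadBalancerArn"])["TargetGroups"]
            has_targets = any(
                elbv2.describe_target_health(
                    TargetGroupArn=tg["TargetGroupArn"])["TargetHealthDescriptions"]
                for tg in groups
            )
            if not has_targets:
                orphans.append(lb["LoadBalancerName"])
    return orphans

def unattached_volumes(ec2) -> list[str]:
    """IDs of EBS volumes in the "available" state, i.e. attached to no instance."""
    volumes = []
    pages = ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "status", "Values": ["available"]}])
    for page in pages:
        volumes.extend(v["VolumeId"] for v in page["Volumes"])
    return volumes
```

Both functions work on one region at a time; run them with `elbv2` and `ec2` clients for each region enabled in the account, for example the region names returned by `ec2.describe_regions()`.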

Review Questions

Which AWS default setting causes log storage to accumulate indefinitely, and what retention window does the transcript recommend as a practical fix?
What two strategies are suggested to avoid bandwidth/egress charges from large video assets on landing pages?
Why does the transcript argue that serverless NoSQL per-operation pricing can become expensive after product pivots?

Key Points

1. Audit AWS CloudWatch log group retention settings because “never expire” can create permanent monthly storage charges.
2. Treat bandwidth as a cost driver: a 10 MB video can scale to 10 GB or 1 TB of egress depending on traffic volume.
3. Avoid platform image-transformation quotas by serving pre-generated WEBP images via standard HTML <img> tags instead of paid image components.
4. Choose cloud regions based on local pricing, not just physical proximity; US-East is presented as a common low-cost default.
5. Be cautious with serverless NoSQL like DynamoDB when product requirements may change, since per-operation billing can punish inefficient scans and queries.
6. Use AWS resource group search to locate zombie resources (like orphaned load balancers) that keep charging after compute is deleted.
7. Adopt infrastructure as code (AWS CDK) to make resource cleanup and dependency management more reliable.

Highlights

CloudWatch log groups created by new AWS resources can default to “never expire,” turning log storage into an endless monthly bill.

A 10 MB hero video can become 1 TB of transferred data at 100,000 visitors—egress fees can erase marketing margins fast.

Pre-converting images to WEBP and serving them with plain <img> tags can sidestep low transformation quotas on platforms like Vercel.

Deleting a server doesn’t guarantee deletion of dependent services; orphaned load balancers can keep charging until explicitly removed.

Per-operation pricing in serverless databases can become unpredictable and expensive when product access patterns shift.

Topics

Cloud Cost Optimization
AWS CloudWatch Retention
Bandwidth Egress Fees
Image Optimization Quotas
Serverless Database Pricing
Zombie Resources
Region Pricing

Mentioned

Simon Høiberg
AWS
RDS
ECS
UI
Vercel
WEBP
CDN
VPS
SQL
NoSQL
DynamoDB
Grafana
CDK
S3
R2
EU
SaaS