
$135 Billion Accidentally Deleted By Google

The PrimeTime · 5 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

UniSuper’s Google Cloud VMware Engine private cloud was automatically deleted one year after provisioning, breaking a key service.

Briefing

A Google Cloud VMware Engine private cloud instance used by Australia’s UniSuper, which manages about A$135 billion for more than 600,000 members, was automatically deleted exactly one year after it was created, knocking out a key service and wiping the underlying compute and database stack. The outage was serious because, while restoring the application was relatively straightforward, recovering the data required extensive manual work and depended on backups kept outside the deleted environment.

The chain of events traces back to May 1, 2023. UniSuper needed a VMware-based private cloud on Google Cloud, but its capacity requirements couldn’t be met through Google’s standard public interface. UniSuper contacted Google to provision a special private cloud using an internal deployment tool. Engineers reviewed the ticket, filled in account, region, and hardware specifications, and ran the command. Everything looked correct—until the private cloud’s default behavior kicked in.

On May 1, 2024, UniSuper developers found that a critical service had stopped responding and that the private cloud hosting it appeared to have vanished. Audit logs showed no human deletion, which was unusual: a customer-created private cloud is expected to persist until it is explicitly removed. Investigation of Google’s internal API logs revealed that the instance had been deleted automatically when its one-year fixed term expired, an auto-deletion behavior that was never supposed to be enabled for a customer-created private cloud.

The root cause was a missing parameter in the internal tool call made by Google engineers a year earlier. Because that parameter wasn’t included, the private cloud was created with the default “auto-delete after 1 year” setting. Google later confirmed the fault was on its side, after UniSuper initially issued a statement blaming a third-party provider and then updated it to name Google Cloud. By May 8, Google Cloud’s CEO had publicly acknowledged responsibility.
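The video doesn’t show Google’s internal tooling, so the snippet below is only a hypothetical sketch of the failure mode: a provisioning helper whose optional retention argument silently falls back to a destructive default. Every name in it (create_private_cloud, Retention, the region and node count) is invented for illustration and is not a Google API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Retention(Enum):
    PERSISTENT = "persistent"            # lives until explicitly deleted
    FIXED_TERM_ONE_YEAR = "one_year"     # scheduled for deletion after 365 days


@dataclass
class PrivateCloud:
    name: str
    region: str
    nodes: int
    expires_at: Optional[datetime]       # None means no automatic expiry


def create_private_cloud(
    name: str,
    region: str,
    nodes: int,
    # Dangerous pattern: the destructive behavior is the *default*,
    # so simply omitting the argument opts the customer into auto-deletion.
    retention: Retention = Retention.FIXED_TERM_ONE_YEAR,
) -> PrivateCloud:
    expires_at = (
        None
        if retention is Retention.PERSISTENT
        else datetime.utcnow() + timedelta(days=365)
    )
    return PrivateCloud(name=name, region=region, nodes=nodes, expires_at=expires_at)


# Account, region and hardware details are all filled in correctly,
# but the retention parameter is forgotten, so a one-year expiry is set.
pc = create_private_cloud("example-private-cloud", "australia-southeast1", nodes=20)
print(pc.expires_at)   # roughly one year from now instead of None
```

A safer design would make the retention choice a required argument, or default to the persistent option, so that omitting it fails loudly instead of silently scheduling a deletion.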

Restoring the infrastructure and redeploying the application stack could lean on existing code and tooling, but data recovery was the hard part. The deleted private cloud spanned multiple availability zones, and there was no intact replica or production stack in another zone that could be used to pull data back. Recovery depended on backups: copies in Google Cloud Storage (a separate service from the deleted private cloud) and backups held with another provider. Service was largely restored by May 15, about two weeks after the deletion, once users could log in and view their balances again.
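The exact layout of UniSuper’s backups wasn’t disclosed, so the following is only a minimal sketch of why Cloud Storage copies survive the deletion of a VMware Engine private cloud, using the public google-cloud-storage Python client; the bucket name, prefix, and destination path are placeholders.

```python
from google.cloud import storage  # pip install google-cloud-storage


def download_latest_backup(bucket_name: str, prefix: str, dest_path: str) -> str:
    """Fetch the most recently created object under `prefix` from a Cloud Storage bucket.

    The bucket lives in Cloud Storage, a separate service from the deleted
    VMware Engine private cloud, which is why copies stored there survived.
    """
    client = storage.Client()
    blobs = list(client.list_blobs(bucket_name, prefix=prefix))
    if not blobs:
        raise RuntimeError(f"no backups found under gs://{bucket_name}/{prefix}")
    latest = max(blobs, key=lambda b: b.time_created)
    latest.download_to_filename(dest_path)
    return latest.name


# Placeholder names; the real bucket layout was not disclosed.
backup = download_latest_backup("example-db-backups", "prod/database/", "/restore/latest.dump")
print(f"downloaded {backup}")
```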

The incident also sparked broader debate about service-level expectations and operational safeguards. Even with backups, the deletion created downtime and significant customer effort, raising questions about how “defaults” in internal tooling can become catastrophic when they slip into production. The practical takeaway was blunt: redundancy has to be real—multiple copies, on different media and providers, and ideally offsite—because operational mistakes at a cloud provider can still erase environments that customers assumed would persist.

Cornell Notes

UniSuper’s Google Cloud VMware Engine private cloud was automatically deleted one year after provisioning because an internal deployment command was missing a crucial parameter. Without that parameter, the instance inherited a default “auto-delete after 1 year” behavior, even though auto-deletion was never expected for a customer environment. When the deletion occurred on May 1, 2024, UniSuper’s compute and database stack disappeared, breaking a key service. Restoring the application was manageable, but data recovery required extensive manual work using backups stored outside the deleted private cloud. The outage lasted until mid-May; UniSuper initially attributed the issue to a third party before naming Google Cloud, and Google Cloud’s CEO later publicly confirmed the fault was Google’s.

What exactly failed for UniSuper, and why was it so disruptive even if backups existed?

UniSuper’s key service stopped responding on May 1, 2024, and the hosting private cloud no longer existed. The deletion removed both compute capacity and the database stack across availability zones. Restoring infrastructure and redeploying code was described as relatively straightforward, but getting the data back required manual recovery because there was no unaffected replica/production stack in another zone. Backups saved the day: UniSuper had copies in Google Cloud Storage (separate from the deleted private cloud) and also backups with a different provider.

How did the deletion happen without any visible human action in audit logs?

Google’s internal API logs showed the private cloud was deleted automatically when its one-year fixed term expired. That auto-deletion behavior was unusual for a typical customer-created private cloud, and audit logs didn’t show engineers deleting it. The instance had been created with an expiry-driven default, so the system performed the deletion when the term ended.
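For readers who want to run the same kind of check on their own projects, Cloud Audit Logs can be queried with the public google-cloud-logging client. The filter below is a generic sketch that matches admin-activity entries whose method name contains “delete”; the exact method name recorded for a VMware Engine private-cloud deletion isn’t given in the source, so the match is deliberately loose.

```python
from google.cloud import logging  # pip install google-cloud-logging


def find_delete_operations(project_id: str) -> None:
    """Print recent admin-activity audit log entries whose method name mentions 'delete'."""
    client = logging.Client(project=project_id)
    # Substring filter on the admin-activity audit log; the exact methodName
    # for a VMware Engine private-cloud deletion is an assumption, so match loosely.
    log_filter = (
        'logName:"cloudaudit.googleapis.com%2Factivity" '
        'AND protoPayload.methodName:"delete"'
    )
    for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
        # The acting principal, if any, lives in protoPayload.authenticationInfo;
        # an expiry-driven, platform-initiated deletion has no human principal there.
        print(entry.timestamp, entry.log_name)


find_delete_operations("my-project-id")
```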

What was the underlying technical mistake in the provisioning step?

Exactly one year earlier, Google engineers had created the UniSuper private cloud using an internal tool. Everything was correct except that a crucial parameter was missing from the tool invocation. Because that parameter wasn’t provided, the private cloud was created with the default behavior of automatically deleting itself after one year.

Why couldn’t UniSuper rely on a replica in another availability zone?

In some recovery scenarios, data can be restored from a replica or production stack in a different availability zone that isn’t subject to the same deletion event. Here, the deletion removed the entire private cloud, spanning both availability zones that held production data, so there was no intact zone to pull data from. That forced reliance on backups rather than intra-cloud replication.

What did Google’s public accountability look like after the outage?

UniSuper initially reassured users with a statement that the issue originated from a third-party provider and wasn’t a malicious attack. The statement was later updated to name Google Cloud. Skepticism persisted until May 8, when Google Cloud’s CEO confirmed it was Google’s fault.

What broader lesson emerged about cloud “defaults” and operational risk?

The incident highlighted how internal tooling defaults—especially expiry or auto-cleanup settings—can become dangerous when they slip into production provisioning. Even when the application stack can be redeployed, data recovery can become complex and time-consuming if the environment that holds production databases is erased. The practical defense was redundancy: multiple backups, including offsite and across different providers/services.
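One way to make that redundancy advice concrete is a periodic check that recent backup copies exist with at least two providers and at least one of them offsite. The sketch below is a generic illustration of that 3-2-1-style rule, not a description of UniSuper’s actual backup policy; the locations and provider names are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class BackupCopy:
    location: str        # e.g. a Cloud Storage bucket or a second provider's vault
    provider: str        # which vendor/service holds this copy
    offsite: bool        # stored outside the primary cloud provider
    created_at: datetime


def redundancy_ok(copies: List[BackupCopy], max_age: timedelta = timedelta(days=1)) -> bool:
    """Loose 3-2-1-style check: at least two fresh copies, two providers, one offsite."""
    now = datetime.now(timezone.utc)
    fresh = [c for c in copies if now - c.created_at <= max_age]
    providers = {c.provider for c in fresh}
    return len(fresh) >= 2 and len(providers) >= 2 and any(c.offsite for c in fresh)


# Placeholder inventory: one copy in Cloud Storage, one with a second provider.
copies = [
    BackupCopy("gs://example-db-backups/prod", "google-cloud-storage", False,
               datetime.now(timezone.utc) - timedelta(hours=6)),
    BackupCopy("secondary-provider://vault/prod", "secondary-provider", True,
               datetime.now(timezone.utc) - timedelta(hours=7)),
]
print("redundancy ok:", redundancy_ok(copies))
```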

Review Questions

  1. What missing provisioning detail caused the private cloud to inherit an unexpected auto-deletion behavior?
  2. Why was data recovery harder than application restoration in UniSuper’s case?
  3. What backup locations/services were referenced as critical to getting UniSuper back online?

Key Points

  1. UniSuper’s Google Cloud VMware Engine private cloud was automatically deleted one year after provisioning, breaking a key service.

  2. The deletion occurred due to a missing parameter in Google’s internal deployment tool, which caused an unexpected default auto-delete behavior.

  3. Audit logs showed no human deletion; internal API logs indicated the expiry-driven automated deletion.

  4. Restoring compute and redeploying code was relatively straightforward, but recovering databases required extensive manual work because the entire multi-zone production environment was deleted.

  5. UniSuper avoided total data loss by using backups in Google Cloud Storage and backups with another provider.

  6. Google Cloud’s CEO later confirmed the fault was on Google’s side after initial third-party blame and updates from UniSuper.

Highlights

A one-year expiry default—triggered by a missing parameter in Google’s internal tool—wiped UniSuper’s private cloud without any human deletion in audit logs.
Application recovery was manageable, but database recovery became the bottleneck because production data across availability zones was deleted.
UniSuper’s backups across Google Cloud Storage and a separate provider were the difference between outage and permanent loss.
Google Cloud’s CEO confirmed responsibility after UniSuper’s statements initially pointed to a third-party provider and then specifically to Google Cloud.

Topics

  • Cloud VMware Engine
  • Auto-Deletion
  • Backup Strategy
  • Incident Postmortem
  • Service Reliability

Mentioned

  • Alexis
  • GCP
  • VMware
  • VPC
  • SLA
  • TDD