Wow! DNA could store Petabytes and is only 5 years away, new report says

Sabine Hossenfelder · 5 min read

Based on Sabine Hossenfelder's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

DNA can be synthesized and offers extremely high theoretical storage density (about 200 petabytes per gram). Practical densities are about 10 times lower, yet still roughly 10,000 times those of solid-state storage.

Briefing

DNA is being positioned as a high-density, long-lived medium for archival data storage, and a new industry assessment suggests real deployments could arrive within 3 to 5 years. DNA can be made synthetically and, in theory, packs about 200 petabytes per gram (roughly 100 petabytes per cubic centimeter). Densities achieved in practice are about 10 times lower than that theoretical ceiling, yet they still amount to around 10,000 times the storage density of current solid-state devices. The other standout trait is longevity: DNA can persist for a thousand years or longer, which makes it attractive for “keep it forever” backups and record-keeping rather than everyday cloud use.
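As a rough illustration of what those density figures imply, the sketch below computes how much DNA a large archive would need. It uses only the numbers quoted above (200 petabytes per gram in theory, about 10 times less in practice). The function names and the 1-exabyte archive are hypothetical, and the calculation ignores any overhead such as redundancy or packaging.

```python
# Figures quoted in the text; everything else here is illustrative.
THEORETICAL_PB_PER_GRAM = 200  # theoretical ceiling, petabytes per gram
PRACTICAL_SHORTFALL = 10       # achieved density is about 10x lower


def practical_pb_per_gram() -> float:
    """Practical density implied by the quoted figures."""
    return THEORETICAL_PB_PER_GRAM / PRACTICAL_SHORTFALL


def grams_needed(petabytes: float, practical: bool = True) -> float:
    """Mass of DNA needed for an archive of the given size (payload only)."""
    density = practical_pb_per_gram() if practical else THEORETICAL_PB_PER_GRAM
    return petabytes / density


if __name__ == "__main__":
    archive_pb = 1000  # hypothetical archive: 1 exabyte = 1000 petabytes
    print(f"Practical density: {practical_pb_per_gram():.0f} PB/g")
    print(f"Grams needed (practical):   {grams_needed(archive_pb):.0f}")
    print(f"Grams needed (theoretical): {grams_needed(archive_pb, practical=False):.0f}")
```

Under these assumptions, a 1-exabyte archive would need about 50 grams of DNA at today's practical density and about 5 grams at the theoretical limit.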

The push toward commercialization is backed by a consortium known as the DNA Storage Alliance, active since 2020 and now listing 41 members, including major technology companies such as Microsoft, Dell, IBM, Samsung, and Lenovo, alongside universities. In a June whitepaper, the group evaluates readiness and concludes that archival data storage use cases are likely to emerge over the next 3–5 years. That timeline matters because DNA storage has long been constrained less by the concept than by engineering bottlenecks—especially the process of converting data into DNA sequences and synthesizing the corresponding molecules.

DNA storage is straightforward in principle but complex in execution. Data is translated into a code over DNA’s four nucleotides, and custom DNA strands carrying that code are then chemically synthesized. The strands are typically stored as a powder and later read back using DNA sequencing. The catch is speed: writing custom DNA is currently extremely time-consuming. Current write throughput is roughly 100 megabytes per day, while fast conventional storage systems can write around a gigabyte per second, so DNA storage is not poised to replace cloud storage. Its role is therefore likely to be specialized: safety backups, rare archives, and long-term preservation, where data is retrieved infrequently and durability matters more than write speed.
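To make the size of that speed gap concrete, here is a small back-of-the-envelope calculation using the two throughput figures quoted above. The decimal units (1 GB = 1000 MB) and the 1-terabyte example are assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Throughput figures quoted in the text (decimal units assumed).
DNA_MB_PER_DAY = 100               # about 100 megabytes per day
CONVENTIONAL_MB_PER_SECOND = 1000  # about 1 gigabyte per second


def speed_ratio() -> float:
    """How many times faster conventional storage writes than DNA synthesis."""
    conventional_mb_per_day = CONVENTIONAL_MB_PER_SECOND * SECONDS_PER_DAY
    return conventional_mb_per_day / DNA_MB_PER_DAY


def days_to_write_dna(megabytes: float) -> float:
    """Days needed to write the given amount of data into DNA."""
    return megabytes / DNA_MB_PER_DAY


def seconds_to_write_conventional(megabytes: float) -> float:
    """Seconds needed to write the same data to fast conventional storage."""
    return megabytes / CONVENTIONAL_MB_PER_SECOND


if __name__ == "__main__":
    one_terabyte_mb = 1_000_000  # hypothetical 1 TB archive
    print(f"Conventional storage is about {speed_ratio():,.0f}x faster")
    print(f"1 TB into DNA: {days_to_write_dna(one_terabyte_mb):,.0f} days")
    print(f"1 TB conventionally: {seconds_to_write_conventional(one_terabyte_mb):,.0f} seconds")
```

With these figures the gap is a factor of about 864,000: a terabyte that conventional storage writes in roughly 17 minutes would take DNA synthesis about 10,000 days, or around 27 years, at the current rate.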

Beyond storage, DNA is also being treated as a programmable computing substrate. Researchers have developed DNA-based neural networks and logical processing schemes where single-strand DNA interacts with double-strand DNA to form input, gate, and output strands through controlled chemical reactions. Because base pairing determines the behavior of these molecular “circuits,” programmed DNA networks can carry out simple calculations. The appeal extends to biocompatibility, opening potential pathways for medical diagnostics and other in-tissue applications.

Separate lines of work weave DNA into larger lattices to create materials with custom properties—such as guiding light or sound—by interlocking DNA structures. Taken together, the momentum points to a broader convergence of biology and computer science: DNA as both a storage medium and a building block for computation and engineered materials. The near-term expectation is archival storage first; the longer-term vision ranges from programmable nano-scale systems to DNA-designed materials that could reshape technology within a decade or two.

Cornell Notes

DNA is gaining traction as a long-term data storage medium because it can be synthesized, packed at extremely high theoretical densities (about 200 petabytes per gram), and preserved for a thousand years or more. A June assessment by the DNA Storage Alliance—an industry consortium with 41 members including Microsoft, Dell, IBM, Samsung, and Lenovo—predicts archival DNA storage use cases will emerge in the next 3–5 years. The workflow converts data into nucleotide sequences, chemically synthesizes the DNA, stores it as powder, and later retrieves it via DNA sequencing. Practical write speeds remain slow (around 100 MB per day), so DNA is unlikely to replace cloud storage; it’s better suited for specialized backups and archives. Separately, DNA is also being explored for molecular computing and programmable materials.

Why does DNA storage attract attention even though it’s not fast enough to replace cloud storage?

DNA’s value is durability and density, not write speed. It can be synthetically generated and theoretically stores about 200 petabytes per gram (around 100 petabytes per cubic centimeter). Even with current practical densities—about 10× below the theoretical limit—it still reaches roughly 10,000× higher density than today’s solid-state devices. DNA also lasts for a thousand years or longer, which fits archival backup needs where data must survive long periods. The tradeoff is speed: DNA writing is currently extremely time-consuming, at about 100 megabytes per day, versus roughly a gigabyte per second for fast conventional storage.

What does the DNA Storage Alliance’s readiness assessment say, and who is involved?

The DNA Storage Alliance, operating since 2020 and now with 41 members, evaluates how close DNA storage is to real-world deployment. In a June whitepaper, it concludes that archival data storage use cases are expected to emerge over the next 3–5 years. The membership includes major companies—Microsoft, Dell, IBM, Samsung, and Lenovo—along with universities, reflecting both industrial interest and ongoing research capacity.

What are the main steps in storing and retrieving data using synthetic DNA?

The process starts by converting data into a nucleotide-based code using DNA’s four nucleotides. Next, the system chemically synthesizes the DNA molecule that encodes the data. That customized DNA is typically stored as a powder. Retrieval happens by reading the stored sequence using DNA sequencing, which reconstructs the original encoded information.
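The encoding and decoding steps can be sketched in a few lines. This is a minimal illustration, not the scheme used by any particular system: it assumes the simplest possible mapping of two bits per nucleotide (00 to A, 01 to C, 10 to G, 11 to T), and the function names are hypothetical. Practical schemes typically add error correction and sequence constraints, which this sketch omits. The chemical synthesis and sequencing steps are, of course, not modeled.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def encode(data: bytes) -> str:
    """Translate bytes into a nucleotide string, four bases per byte."""
    bases = []
    for byte in data:
        # Walk the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Reconstruct the original bytes from a sequenced nucleotide string."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

At two bits per base, each byte becomes four nucleotides, so the round trip from data to sequence and back is lossless as long as the sequence is read without errors.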

How can DNA strands function like logic gates or neural-network components?

DNA-based computing relies on controlled strand interactions. One approach uses single-strand DNA that migrates onto a double-strand DNA “gate” and displaces a strand through chemical reactions, producing an output strand. Inputs and outputs are determined by base pairing rules, so programmed strand designs can implement logical processing steps. Related work also explores DNA neural networks built from these strand interactions, aiming for computation at the molecular level.
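The input, gate, and output idea can be mimicked with a toy model. The sketch below is a deliberate simplification and not a chemical simulation: it assumes a gate releases its output strand only when the input strand is the exact reverse complement of the gate's template, and it ignores reaction kinetics and partial binding. The class, the function names, and the sequences are all hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the strand that would base-pair with the given strand."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


class Gate:
    """Toy double-strand gate: a template with an output strand bound to it."""

    def __init__(self, template: str, output: str) -> None:
        self.template = template
        self.output = output
        self.released = False

    def react(self, input_strand: str) -> Optional[str]:
        """Release the output if the input pairs with the template.

        Returns the displaced output strand, or None if nothing happens
        (wrong input, or the output has already been released).
        """
        if self.released:
            return None
        if input_strand != reverse_complement(self.template):
            return None
        self.released = True
        return self.output


if __name__ == "__main__":
    # Two gates in a cascade: the output of the first is the input of the second.
    first = Gate(template="AAGGCC", output="TTTT")
    second = Gate(template=reverse_complement("TTTT"), output="CCGG")

    signal = first.react("GGCCTT")  # matching input displaces "TTTT"
    if signal is not None:
        print(second.react(signal))  # CCGG
```

Because each gate responds only to a strand whose bases pair with its template, chaining gates in this way gives the logic-like behavior described above, with base pairing playing the role of wiring.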

What other non-storage applications are being pursued with DNA beyond data archiving?

Researchers are also weaving DNA into larger lattices to engineer materials with custom properties, such as structures designed to guide light or sound. Programmable nano-scale systems are a further, longer-term prospect: DNA-based logic and biocompatible molecular processing suggest possible uses in medical analyses, and potentially in future applications such as data collection or tissue repair.

Review Questions

  1. What specific properties of DNA make it suitable for archival storage, and how do those properties compare with solid-state storage?
  2. Why is DNA storage unlikely to replace cloud storage in the near term, and what metric illustrates that gap?
  3. Describe the basic pipeline from data to synthesized DNA to sequencing-based retrieval. What role do the four nucleotides play?

Key Points

  1. DNA can be synthesized and offers extremely high theoretical storage density (about 200 petabytes per gram). Practical densities are about 10 times lower, yet still roughly 10,000 times those of solid-state storage.
  2. DNA’s long lifetime, likely a thousand years or more, makes it especially suitable for archival backups rather than everyday storage.
  3. The DNA Storage Alliance (active since 2020, now with 41 members including Microsoft, Dell, IBM, Samsung, and Lenovo) projects that archival DNA storage use cases will emerge within 3–5 years.
  4. DNA storage works by translating data into nucleotide sequences, chemically synthesizing the DNA, storing it as powder, and later reading it via DNA sequencing.
  5. Write speed is the major limitation: DNA storage manages around 100 megabytes per day, compared with roughly a gigabyte per second for fast conventional storage.
  6. DNA is also being explored as a programmable computing medium, using strand interactions to create logic-like input–gate–output behavior.
  7. Separate research efforts use DNA lattices to design materials with custom optical or acoustic properties, showing DNA’s broader role beyond data storage.

Highlights

DNA’s theoretical capacity is about 200 petabytes per gram, and even today’s practical densities are still around 10,000× higher than current solid-state devices.
A June whitepaper from the DNA Storage Alliance predicts archival DNA storage use cases will emerge over the next 3–5 years.
DNA storage is constrained by synthesis time: current throughput is roughly 100 MB per day, far slower than gigabyte-per-second conventional methods.
DNA can be engineered into molecular logic systems where strand migration and base pairing determine input, gate, and output behavior.
DNA lattices are being used to build materials that can guide light or sound, pointing to engineered properties at the molecular scale.

Topics

  • DNA Data Storage
  • Archival Backups
  • Molecular Computing
  • DNA Sequencing
  • Programmable Materials