A New Approach to DNA Data Storage

This is a slightly modified version of an article written by Matt Shipman, Research Lead in University Communications.

Professor Albert Keung

Professor Albert Keung and his colleagues in CBE and the Department of Electrical and Computer Engineering have developed a fundamentally new approach to DNA data storage systems, giving users the ability to read or modify data files without destroying them and making the systems easier to scale up for practical use.

“Most of the existing DNA data storage systems rely on polymerase chain reaction (PCR) to access stored files, which is very efficient at copying information but presents some significant challenges,” says Prof. Keung, co-corresponding author of a paper on the work. “We’ve developed a system called Dynamic Operations and Reusable Information Storage, or DORIS, that doesn’t rely on PCR. That has helped us address some of the key obstacles facing practical implementation of DNA data storage technologies.”

DNA data storage systems have the potential to hold orders of magnitude more information than existing systems of comparable size. However, existing technologies have struggled to address a range of concerns related to practical implementation.

Current systems rely on sequences of DNA called primer-binding sequences that are added to the ends of DNA strands that store information. In short, the primer-binding sequence of DNA serves as a file name. When you want a given file, you retrieve the strands of DNA bearing that sequence.
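The file-name analogy can be made concrete with a toy model. The sketch below is only an illustration, not the researchers' software: it treats each strand as a text string, assumes the address sits at the start of the strand, and uses a made-up address length (`ADDRESS_LEN`) and made-up sequences.

```python
# Toy model: each strand is a string over A/C/G/T. The first ADDRESS_LEN
# bases stand in for the primer-binding sequence (the "file name").
# ADDRESS_LEN and all sequences are hypothetical, chosen for illustration.
ADDRESS_LEN = 8


def retrieve(pool, address):
    """Return the data portion of every strand whose address matches."""
    return [s[ADDRESS_LEN:] for s in pool if s[:ADDRESS_LEN] == address]


pool = [
    "ACGTTGAC" + "GGATCCAT",  # file A, strand 1
    "TTGACCAA" + "CATGCATG",  # file B
    "ACGTTGAC" + "TTAACCGG",  # file A, strand 2
]

file_a = retrieve(pool, "ACGTTGAC")
print(file_a)
```

Asking for the address `ACGTTGAC` returns only the two strands that carry it, much as asking for a file name returns only that file.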

Many of the practical barriers to DNA data storage technologies revolve around the use of PCR to retrieve stored data. Systems that rely on PCR have to drastically raise and lower the temperature of the stored genetic material in order to open the double-stranded DNA and reveal the primer-binding sequence. This results in all of the DNA – the primer-binding sequences and the data-storage sequences – swimming free in a kind of genetic soup. Existing technologies can then sort through the soup to find, retrieve and copy the relevant DNA using PCR. The temperature swings are problematic for developing practical technologies, and the PCR technique itself gradually consumes – or uses up – the original version of the file that is being retrieved.

DORIS takes a different approach. Instead of using double-stranded DNA as a primer-binding sequence, DORIS uses an “overhang” that consists of a single strand of DNA – like a tail that streams behind the double-stranded DNA that actually stores data. While traditional techniques require temperature fluctuations to open the double-stranded DNA in order to find the relevant primer-binding sequences, using a single-stranded overhang means that DORIS can find the appropriate primer-binding sequences without disturbing the double-stranded DNA.

“In other words, DORIS can work at room temperature, making it much more feasible to develop DNA data management technologies that are viable in real-world scenarios,” says James Tuck, co-corresponding author of the paper and a professor of electrical and computer engineering.

The other benefit of not having to open the double-stranded DNA is that the DNA sequence in the overhang can be the same as a sequence found in the double-stranded region of the data file itself. That’s difficult to achieve in PCR-based systems without sacrificing information density – because the system wouldn’t be able to differentiate between primer-binding sequences and data-storage sequences.
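A rough way to see the difference is to compare two toy search rules. This is a sketch under simplifying assumptions, not a model of the actual chemistry: the `Strand` class, both search functions and all sequences are hypothetical, and binding is reduced to exact reverse-complement matching.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence that would pair with seq."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Strand:
    overhang: str  # single-stranded tail, exposed to probes
    duplex: str    # double-stranded data region, closed in this toy model


def melted_matches(pool, probe):
    """PCR-like toy: once the strand is opened, a probe can bind anywhere."""
    target = reverse_complement(probe)
    return [s for s in pool if target in s.overhang + s.duplex]


def overhang_matches(pool, probe):
    """DORIS-like toy: a probe can only reach the single-stranded overhang."""
    target = reverse_complement(probe)
    return [s for s in pool if s.overhang == target]


wanted = Strand(overhang="ACGTTGAC", duplex="GGGCCCTTTAAA")
# The decoy's data region happens to contain the wanted file's address.
decoy = Strand(overhang="TTGACCAA", duplex="GGACGTTGACCC")
pool = [wanted, decoy]

probe = reverse_complement("ACGTTGAC")
print(len(melted_matches(pool, probe)), len(overhang_matches(pool, probe)))
```

In the opened-strand rule the probe also lands on the decoy, so a PCR-style encoding would have to keep address sequences out of the data. In the overhang-only rule the same data causes no confusion.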

“DORIS allows us to significantly increase the information density of the system, and also makes it easier to scale up to handle really large databases,” says Kevin Lin, first author of the paper and a Ph.D. student in Prof. Keung’s research group.

And once DORIS has identified the correct DNA sequence, it doesn’t rely on PCR to make copies. Instead, DORIS transcribes the DNA to RNA, which is then reverse-transcribed back into DNA that the data-storage system can read. In other words, DORIS doesn’t have to consume the original file in order to read it.
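The read path can be sketched as a round trip that leaves the stored strands untouched. This is a deliberately simplified illustration: the function names are hypothetical, sequences are written in terms of the coding strand, and the enzymes, promoter sequences and error rates of real transcription are ignored.

```python
def transcribe(dna):
    """DNA -> RNA, written in terms of the coding strand (T becomes U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """RNA -> DNA copy that a sequencer could read (U becomes T)."""
    return rna.replace("U", "T")


def read_file(stored):
    """Produce readable DNA copies without altering the stored strands."""
    rna_copies = [transcribe(strand) for strand in stored]
    return [reverse_transcribe(rna) for rna in rna_copies]


# A tuple of strings cannot be mutated, mirroring a non-destructive read.
stored = ("GGATCCAT", "TTAACCGG")
readout = read_file(stored)
print(readout)
```

The readout matches the stored data, and the stored strands are exactly as they were before the read.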

The single-stranded overhangs can also be modified, allowing users to rename files, delete files or “lock” them – effectively making them invisible to other users.
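These file operations can be pictured as edits to the address alone. The sketch below is a toy model with hypothetical names and sequences. How DORIS carries out each operation chemically is not described here, so the `locked` flag and the removed address are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class File:
    overhang: Optional[str]  # None models a file whose address is gone
    data: str
    locked: bool = False     # assumption: a locked overhang cannot be probed


def find(pool, address):
    """Return the data of every unlocked file carrying this address."""
    return [f.data for f in pool if f.overhang == address and not f.locked]


def rename(pool, old, new):
    for f in pool:
        if f.overhang == old:
            f.overhang = new


def lock(pool, address, locked=True):
    for f in pool:
        if f.overhang == address:
            f.locked = locked


def delete(pool, address):
    for f in pool:
        if f.overhang == address:
            f.overhang = None  # the data is no longer addressable


pool = [File("ACGTTGAC", "GGATCCAT"), File("TTGACCAA", "CATGCATG")]
rename(pool, "ACGTTGAC", "CCGGAATT")
lock(pool, "TTGACCAA")
print(find(pool, "CCGGAATT"), find(pool, "TTGACCAA"))
```

In every case the data region is never touched. Only the tail that serves as the file name changes.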

“We’ve developed a functional prototype of DORIS, so we know it works,” Keung says. “We’re now interested in scaling it up, speeding it up and putting it into a device that automates the process – making it user friendly.”

The paper, “Dynamic and scalable DNA-based information storage,” is published in the journal Nature Communications. The paper was co-authored by Kevin Volkel, a Ph.D. student in Prof. Tuck’s research group.

The work was done with support from the National Science Foundation, under grants CNS-1650148 and CNS-1901324; a North Carolina State University Research and Innovation Seed Funding Award; a North Carolina Biotechnology Center Flash Grant; and a Department of Education Graduate Assistance in Areas of Need fellowship.

An article describing some of Prof. Keung’s earlier research with DNA storage systems is available here.