An architecture for universal pathogen detection
Sore throat, fever, body aches, a forceful sneeze: the portents of impending illness. Could you have Covid? The flu?
An at-home nose swab or a trip to the doctor could provide confirmation, but rapid pathogen tests offer limited accuracy and cannot identify variants. And most of these tests are pathogen-specific, meaning they can only detect the presence of a single known virus rather than testing for any pathogen.
Now, Satish Narayanasamy, Reetuparna Das, and David Blaauw, professors of electrical engineering and computer science at the University of Michigan, along with their industry partners AMD and NVIDIA, are working to significantly improve the speed and accuracy of genome sequencing technology. Their innovations in both software and hardware for rapid genome sequencing are opening up new avenues for medical testing for a wide range of conditions, from common viruses to cancer.
“We envision a genome sequencing method that will enable rapid universal pathogen detection,” Narayanasamy said. “With a small sample of saliva or blood, we will be able to determine the pathogens in the sample, as well as their variants, within minutes.”
Reading the whole genome
Central to the goal of rapid and universal pathogen detection is the development of efficient and affordable methods for genome sequencing. It took 13 years and $3 billion to sequence the first human genome. While years have since been condensed to days, and billions of dollars to thousands, computing efficiency is not keeping pace with sequencing methods, as Moore’s law continues to taper off.
“Genome sequencing produces a lot of raw data — one blood sample can yield a whole terabyte of genomic data,” said Narayanasamy. “Processing these large volumes of data could allow us to detect cancer several years early with simple blood tests, but requires substantial computing power, which current infrastructure isn’t equipped to handle.”
To take on this persistent challenge, the team’s ongoing work centers on fusing hardware and software design to enhance the computing capabilities of current sequencing devices. The computing industry has invested heavily in customized accelerators, such as Google’s Tensor Processing Units (TPUs), for important applications like machine learning. Das and Narayanasamy argue that customized computing hardware for genomics warrants similar investment, given the field’s growing computing needs and its potential to transform the health industry.
To this end, they have developed domain-specific hardware, along with a framework for programming it, called GenDP. It supports a broad range of computational kernels used in genomics and achieves more than two orders of magnitude improvement in efficiency over commodity GPU hardware. GenDP was originally published at ISCA 2023 and was later selected as a CACM Research Highlight.
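GenDP targets dynamic programming (DP), the recurrence style behind many genomics kernels, including the seed-extension step that aligns a read around a candidate match. As a rough illustration only, the Python sketch below computes a Smith-Waterman local-alignment score, one classic DP kernel of this kind. The function name and the scoring values (match, mismatch, and gap) are hypothetical choices for this example; this is not GenDP’s programming interface, and hardware accelerators compute these cell updates in parallel rather than in nested loops.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score between sequences a and b (linear gap penalty)."""
    # Keep only the previous DP row; each cell needs its up, left, and diagonal neighbors.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at 0 lets a local alignment start fresh at any cell.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTTGCA", "CGTTG"))  # 10: five matches at +2 each
```

Because each cell depends only on its upper, left, and upper-left neighbors, cells along the same anti-diagonal can be computed at once, which is why kernels like this map well onto arrays of small processing elements.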
To accelerate whole genome sequencing (WGS), the researchers developed custom accelerators for three performance-critical kernels: seeding, seed-extension, and variant calling.
For seeding, they developed a new data structure that uses more memory capacity in exchange for lower overall memory bandwidth, increasing data efficiency by 4.3 times. They also designed a novel computing structure to improve the efficiency of string matching, further accelerating seeding at both the software and hardware levels.
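Seeding finds short exact matches between a read and the reference genome so that later stages only need to examine promising locations. The team’s specific data structure is not described here, so the sketch below instead shows the simplest textbook approach: a hash table mapping every k-mer (substring of length k) in the reference to the positions where it occurs. The function names and the choice of k are illustrative assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (read_offset, reference_position) pairs for every exact k-mer hit."""
    seeds = []
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            seeds.append((offset, ref_pos))
    return seeds


reference = "ACGTACGTTAGC"
index = build_kmer_index(reference, k=4)
print(find_seeds("CGTTA", index, k=4))  # [(0, 5), (1, 6)]
```

Both hits in the example imply that the read starts at reference position 5 (reference position minus read offset); agreement of this kind is what a seed-extension step would then check in detail.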
Tackling the challenge of variant calling, the team has explored advanced computing techniques to enhance the speed with which genetic mutations are identified. They have developed a specialized chip that combines high-accuracy calculations with faster, approximate methods to deliver results up to 6.6 times faster than previous technologies, greatly improving the efficiency of mutation identification.
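Variant calling compares aligned reads against the reference to decide where a sample’s genome differs. The sketch below is a deliberately simplified illustration, not the team’s chip or method: it tallies the bases observed at each reference position (a “pileup”) and reports a single-nucleotide variant when a non-reference base exceeds a frequency threshold. It assumes reads are already aligned without insertions or deletions, and the thresholds min_depth and min_alt_fraction are hypothetical values; production callers rely on far more careful statistical models.

```python
from collections import Counter


def call_variants(reference: str, reads: list[tuple[int, str]],
                  min_depth: int = 3,
                  min_alt_fraction: float = 0.3) -> list[tuple[int, str, str, float]]:
    """Toy single-nucleotide variant caller over gap-free, pre-aligned reads.

    Each read is (start_position_on_reference, sequence).
    Returns (position, ref_base, alt_base, alt_fraction) tuples.
    """
    # Pileup: count every base observed at each reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        alts = [(n, b) for b, n in counts.items() if b != reference[pos]]
        if not alts:
            continue
        n_alt, alt = max(alts)  # most frequent non-reference base
        fraction = n_alt / depth
        if fraction >= min_alt_fraction:
            calls.append((pos, reference[pos], alt, fraction))
    return calls


ref = "ACGTACGT"
reads = [(0, "ACGTACGT"), (0, "ACGAACGT"), (2, "GAACGT"), (1, "CGAAC")]
print(call_variants(ref, reads))  # [(3, 'T', 'A', 0.75)]
```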
“Our past work in software and hardware co-design has led to significant improvements in the speed, accuracy, and scalability of genome sequencing,” said Narayanasamy. “It’s a trend we are hoping to continue in our future work.”
Together, the innovations of Narayanasamy and his collaborators are poised to revolutionize genome sequencing by reducing the computational resources it requires and thereby compressing the time and cost of processing. This progress is a leap forward in enabling rapid, scalable, and more affordable genome sequencing and making its use a reality in everyday medical and research applications.
Limitations of current testing methods
One area in which rapid and affordable sequencing technology could have vast implications, and a core focus of Narayanasamy and Das’s work, is pathogen detection.
Current pathogen tests largely rely on a method known as PCR, or polymerase chain reaction. Now virtually a household name in the wake of the Covid-19 pandemic, PCR tests use custom-designed primers to amplify specific DNA or RNA regions of a given pathogen’s genome, revealing whether that pathogen is present in a biological sample.
Although PCR tests were fundamental in enabling rapid and accessible diagnosis during the pandemic, this approach has many drawbacks that limit its usefulness and applicability in future outbreaks. First, the primers used to target pathogen DNA/RNA in the PCR process are costly and time-consuming to develop. In the case of Covid-19, although the SARS-CoV-2 genome was sequenced as early as January 2020, it took several months to design PCR primers for mass testing, a costly and ultimately deadly delay in an outbreak of such scale.
Moreover, PCR tests are targeted to one specific genome, meaning they can only test for one pathogen. Even if you receive a negative result on a Covid-19 PCR test, this does nothing to rule out all of the other potential pathogens you could be infected with, and pass on to others, without knowing.
“PCR is the current standard for pathogen testing, but it is not without issues,” said Narayanasamy. “PCR allows you to amplify a targeted part of a genome, but analyzing a sample and producing a result can take hours depending on the number of PCR cycles required, and it can only tell you about one pathogen.”
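The link between cycle count and test time follows from standard PCR arithmetic (a textbook model, not a figure from this article): each cycle at most doubles the number of target copies, so N0 starting copies grow to roughly N0 * (1 + E)^n after n cycles, where E is the per-cycle efficiency between 0 and 1. The hypothetical helpers below compute that growth and the number of cycles needed to reach a given copy count. Since each cycle adds a roughly fixed amount of time, a sample with fewer starting copies needs more cycles and a longer wait.

```python
import math


def pcr_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies after the given number of cycles; efficiency=1.0 is perfect doubling."""
    return initial * (1.0 + efficiency) ** cycles


def cycles_needed(initial: float, target: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles for `initial` copies to reach `target` copies."""
    if target <= initial:
        return 0
    # Solve initial * (1 + E)^n >= target for n, then round up to a whole cycle.
    return math.ceil(math.log(target / initial) / math.log(1.0 + efficiency))


print(pcr_copies(10, 3))           # 80.0
print(cycles_needed(10, 1e6))      # 17 with perfect doubling
print(cycles_needed(10, 1e6, 0.9)) # 18 at 90% efficiency
```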
Next-generation handheld genome sequencing
Narayanasamy and Das are working to expand the possibilities of genome sequencing in medical diagnosis and overcome these limitations of PCR. To do this, they are using MinION, a revolutionary handheld device developed by Oxford Nanopore Technologies that allows for rapid genome sequencing with a single blood or saliva sample. MinION can sequence long strings of genomic data in real time, resulting in in-depth genomic sequencing in minutes.
Despite these benefits, one persistent drawback of MinION is, ironically, its completeness; it analyzes all the genomic data it receives. This inability to filter out irrelevant data leads to inefficiencies and limits its applicability in pathogen detection.
“When you’re trying to detect a pathogen in a biological sample, most of the DNA you’re analyzing will be human DNA,” said Narayanasamy. “If you’re only interested in pathogen DNA, a very small percentage of the total sample, you will have to do a lot of wasteful sequencing.”
Narayanasamy and Das have worked diligently to broaden the capabilities of the MinION device by equipping it with enhanced computing power to home in on viral data more efficiently and accurately. They have dedicated years of research to developing new technology for this purpose.
Their 2021 paper, “SquiggleFilter: An Accelerator for Portable Virus Detection,” introduced a new hardware-accelerated dynamic time warping (DTW) technique that filters out unnecessary genomic data when screening for pathogens using MinION. The DTW technology employed by SquiggleFilter uses specialized circuits to rapidly align sequences of time-series data, such as the electrical signals produced by nanopore DNA/RNA sequencers like MinION, otherwise known as “squiggles.”
Equipped with this custom hardware, SquiggleFilter’s DTW algorithm can distinguish viral RNA/DNA squiggles (the unique signal patterns of a microbe’s genetic material) from the host’s genetic data. By focusing only on these viral squiggles, rather than on all the genetic material in the sample, this technique significantly reduces the amount of data that must be processed, dramatically shortening the time from sample to result.
“SquiggleFilter accurately classifies and filters genomic data in real time, allowing the device to dispense with irrelevant data as it is read,” said Narayanasamy. “You can directly compare the messy squiggle derived from MinION’s readings with the expected messy squiggle for a given virus, and that will allow you to quickly filter out data that is not of interest.”
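Dynamic time warping scores how well two signals match while tolerating stretching and compression in time, which suits nanopore squiggles because DNA or RNA moves through the pore at an uneven speed. The Python sketch below computes a subsequence DTW cost: the whole read signal must be aligned, but it may start and end anywhere in the longer reference squiggle. It is a plain software illustration under stated assumptions (signals already normalized to comparable units, absolute difference as the distance), not SquiggleFilter’s hardware design, and the example signal values are made up.

```python
def subsequence_dtw_cost(query: list[float], reference: list[float]) -> float:
    """Smallest DTW cost of aligning the whole query to any stretch of the reference.

    Assumes both signals are already normalized to comparable units.
    """
    if not query or not reference:
        raise ValueError("signals must be non-empty")
    m = len(reference)
    # First query sample: free start, so it may align to any reference sample.
    prev = [abs(query[0] - r) for r in reference]
    for q in query[1:]:
        curr = [0.0] * m
        curr[0] = prev[0] + abs(q - reference[0])
        for j in range(1, m):
            # Best predecessor: repeat the reference sample (prev[j]),
            # advance both signals (prev[j-1]), or repeat the query sample (curr[j-1]).
            curr[j] = abs(q - reference[j]) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    # Free end: the alignment may finish at any reference sample.
    return min(prev)


reference = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5, 4.0, -2.0]
matching_read = [3.0, 3.0, 2.0, -1.0, 0.5]  # a time-stretched piece of the reference
unrelated_read = [10.0, 10.0, 10.0]
print(subsequence_dtw_cost(matching_read, reference))   # 0.0
print(subsequence_dtw_cost(unrelated_read, reference))  # 18.0
```

A filter built on this idea would keep reads whose cost falls below a chosen threshold as likely matches to the target virus and discard the rest. The quadratic number of cell updates per read is the kind of work that custom hardware can parallelize.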
Testing against other GPU-based sequencing solutions highlighted SquiggleFilter’s speed. “We showed that SquiggleFilter was over three thousand times faster than other available systems,” said Narayanasamy. In addition, it delivered over 270 times greater throughput while using half the power of existing solutions.
Possibilities for universal pathogen detection
While the researchers’ developments thus far clearly demonstrate the power of technology in enabling rapid genomic sequencing for detection of target pathogens, an open question remains: What if you want to test for an unknown pathogen?
“If you don’t know which microbe you’re looking for, you won’t have reference data to compare your sample to,” said Narayanasamy. In this case, the collected data would instead need to be compared against the human genome so that non-human sequences can be identified.
“The problem with this is that human DNA is very, very large,” explained Narayanasamy. “It is composed of about 3 billion characters, as opposed to Covid RNA, which is only about 30,000 characters long.”
Narayanasamy and Das are working to expand on the strengths of SquiggleFilter by developing a technique for universal pathogen detection. The key to this will be equipping MinION with the computing power and streamlined algorithms to distinguish and filter out human DNA, enabling it to identify and zoom in on non-human DNA. The goal is to be able to detect the presence of a pathogen and tell you which one it is, all in a matter of minutes.
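One simple way to picture host depletion is k-mer containment: a read whose short substrings almost all appear in the human reference is probably human and can be set aside. The sketch below is a hypothetical illustration of that idea on basecalled sequences (letters rather than raw squiggles); the article does not say how the team’s technique works, and the k and min_shared values are arbitrary. A real human reference of about 3 billion characters, roughly 100,000 times longer than the 30,000-character Covid genome, would need a far more compact index than a Python set, which hints at why this step demands new computing approaches.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_host_read(read: str, host_kmers: set[str], k: int, min_shared: float = 0.8) -> bool:
    """Flag a read as host-derived if enough of its k-mers occur in the host reference."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not read_kmers:
        return False  # read shorter than k: no evidence either way
    shared = sum(km in host_kmers for km in read_kmers)
    return shared / len(read_kmers) >= min_shared


def deplete_host(reads: list[str], host_reference: str, k: int = 5,
                 min_shared: float = 0.8) -> list[str]:
    """Keep only reads that do not look like host DNA."""
    host_kmers = kmer_set(host_reference, k)
    return [r for r in reads if not is_host_read(r, host_kmers, k, min_shared)]


host_reference = "ACGTACGTTGCAACGTTAGC"  # stand-in for the human genome
reads = ["ACGTTGCAAC", "TTTTGGGGCC"]
print(deplete_host(reads, host_reference))  # ['TTTTGGGGCC']
```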
Narayanasamy and his team believe this technique will be crucial in the detection and treatment of dangerous superbugs, such as Candida auris, which can cause rampant and severe illness in hospitals and other clinical settings. “If you are early in detecting the presence of a superbug in a hospital environment, you can take precautions to limit its spread,” he said.
The ability to rapidly detect and identify unknown pathogens could also help combat antibiotic resistance. By quickly identifying the cause of infections, providers will be able to better tailor treatments, thus reducing antibiotic overuse.
Opportunities for cancer diagnosis
Another area where rapid sample analysis and detection could have life-saving implications is in cancer diagnosis. By combining innovative technologies such as DNA amplification and rapid genome sequencing, Narayanasamy and his team have shown that it is possible to sequence the genome of target cancer cells in fewer than 30 minutes, allowing surgeons to make real-time treatment decisions shortly after or even during biopsy operations instead of waiting weeks for a diagnosis to begin treatment.
Current methods for intraoperative testing and diagnosis have serious limitations; they do not provide genomic data, often require multiple biopsy samples, and show low accuracy in detecting diffuse or infiltrative cancers.
Genomic sequencing, on the other hand, is more accurate and provides a fuller picture of the molecular underpinnings of a given tumor. Such tests can determine which genetic mutations gave rise to the tumor and can help predict the risk of recurrence. The current timeline for such diagnostic tests, however, ranges from several days to weeks, causing precious delays in treatment.
As part of their NSF-funded project, the researchers are working to enable more rapid diagnosis during surgery by leveraging MinION’s long-read sequencing abilities. Rather than using PCR for DNA amplification, their proposed approach employs a technique called loop-mediated isothermal amplification (LAMP), which is significantly faster but more error-prone than traditional methods. By coupling LAMP with real-time computational error correction, Narayanasamy and his team have shown that this is a much speedier and more feasible means of detecting cancer mutations.
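LAMP’s speed comes at the cost of more errors, so some form of computational correction is needed before mutations can be called. The article does not describe the team’s correction method; as a generic illustration only, the sketch below builds a consensus by majority vote across several reads of the same region. It assumes the reads have already been aligned to equal length, with “-” marking gaps, and a real pipeline would also weigh base-quality scores and handle ties more carefully.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:  # a majority gap means the position is dropped
            consensus.append(base)
    return "".join(consensus)


# Each read carries one random error; the vote recovers the underlying sequence.
print(majority_consensus(["ACGTAC", "ACCTAC", "ACGTAG", "ACGTAC"]))  # ACGTAC
# A spurious insertion seen in only one read is voted out.
print(majority_consensus(["ACG-TA", "ACGATA", "ACG-TA"]))  # ACGTA
```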
“Combining rapid DNA amplification with real-time data analysis significantly streamlines the process and enables quick and reliable cancer detection,” Narayanasamy said. “This could mean quicker decision-making during surgery and thus more timely treatment.”
Moving forward
While Narayanasamy and Das have come a long way in developing next-generation genome sequencing technology, there is still work to be done. Now, with the aid of a $900,000 grant from the National Science Foundation (NSF), the team’s journey continues as they work toward optimizing these systems for practical use in medical settings. Their focus remains on refining the computational algorithms and hardware they have pioneered, moving these advances from the research lab to the real world.
Narayanasamy and Das’s innovative approach to hardware-software co-design stands to revolutionize the speed, accuracy, and scope of genome sequencing. With applications in rapid, universal pathogen testing and intraoperative cancer diagnostics, their research is ushering in a new era in precision medicine.