Exagen Diagnostics

Accelerating the Discovery of Genomic Marker Sets

Cole Harris is a computational scientist whose innovative work on numerical computing problems is helping to redefine how genomic science gets done.

Harris is VP of Computational Science at Exagen Diagnostics, Inc., a company he co-founded to identify, validate and commercialize small sets of genomic markers. Using proprietary technology, Exagen is developing prognostic and predictive tests for use in commercial laboratories or for pharmaceutical use in clinical trials.

“High-performance computing is at the core of our business, so we need to squeeze every last ounce of performance out of our analytical platform,” says Harris. “Our Apple Xserve G5 cluster gives us a huge performance boost without adding a lot of complexity or administrative overhead.”

The breast cancer prognostic marker assays developed by Exagen provide the first DNA-based tests for hormone receptor-positive and hormone receptor-negative patients. These tests identify patients with a high or low risk of tumor recurrence, so that patients unlikely to benefit from treatment are spared it and those who will benefit are treated appropriately.

Exagen is also developing predictive tests to help the four million people in the United States and the almost 200 million worldwide who are infected with the hepatitis C virus (HCV). One test identifies the patients most likely to respond to the standard treatment regimen; the other identifies which HCV patients show evidence of liver damage.

A Fundamentally Different Approach

The experimental scientific method that has evolved over hundreds of years draws conclusions from observations, hypotheses and experiments. Thus, a biologist might form a hypothesis about the function of a certain set of genes and then design experiments to test that hypothesis. Most of the advances of modern science are a direct result of this experimental methodology. But as Harris discovered, some areas of study favor a radically different approach.

“We do something that is fundamentally different from most of the companies working on genomic marker products,” says Dr. Suzanne Mattingly, Exagen’s VP of Business Development and Marketing. “We don’t start with theories about specific gene functions. Instead, we use global search data-mining techniques — applied to the whole human genome — and let the data tell us which combinations of markers are truly correlated with the biological conditions we’re interested in.”

Exagen’s data-driven approach to marker discovery would have been inconceivable without the high-speed CPUs available today. Even with those fast CPUs, the sheer numbers are daunting.

“There are roughly 30,000 human genes,” says Harris. “And the search space grows exponentially with the number of genes involved in the potential solution. That’s trillions of possible combinations even if we were only to look at sets of three genes. Running the discovery algorithms is a sizeable computational undertaking, and it’s just the first step in a long process.”
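The scale Harris describes can be checked with a quick calculation. The following is a minimal sketch, assuming the round figure of 30,000 genes quoted above and treating each candidate marker set as an unordered combination of distinct genes. The function name is illustrative and is not taken from Exagen's software.

```python
import math

N_GENES = 30_000  # approximate human gene count quoted in the article


def count_marker_sets(n_genes: int, set_size: int) -> int:
    """Number of unordered sets of `set_size` distinct genes (n choose k)."""
    return math.comb(n_genes, set_size)


# Show how quickly the search space grows as the set size increases.
for size in range(1, 6):
    print(f"sets of {size}: {count_marker_sets(N_GENES, size):,}")
```

Sets of three come to about 4.5 trillion combinations, and each additional gene in the set multiplies the count by a factor of several thousand.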

Another computational challenge is the relatively small number of biological samples available for testing, a problem common to almost all biomedical research. Although Exagen’s computations can evaluate 30,000 measurements per patient, samples may be available from only 100 patients. With so many measurements and so few patients, some marker sets will appear to track the condition purely by chance. To address this problem, the Exagen researchers use their Xserve G5 cluster to perform rigorous significance tests that are even more compute-intensive than the initial mining steps.
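The article does not say which significance tests Exagen runs. As an illustration only, a permutation test is one common way to estimate how often a search over many markers would produce a result this strong by chance. The sketch below assumes a two-group outcome and a simple mean-difference score; all function names, the scoring rule, and the simulated data are hypothetical stand-ins, not Exagen's method.

```python
import random


def mean_difference(values, labels):
    """Absolute difference in mean marker value between the two outcome groups."""
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def best_marker_score(markers, labels):
    """Best score over all markers: what a search over markers would report."""
    return max(mean_difference(values, labels) for values in markers)


def permutation_p_value(markers, labels, n_permutations=200, seed=0):
    """Estimate how often shuffled (meaningless) labels match the observed best score.

    The whole search is repeated for every shuffle, which is why the test
    costs far more compute than the search itself.
    """
    rng = random.Random(seed)
    observed = best_marker_score(markers, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between markers and outcome
        if best_marker_score(markers, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Simulated pure-noise data: 100 patients, 100 markers (scaled down from 30,000).
data_rng = random.Random(1)
noise_labels = [1] * 50 + [0] * 50
noise_markers = [[data_rng.gauss(0, 1) for _ in noise_labels] for _ in range(100)]
p_noise = permutation_p_value(noise_markers, noise_labels, n_permutations=100)
print(f"p-value for the best marker in pure noise: {p_noise:.3f}")
```

Because each shuffle reruns the full search, the cost scales with the number of permutations times the cost of one search, which is consistent with the article's point that significance testing is more compute-intensive than the initial mining.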

“We always test our marker sets against new data from independent biological samples during the validation phase of our product development cycle,” says Harris. “But the high cost and difficulty of acquiring new biological samples gives us a strong incentive to weed out chance correlations earlier in our process. The speed of our Apple cluster lets us run more rigorous significance tests, increasing the odds that we’ll catch potential problems in silico early on, rather than later in the process, when the costs are vastly higher.”
