Bits and SNPs: Atul Butte and Medicine in the Era of Big Data

March 18, 2013

By Matt Luchette 

One of the most surprising parts of Atul Butte’s lab website is his open-access calendar. At the bottom of his profile page, Butte, Chief of the Division of Systems Medicine at Stanford University and a keynote speaker at the Bio-IT World Expo on April 10, lists his daily commitments, instructions for making an appointment, and his assistant’s phone number. The message is subtle but clear: personal data cannot stay personal for long if Big Data is to succeed in medicine.

The past year has seen significant advances in distilling large data sets into simpler conclusions. Using multiple polling sources, Nate Silver’s Bayesian algorithm perfectly predicted the outcome of the Electoral College for the 2012 election, and Facebook’s Graph Search, announced this past January, allows users to quickly search their friends’ profiles for shared interests and trends.

The challenge is not generating the necessary data (in 2012, nearly 2.5 quintillion bytes of data were generated worldwide each day, enough to fill nearly a million iMacs), but making any useful sense of the ocean of data available. Many of the most impactful contemporary scientists are those who can draw elegant and interesting conclusions from this vast information ether.

The genome sequencing revolution helped catalyze a similar Big Data revolution in the life sciences. With the results of over a million gene expression microarrays publicly available online, the trick for biologists is no longer figuring out how to sequence the genome, but rather figuring out what the data is saying. Butte found his niche in this developing field. In an April 2012 TEDMED talk, Butte explained this next big challenge for biologists, saying, “We have so much data that the new magic now is figuring out what question we want to ask.”

Butte’s lab at Stanford is focused on developing computational methods to draw new insights from the world’s publicly available experimental data, published each year in thousands of biological journal articles. “Scientists often use only a fraction of the data they collect,” Butte explained to Bio-IT World, “and only a smaller fraction of that data is fully appreciated and reanalyzed.”

Using a Bayesian method similar to the one behind Nate Silver’s election forecasting, Butte hopes to help doctors harness the mountain of experimental data currently available. In one study, Butte’s lab analyzed over 12,000 scientific papers for the influence of genomic factors on specific diseases, such as cancer or type 2 diabetes. With an algorithm that ranks the correlation between genomic variation and a given health complication, doctors could conceivably provide medical recommendations based on a patient’s sequenced genome as part of a routine check-up.
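
To make the idea concrete, here is a minimal sketch of how evidence from several genetic variants might be folded into a single risk estimate and then ranked. It is not the Butte lab’s published method. It assumes a simple odds-form Bayes update in which each variant contributes an independent likelihood ratio, and every name and number in it (`posttest_probability`, `rank_by_posttest`, the placeholder diseases, baselines, and ratios) is hypothetical and chosen only for illustration.

```python
from math import prod


def posttest_probability(pretest_probability, likelihood_ratios):
    """Update a baseline disease probability with per-variant likelihood ratios.

    Assumption: the variants are independent, so their ratios simply multiply.
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    # Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio(s)
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)


def rank_by_posttest(baselines, ratios_by_disease):
    """Return (disease, updated probability) pairs, highest probability first."""
    updated = {
        disease: posttest_probability(baseline, ratios_by_disease.get(disease, []))
        for disease, baseline in baselines.items()
    }
    return sorted(updated.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical inputs: placeholder diseases, made-up baselines and ratios.
    baselines = {"disease_a": 0.25, "disease_b": 0.10}
    ratios_by_disease = {"disease_a": [1.2, 0.9], "disease_b": [2.0, 1.5]}
    for disease, probability in rank_by_posttest(baselines, ratios_by_disease):
        print(f"{disease}: {probability:.3f}")
```

In a real setting the likelihood ratios would have to come from curated literature, and the independence assumption would need scrutiny, since nearby variants are often inherited together.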

Similar studies may even change the type of diseases a specialist treats. The International Classification of Diseases (ICD) is the World Health Organization’s taxonomy for grouping diseases to facilitate statistical analysis. In its first iteration, in the early 1900s, diseases were classified largely by the symptoms they caused or the organ in which they presented. Since then, the advent of molecular biology has helped doctors understand the pathology of many diseases at the subcellular level. Medicine today would seem completely foreign to the authors of the first ICD, but many of their original guidelines are still in place. As Butte put it in a lecture at TEDx San Francisco this past November, “Why did we ever even think that cancers would look like each other, just because the same doctor takes care of them?”

Reclassifying and relating diseases based on their molecular abnormalities, rather than their organ of origin, could aid researchers in the process of drug repositioning. As pharmaceutical companies lose millions of dollars each year on candidate compounds that fail the early stages of FDA testing, many researchers are hoping to expand the indications for approved drugs to new diseases. As Butte described it to me, the goal is to “show that a Big Data approach can be translated for existing drugs.” Knowing which diseases are caused by dysregulated microtubule formation, for example, could help researchers identify existing microtubule inhibitors that could aid in treating them.

But the problem, Butte says, is, “there’s no funding for clinical drugs for repurposing.” The NIH has only recently created pilot funding mechanisms for repurposing existing drugs. Yet after publishing two papers on possible candidates for drug repositioning (including an ulcer medication that shows promise in treating lung cancer), Butte helped start a company, NuMedii, in 2008 to fund these clinical trials and develop new Big Data technologies for identifying promising repositioning candidates. The company’s successful algorithms attracted the attention of executives at Aptalis Pharma, who in October 2012 struck a deal with NuMedii to test proposed drug candidates for cystic fibrosis and gastrointestinal disorders.

Since founding NuMedii, Butte has used his research at Stanford to launch and consult for a number of other companies, including Personalis, which he helped found in 2011. The goal of Personalis is to provide researchers with an accurate and comprehensive end-to-end solution for human genome sequencing and analysis. By combining high-accuracy genome sequencing technologies with genome annotation and analysis from a number of proprietary genomic databases, Personalis hopes to help researchers glean more information from their genomic data.

Butte recognizes that many of the problems bioinformatics hopes to solve may seem daunting at first, especially to clinicians who will someday be expected to take advantage of these new discoveries, but he has encouraging words for his fellow health care providers.

“Don’t be afraid,” he advised in a live-tweet from his talk at the Future of Genomic Medicine conference last week. “We’ve learned how to interpret CT and MRI, we’ll figure out genomics too.”

Editor’s Note: The Bio-IT World Conference and Expo will be held in Boston, April 9-11, at the World Trade Center. For more information, visit www.bio-itworldexpo.com.