The ability to compare the genomes of such a large cohort, along with the data contained in individual medical records, offers nothing less than an opportunity to revolutionize the way medicine is practiced, said Muralidhar. “Now you’re looking at large numbers of people who did and did not develop an illness,” she said. “And you can better pinpoint the causes of it, genetic as well as lifestyle and environmental factors. You can include all these in a study.”
The questions such a large genomic cohort can help to answer – Why does a treatment work well for some people, and not others? Why are some people at greater risk for this disease? How might we prevent certain diseases in the first place? – are relatively simple. But being able to answer them definitively, using genomic and other data, will do nothing less than revolutionize medicine.
The Million Veteran Program
To date, Muralidhar said, most genetic studies have had disappointing results, simply because they weren’t able to compare DNA from enough people with and without a given disease to reach a definitive answer that other studies could replicate. For the VA, whose investigators also study complex, multivariable disorders like post-traumatic stress disorder (PTSD) and the cluster of medically unexplained symptoms known collectively as Gulf War Syndrome, this has been a severe limitation.
The strength and validity of findings from genome-wide association studies (GWAS) increase with the number of people whose DNA is sampled. In 2011, the VA launched its landmark initiative to harness the full potential of the data contained within its health care system: the Million Veteran Program (MVP).
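One way to make the link between cohort size and reliability concrete is the standard error of an estimated proportion, which shrinks with the square root of the number of people sampled. The sketch below is only an illustration of that general statistical relationship, not a description of how MVP researchers analyze their data; the function name and the 30 percent variant frequency are hypothetical.

```python
import math


def standard_error(frequency: float, sample_size: int) -> float:
    """Standard error of a proportion estimated from independent observations."""
    return math.sqrt(frequency * (1.0 - frequency) / sample_size)


# Hypothetical genetic variant carried by 30% of people (an assumed figure),
# measured in cohorts of increasing size.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} participants: +/- {standard_error(0.30, n):.5f}")
```

Under these assumptions, quadrupling the number of participants halves the uncertainty, which is why small differences between people with and without a disease only become detectable in very large cohorts.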
The MVP is an ambitious initiative to build a comprehensive database including genetic, military service, lifestyle, and health information from 1 million veteran participants. By the summer of 2016, the program had signed on its 500,000th volunteer, putting it on pace to enroll its millionth veteran by 2020 and making it one of the largest databases of its kind in the world: an integrated health and genomic database tied to a health care system, with the largest representation of minorities of any genomic cohort in the United States.
At its core, MVP is a massive data mining effort, confronted with several unique challenges: recruiting participants on a massive scale; keeping this data protected without impeding the efforts of researchers to access it; and applying the computing power needed to squeeze hidden information and associations from such a vast amount of data.
Blood and tissue samples from MVP participants are collected and stored at the Core Laboratory at the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC) within the Boston VA facility. Here, hundreds of thousands of DNA samples, drawn from veterans taking part in MVP as well as in clinical trials run through the VA’s Cooperative Studies Program, are stored in a biorepository that will someday house millions of samples in an enormous freezer – sorted, stored, and retrieved by a sophisticated robotics system.
When a veteran enrolls as an MVP participant, a study number is assigned; each participant becomes an anonymous blood vial with a history attached to it. Anonymity over the lifetime of a participant is enabled by the MVP’s computer network, the Genomic Information System for Integrative Science (GenISIS). All data collected, for example from the survey or from the electronic medical record, are also assigned a code.
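The coding scheme described above can be sketched in a few lines. This is a minimal illustration of the general idea of replacing identities with study codes, not a description of how GenISIS is actually built; the class, function, and field names (CodeRegistry, deidentify, veteran_id, study_code) are all hypothetical.

```python
import secrets


class CodeRegistry:
    """Hypothetical linkage table mapping real identifiers to random study codes.

    In a real system this table would be held separately from research data.
    """

    def __init__(self):
        self._code_by_identity = {}

    def code_for(self, identity: str) -> str:
        # Issue a random code the first time a person is seen, then reuse it.
        if identity not in self._code_by_identity:
            self._code_by_identity[identity] = "P-" + secrets.token_hex(8)
        return self._code_by_identity[identity]


def deidentify(record: dict, registry: CodeRegistry, id_field: str = "veteran_id") -> dict:
    """Return a copy of the record with the identifier replaced by a study code."""
    coded = {key: value for key, value in record.items() if key != id_field}
    coded["study_code"] = registry.code_for(record[id_field])
    return coded


registry = CodeRegistry()
survey = deidentify({"veteran_id": "V-001", "smoker": False}, registry)
chart = deidentify({"veteran_id": "V-001", "diagnosis": "hypertension"}, registry)
other = deidentify({"veteran_id": "V-002", "smoker": True}, registry)

# Records from the same person share a code, so they can still be linked
# for research without exposing who that person is.
print(survey["study_code"] == chart["study_code"])
```

Because the codes are random rather than derived from the identifier, a researcher holding only the coded records cannot work backward to a name; only whoever holds the registry can do that.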