When you go to the doctor or hospital, one piece of information that you’re always asked to provide — in addition to your name, address, and insurance information — is an emergency contact. Often, that person is a blood relative. Now, a collaborative team of researchers from three major academic medical centers in New York City is showing that emergency contact information, which is included in individuals’ electronic health records (EHRs), can be used to generate family trees. Those family trees in turn can be used to study heritability in hundreds of medical conditions. The study appears May 17 in the journal Cell.
“This is the first time family pedigrees have been built from EHRs,” says Fernanda Polubriaginof, a graduate student in biomedical informatics at Columbia University and the study’s first author. “It’s also the largest study ever of the heritability of traits using EHRs.”
The three participating academic medical institutions were Columbia University Vagelos College of Physicians and Surgeons and Weill Cornell Medicine (both in conjunction with New York-Presbyterian Hospital) and the Icahn School of Medicine at Mount Sinai. Using an algorithm that matched up people’s first and last names, addresses, and phone numbers — as well as how they were related to their emergency contact person — the investigators were able to identify 7.4 million familial connections.
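The matching step described above can be illustrated with a minimal sketch. This is not the investigators' algorithm, which the article does not detail. It assumes exact matching on normalized name and phone number only (addresses are left out for brevity), and every class, field, and function name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    first_name: str
    last_name: str
    phone: str


@dataclass(frozen=True)
class EmergencyContact:
    patient_id: str    # the patient who listed this contact
    first_name: str
    last_name: str
    phone: str
    relationship: str  # e.g. "Mother", as entered at registration


def match_key(first: str, last: str, phone: str) -> tuple:
    """Normalize name and phone so trivially different spellings compare equal."""
    def norm_name(s: str) -> str:
        return " ".join(s.lower().split())
    digits = "".join(ch for ch in phone if ch.isdigit())
    return (norm_name(first), norm_name(last), digits)


def find_family_links(patients, contacts):
    """Return (patient_id, relationship, relative_id) for each contact that
    matches exactly one other patient record (assumed rule: skip ambiguous
    matches and self-listings)."""
    index = {}
    for p in patients:
        key = match_key(p.first_name, p.last_name, p.phone)
        index.setdefault(key, []).append(p.patient_id)

    links = []
    for c in contacts:
        candidates = index.get(match_key(c.first_name, c.last_name, c.phone), [])
        if len(candidates) == 1 and candidates[0] != c.patient_id:
            links.append((c.patient_id, c.relationship.strip().lower(), candidates[0]))
    return links


# Made-up records for illustration only.
patients = [
    Patient("P1", "Ana", "Silva", "(212) 555-0101"),
    Patient("P2", "Maria", "Silva", "212-555-0199"),
]
contacts = [EmergencyContact("P1", "maria", "SILVA", "2125550199", "Mother")]
links = find_family_links(patients, contacts)
```

Here the single contact resolves to patient P2, giving one link that records P2 as the mother of P1. Linking only unambiguous matches is a conservative choice made for this sketch; a real system would also need to handle misspellings, shared phone numbers, and outdated contact details.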
Once the relationships were determined, patient identifiers were removed in order to protect privacy. Patient identifiers, including names, were only used by the algorithm in the matching process and were not shared between institutions.
The resulting database was then used to compute heritability estimates for 500 different disease phenotypes, based on test results and observations that appear in the medical records. These traits included blood disorders, skin diseases, and mental health conditions.
“This is really exciting new research, and it’s only the beginning of these kinds of studies,” says Nicholas Tatonetti, an assistant professor of biomedical informatics at Columbia University Vagelos College of Physicians and Surgeons and one of the paper’s senior authors. “We identified the heritability of 400 traits that have never been looked at in this way before. Until now, we didn’t know they were heritable. This research opens up opportunities for many more discoveries.”
To validate the accuracy of their methods, the investigators compared their findings with the known heritability of a few well-studied inherited diseases, such as sickle cell disease. Another component of the validation involved the inclusion of Mount Sinai. Because that center already had a large biobank, including more than 25,000 people who had provided their familial relationships and been genotyped, Mount Sinai’s data could be used to confirm that the research methodology was accurate.
The investigators say their data will be useful in establishing the heritability level of many common conditions. One example reported in this paper is the degree to which high levels of HDL and LDL cholesterol in the blood are inherited. Previous studies on the heritability of high cholesterol used datasets of a few dozen or a few hundred people. In the current paper, the investigators had cholesterol data collected from 120,000 people. They found that having an increased level of HDL is 50% heritable, while increased LDL is only 25% heritable. Future studies can look for the hereditary contribution of any trait that may be part of someone’s EHR.
Polubriaginof notes one thing that’s especially valuable about the new dataset is that it includes people from a wide range of races and ethnicities. “The majority of research on disease heritability has been done in Caucasians of mostly northern European descent,” she says. “This dataset will allow us for the first time to compute whether there are differences in other races and ethnicities.”
Tatonetti explains that because of privacy rules, at this point, the data can only be used for research purposes. “It’s easy to get excited about clinical utility, but we’re not there yet,” he says. “However, in the future, with proper consent, you could imagine information like this being shared with clinicians so they can alert their patients about potential health risks and additional screenings they may need to undergo. It could be very useful for identifying conditions like type 2 diabetes and celiac disease.”
For each of the 500 conditions, the investigators are releasing privacy-protected datasets that can be used by researchers at other institutions. They are also sharing their computational algorithm so that scientists at other hospitals can conduct studies of their own patients.
Materials provided by Cell Press.