idw - Informationsdienst Wissenschaft
The human genome is a long sequence of DNA scattered with innumerable genetic variants that distinguish us. Extracting information from large biobank datasets about complex traits, influenced by thousands or millions of variants, remains a challenge. Using human height as a model, researchers at the Institute of Science and Technology Austria (ISTA) have now tackled this problem and developed an enhanced algorithm, published in Cell Genomics, with potential applications in personalized medicine—and even at crime scenes.
Extracting and analyzing relevant medical information from large-scale databases such as biobanks poses considerable challenges. To exploit such ‘big data’, past attempts have focused on large sampling algorithms that model individual data points. However, since these algorithms sample the entire dataset millions of times, their theoretically very high level of precision comes at a prohibitive computational cost and therefore remains unattainable. To overcome this, scientists previously developed approaches that sacrifice accuracy for speed.
In a bid to optimize both precision and performance, researchers from the groups of Matthew Robinson and Marco Mondelli at the Institute of Science and Technology Austria (ISTA) developed an algorithm that can extract and analyze information from the world’s most extensive biobank with unprecedented accuracy and speed. Ultimately, their method, demonstrated here on human height as a model complex trait, could advance personalized medicine in diagnostics and could even find use in forensics.
Algorithmic innovation using human height
The team’s approach draws on the recently established mathematical framework known as “approximate message passing” (AMP), to which Mondelli has made significant contributions. Their new method, dubbed “genomic Vector Approximate Message Passing” or gVAMP, enhances the framework’s ability to extract complex information from the dataset at hand.
“Whereas other methods tend to analyze one snippet at a time before combining the results, gVAMP functions as a ‘joint estimation’ method. Therefore, it provides a detailed overview of the effects on a trait in the context of all variants across massive-scale genetic datasets,” says ISTA PhD student Al Depope, the study’s first author. “We can speak of an algorithmic innovation.”
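The difference between analyzing variants one at a time and estimating them jointly can be illustrated with a toy example. The sketch below is not gVAMP: it swaps in plain ridge regression solved by coordinate descent as a stand-in joint estimator, and all names, sample sizes, and effect sizes are assumptions chosen for illustration. It simulates one causal variant, one correlated but non-causal neighbour, and one unrelated variant.

```python
import random


def simulate(n=2000, seed=0):
    """Toy data (assumed setup): three standardized variants, one of them causal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                   # causal variant
        b = 0.8 * a + 0.6 * rng.gauss(0, 1)   # correlated neighbour, no effect of its own
        c = rng.gauss(0, 1)                   # independent variant, no effect
        X.append([a, b, c])
        y.append(1.0 * a + rng.gauss(0, 1))   # trait depends on the causal variant only
    return X, y


def marginal_effects(X, y):
    """One variant at a time: regress the trait on each column separately."""
    effects = []
    for j in range(len(X[0])):
        sxy = sum(row[j] * yi for row, yi in zip(X, y))
        sxx = sum(row[j] ** 2 for row in X)
        effects.append(sxy / sxx)
    return effects


def joint_effects(X, y, ridge=1.0, sweeps=50):
    """All variants at once: ridge regression via coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date after each update
    sxx = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Fit of column j to the residual with its own contribution added back.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + sxx[j] * beta[j]
            new = rho / (sxx[j] + ridge)
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            beta[j] = new
    return beta


if __name__ == "__main__":
    X, y = simulate()
    print("marginal:", [round(v, 2) for v in marginal_effects(X, y)])
    print("joint:   ", [round(v, 2) for v in joint_effects(X, y)])
```

In this toy setup, the one-at-a-time analysis assigns a sizeable effect to the non-causal neighbour simply because it is correlated with the causal variant, whereas the joint fit attributes the signal to the causal variant alone.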
To develop their method, the team chose human height, an established model for the genetic analysis of complex traits.
“Examining human height allowed us to explore the limits of computational scalability with gVAMP, both in the number of genome sequences as well as the number of variants involved,” says Depope.
Indeed, the trait is influenced by a whopping 17 million variants, which the team could analyze simultaneously in hundreds of thousands of whole-genome sequences from anonymized volunteers contained in the UK Biobank, the world's most comprehensive dataset of biological, health, and lifestyle information.
“What I find particularly important is the interpretability of our algorithm when applied in biology. In addition to allowing us to predict people’s height from their DNA more accurately than before, it also allows us to pinpoint the specific DNA regions involved,” says ISTA postdoc and co-author Jakub Bajzik.
Outperforming existing methods
When gVAMP predicts human height and the contribution of individual genetic variants, it produces estimates that were not available before. As a result, there is no pre-existing ground truth for human height against which to benchmark the method. “Essentially, the question here is ‘how do we know that gVAMP picked out the true variants?’” Depope explains.
To evaluate the strength of their method, the ISTA researchers performed a data simulation—a common approach in the field. They developed an artificial trait with roughly the same number of genetic variants as human height and performed an extensive simulation study on multiple datasets, benchmarking the algorithm’s performance against other methods. By doing so, they demonstrated that gVAMP largely outperforms existing methods in both accuracy and processing time.
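The logic of such a simulation study can be sketched in a few lines: because the artificial trait is generated from known effects, any estimator can be scored against the truth. The code below is a minimal illustration under assumed settings (50 independent variants, 5 of them causal, half of the trait variance genetic), not the study's actual design. The function `placeholder_estimator` is a hypothetical stand-in for whichever method is being benchmarked.

```python
import random


def simulate_trait(n, p, n_causal, h2, rng):
    """Artificial trait with known truth: n people, p independent standardized
    variants, n_causal of them with a non-zero effect. h2 is the assumed share
    of trait variance explained by the variants."""
    causal = set(rng.sample(range(p), n_causal))
    scale = (h2 / n_causal) ** 0.5
    beta = [rng.choice([-1.0, 1.0]) * scale if j in causal else 0.0 for j in range(p)]
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    noise_sd = (1.0 - h2) ** 0.5
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, noise_sd) for row in X]
    return X, y, beta


def placeholder_estimator(X, y):
    """Hypothetical method under test: per-variant least squares."""
    n, p = len(X), len(X[0])
    return [
        sum(X[i][j] * y[i] for i in range(n)) / sum(X[i][j] ** 2 for i in range(n))
        for j in range(p)
    ]


def pearson(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def benchmark(seed=1):
    """Fit on a training split, then score against the known truth."""
    rng = random.Random(seed)
    X, y, beta = simulate_trait(n=1000, p=50, n_causal=5, h2=0.5, rng=rng)
    X_train, y_train = X[:800], y[:800]
    X_test, y_test = X[800:], y[800:]
    beta_hat = placeholder_estimator(X_train, y_train)
    pred = [sum(b * x for b, x in zip(beta_hat, row)) for row in X_test]
    return {
        "effect_correlation": pearson(beta, beta_hat),   # recovery of the true effects
        "prediction_r2": pearson(pred, y_test) ** 2,     # accuracy on held-out people
    }


if __name__ == "__main__":
    print(benchmark())
```

Swapping a different estimator into `benchmark` and comparing the two scores, along with run time, mirrors the kind of comparison described above on a much smaller scale.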
“Our method achieves state-of-the-art accuracy while remaining efficient enough to perform a true joint analysis across massive-scale genetic datasets in mere days. This allows us to uncover the underlying biology previously hidden by limited scale,” says Depope. “The algorithmic innovation is exactly what makes this scale of analysis possible, as well as the resulting biological insights.”
From personalized medicine to forensics?
The interdisciplinary study combines expertise in information theory, mathematics, genomics, and software engineering. Bajzik’s background in computer science complemented Depope’s focus on theory and math. Robinson, who specializes in state-of-the-art statistical models for genomic data, co-supervised the project with Mondelli, who seeks to develop robust inference methods in information theory to address data-driven challenges in engineering and natural sciences.
Currently, the team is building on this work to extend it to personalized medicine and diagnostics applications. These could include predicting the time of disease onset, its severity, and when specific symptoms are likely to develop. In addition, they seek to extend the method to consider protein and epigenetic data, information not conveyed by the genomic sequences alone.
Ultimately, gVAMP’s potential in personalized medicine applications could also help clinicians select targeted patient profiles for clinical trials. But the method could even find other applications, according to Depope.
“I think our algorithm might also be useful in forensics to predict a suspect’s height from the DNA found at a crime scene,” he says.
--
Funding information
This project was supported by funding from a Lopez-Loreta Prize, an SNSF Eccellenza Grant (PCEGP3-181181), an ERC Starting Grant (INF2, project number 101161364), and by core funding from ISTA. High-performance computing was supported by the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).
https://ista.ac.at/en/research/robinson-group/ "Medical Genomics" research group at ISTA
https://ist.ac.at/en/research/mondelli-group/ "Data Science, Machine Learning, and Information Theory" research group at ISTA
Al Depope, Jakub Bajzik, Marco Mondelli, and Matthew R. Robinson. 2026. Joint modelling of whole genome sequence data for human height via approximate message passing. Cell Genomics. DOI: 10.1016/j.xgen.2026.101162 / https://doi.org/10.1016/j.xgen.2026.101162
Image: The ISTA team’s interdisciplinary study combines expertise in information theory, mathematics, genom ... (Source and copyright: © ISTA)
Image: ISTA scientists achieve algorithmic innovation in the genetic analysis of complex traits. Left to ri ... (Source and copyright: © ISTA)