idw – Informationsdienst Wissenschaft

18.02.2026 17:00

Big data and human height: ISTA scientists develop algorithm to boost biobank data retrieval & analysis

Veronika Oleksyn, Communications, Events and Science Education
Institute of Science and Technology Austria

    The human genome is a long sequence of DNA scattered with innumerable genetic variants that distinguish us. Extracting information from large biobank datasets about complex traits, influenced by thousands or millions of variants, remains a challenge. Using human height as a model, researchers at the Institute of Science and Technology Austria (ISTA) have now tackled this problem and developed an enhanced algorithm, published in Cell Genomics, with potential applications in personalized medicine—and even at crime scenes.

    Extracting and analyzing relevant medical information from large-scale databases such as biobanks poses considerable challenges. To exploit such ‘big data’, past attempts have focused on large sampling algorithms that model individual data points. However, since these algorithms sample the entire dataset millions of times, their theoretically very high level of precision comes at a prohibitive computational cost and therefore remains unattainable. To overcome this, scientists previously developed approaches that sacrifice accuracy for speed.

    In a bid to optimize both precision and performance, researchers from the groups of Matthew Robinson and Marco Mondelli at the Institute of Science and Technology Austria (ISTA) developed an algorithm that can extract and analyze information from the world’s most extensive biobank with unprecedented accuracy and speed. Their method, presented here using human height as a model complex trait, could ultimately advance personalized medicine in diagnostics and might even aid forensics.

    Algorithmic innovation using human height

    The team’s approach draws on the recently established mathematical framework known as “approximate message passing” (AMP), to which Mondelli has made significant contributions. Their new method, dubbed “genomic Vector Approximate Message Passing” or gVAMP, enhances the framework’s ability to extract complex information from the dataset at hand.

    “Whereas other methods tend to analyze one snippet at a time before combining the results, gVAMP functions as a ‘joint estimation’ method. Therefore, it provides a detailed overview of the effects on a trait in the context of all variants across massive-scale genetic datasets,” says ISTA PhD student Al Depope, the study’s first author. “We can speak of an algorithmic innovation.”
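
    The difference between analyzing variants one at a time and estimating them jointly can be illustrated with a toy example. The following is a minimal sketch, not gVAMP itself: ordinary least squares stands in for the joint model, and the two simulated variants, their correlation of 0.9, and the effect sizes are all assumptions made up for illustration. Only the first variant truly affects the trait, but the second is strongly correlated with it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # number of simulated individuals (illustrative)

# Two correlated variants; only variant 0 has a true effect on the trait.
v0 = rng.standard_normal(n)
v1 = 0.9 * v0 + np.sqrt(1 - 0.9**2) * rng.standard_normal(n)
X = np.column_stack([v0, v1])
true_effects = np.array([1.0, 0.0])
y = X @ true_effects + rng.standard_normal(n)

# One-at-a-time analysis: regress the trait on each variant separately.
marginal = np.array([X[:, j] @ y / (X[:, j] @ X[:, j]) for j in range(2)])

# Joint analysis: estimate both effects in a single model
# (ordinary least squares as a simple stand-in for a joint method).
joint, *_ = np.linalg.lstsq(X, y, rcond=None)

print("marginal estimates:", marginal)
print("joint estimates:   ", joint)
```

    In this toy setting, the one-at-a-time estimate assigns a large effect to the second variant simply because it is correlated with the first, whereas the joint estimate attributes the effect to the first variant only.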

    To develop their method, the team chose human height, an established model for the genetic analysis of complex traits.

    “Examining human height allowed us to explore the limits of computational scalability with gVAMP, both in the number of genome sequences as well as the number of variants involved,” says Depope.

    Indeed, the trait is influenced by a whopping 17 million variants, which the team could analyze simultaneously in hundreds of thousands of whole-genome sequences from anonymized volunteers contained in the UK Biobank, the world's most comprehensive dataset of biological, health, and lifestyle information.

    “What I find particularly important is the interpretability of our algorithm when applied in biology. In addition to allowing us to predict people’s height from their DNA more accurately than before, it also allows us to pinpoint the specific DNA regions involved,” says ISTA postdoc and co-author Jakub Bajzik.

    Outperforming existing methods

    When gVAMP predicts human height and estimates the contribution of individual genetic variants, it produces these estimates for the first time. As a result, no pre-existing ground truth for human height is available against which to benchmark the method. “Essentially, the question here is ‘how do we know that gVAMP picked out the true variants?’” Depope explains.

    To evaluate the strength of their method, the ISTA researchers performed a data simulation—a common approach in the field. They developed an artificial trait with roughly the same number of genetic variants as human height and performed an extensive simulation study on multiple datasets, benchmarking the algorithm’s performance against other methods. By doing so, they demonstrated that gVAMP largely outperforms existing methods in both accuracy and processing time.
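
    The logic of such a simulation study can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the study's actual benchmark: the sample size, the number of variants, the correlation between neighboring variants, and the two estimators compared (per-variant regression and ridge regression as a simple joint method) are all hypothetical choices. Because the true effects are set by the simulation, accuracy can be measured directly.

```python
import time
import numpy as np

def simulate(n=2000, p=200, n_causal=10, rho=0.8, seed=1):
    """Simulate correlated variants and an artificial trait with known effects."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, p))
    X[:, 0] = rng.standard_normal(n)
    for j in range(1, p):
        # Each variant is correlated with its neighbor (correlation rho).
        X[:, j] = rho * X[:, j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    beta = np.zeros(p)
    causal = rng.choice(p, size=n_causal, replace=False)
    beta[causal] = rng.choice([-1.0, 1.0], size=n_causal)
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta

def marginal_fit(X, y):
    """Estimate each variant's effect separately."""
    return X.T @ y / np.sum(X**2, axis=0)

def ridge_fit(X, y, lam=1.0):
    """Estimate all effects jointly with ridge regression."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def benchmark(fit, X, y, beta):
    """Return (correlation with the true effects, elapsed seconds)."""
    start = time.perf_counter()
    estimate = fit(X, y)
    elapsed = time.perf_counter() - start
    accuracy = np.corrcoef(estimate, beta)[0, 1]
    return accuracy, elapsed

X, y, beta = simulate()
acc_marginal, t_marginal = benchmark(marginal_fit, X, y, beta)
acc_joint, t_joint = benchmark(ridge_fit, X, y, beta)
print(f"marginal: accuracy={acc_marginal:.3f}, time={t_marginal:.4f}s")
print(f"joint:    accuracy={acc_joint:.3f}, time={t_joint:.4f}s")
```

    The same pattern of simulating a trait with known effects, fitting each method, and then comparing accuracy and run time applies regardless of which estimators are plugged in.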

    “Our method achieves state-of-the-art accuracy while remaining efficient enough to perform a true joint analysis across massive-scale genetic datasets in mere days. This allows us to uncover the underlying biology previously hidden by limited scale,” says Depope. “The algorithmic innovation is exactly what makes this scale of analysis possible, as well as the resulting biological insights.”

    From personalized medicine to forensics?

    The interdisciplinary study combines expertise in information theory, mathematics, genomics, and software engineering. Bajzik’s background in computer science complemented Depope’s focus on theory and math. Robinson, who specializes in state-of-the-art statistical models for genomic data, co-supervised the project with Mondelli, who seeks to develop robust inference methods in information theory to address data-driven challenges in engineering and natural sciences.

    Currently, the team is building on this work to extend it to personalized medicine and diagnostics applications. These could include predicting the time of disease onset, its severity, and when specific symptoms are likely to develop. In addition, they seek to extend the method to consider protein and epigenetic data, information not conveyed by the genomic sequences alone.

    Ultimately, gVAMP’s potential in personalized medicine applications could also help clinicians select targeted patient profiles for clinical trials. But the method could even find other applications, according to Depope.

    “I think our algorithm might also be useful in forensics to predict a suspect’s height from the DNA found at a crime scene,” he says.

    --
    Funding information

    This project was supported by funding from a Lopez-Loreta Prize, an SNSF Eccellenza Grant (PCEGP3-181181), an ERC Starting Grant (INF2, project number 101161364), and by core funding from ISTA. High-performance computing was supported by the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).


    Scientific contacts:

    https://ista.ac.at/en/research/robinson-group/ "Medical Genomics" research group at ISTA

    https://ist.ac.at/en/research/mondelli-group/ "Data Science, Machine Learning, and Information Theory" research group at ISTA


    Original publication:

    Al Depope, Jakub Bajzik, Marco Mondelli, and Matthew R. Robinson. 2026. Joint modelling of whole genome sequence data for human height via approximate message passing. Cell Genomics. DOI: 10.1016/j.xgen.2026.101162 / https://doi.org/10.1016/j.xgen.2026.101162


    Images

    The ISTA team’s interdisciplinary study combines expertise in information theory, mathematics, genomics, and software engineering. Left to right: Al Depope, Jakub Bajzik, Marco Mondelli, and Matthew Robinson.
    Source: © ISTA
    Copyright: © ISTA

    ISTA scientists achieve algorithmic innovation in the genetic analysis of complex traits. Left to right: Marco Mondelli, Matthew Robinson, Al Depope, and Jakub Bajzik.
    Source: © ISTA
    Copyright: © ISTA


    Attachment
    Left to right: ISTA postdoc Jakub Bajzik and the study’s first author, ISTA PhD student Al Depope

    Characteristics of this press release:
    Journalists, scientists
    Biology, information technology, mathematics, medicine
    Transregional
    Research results, scientific publications
    English


     
