idw – Informationsdienst Wissenschaft

11/30/2021 10:23

Exploring the current paradigm of gene regulation

Dr. Martin Ballaschk, Press Office
Max Planck Institute for Molecular Genetics

    How do cells know when to activate a certain gene? This information is encoded in the sequence of the DNA, but our understanding of this code is incomplete. Researchers have now tested how much information can be extracted from sequence data to predict which gene is active in which tissue.

    A good storyteller knows exactly which anecdotes will bring a story's characters to life. By telling the right story at the right time, our genome manages to give rise to hundreds of different cell types, each with a characteristic life story that gives every cell its individual identity.

    DNA snippets scattered across the genome harbor the code that directs the script of a cell’s life, successively switching genes on and off. Sequences called enhancers play a prominent role in this process. They attract transcription factor proteins that start the expression of genes, thereby “enhancing” their activity. In some cases, enhancers are located far away from the gene they activate.

    Researchers Philipp Benner and Martin Vingron from the Max Planck Institute for Molecular Genetics (MPIMG) set out to decipher the instructions of the activation patterns in distinct cell types and embryonic tissues of the mouse.

    With a series of statistical and bioinformatic analyses, the scientists identified several hundred tissue-specific DNA subsequences or “codewords” in enhancers that guide transcription factors, not only confirming sequences already known from other studies, but also identifying many new ones. The results have been published in several articles in NAR Genomics and Bioinformatics and the Journal of Computational Biology.

    “Today, researchers assume that all the information is in the DNA sequence, including information for specific cell types, tissues, and organs,” says Martin Vingron, Director at the MPIMG. According to the prevailing theory, transcription factor proteins recognize “codewords” in enhancers that are specific for a certain cell type, allowing the genome to tell a cell’s story by jumping to the right chapters. “We wanted to see how far this approach would take us and test its limits,” says Vingron.

    The researchers developed a program that identifies DNA sequences that the cell recognizes in order to activate genes in a tissue-specific way. They achieved this by training a statistical model on existing experimental data that indicate which enhancer is active in which tissue. Specifically, they used sequencing data from eight tissues of the embryonic mouse, including heart, lung, brain, and liver.

    By comparing sequence data between the tissues, the program learned to recognize sequence patterns in enhancers that are characteristic of certain tissues.
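
    To make this idea concrete, the following is a minimal sketch of a sequence classifier of this general kind. One of the publications listed below names k-mer logistic regression in its title; the sketch is a generic, simplified version and not the authors' implementation. It uses plain gradient descent with an L2 penalty, and the sequences, the planted "GATA" motif, the two classes, and all function names are made up purely for illustration.

```python
import math
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count all overlapping k-mers in one DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def featurize(seqs, k):
    """Map sequences to count vectors over all 4**k possible k-mers."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for seq in seqs:
        counts = kmer_counts(seq, k)
        rows.append([float(counts[kmer]) for kmer in vocab])
    return vocab, rows

def predict_proba(w, b, x):
    """Probability that feature vector x belongs to class 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=500, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b

# Synthetic toy data (NOT from the study): class 1 carries a planted "GATA".
seqs = ["CCGATACC", "TTGATAGG", "ACGATACT", "GGGATATT",   # hypothetical tissue A
        "CCGCGCCC", "TTGCTTGG", "ACGTCACT", "GGCCAATT"]   # hypothetical tissue B
y = [1, 1, 1, 1, 0, 0, 0, 0]
K = 3  # k-mer length, chosen small to keep the toy example fast

vocab, X = featurize(seqs, K)
w, b = train_logreg(X, y)

# Score an unseen toy sequence that also contains the planted motif.
x_new = featurize(["AAGATAAA"], K)[1][0]
print(predict_proba(w, b, x_new))
```

    In this toy setting the classifier can only separate the two classes by picking up the 3-mers that overlap the planted motif, which mirrors, in miniature, how sequence patterns characteristic of a tissue could be learned from labeled enhancers.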

    This told the researchers how much cell type-specific regulatory information is actually contained in the DNA sequence of enhancers, explains Philipp Benner, who is a postdoctoral researcher in Vingron’s lab: “The better our algorithm can classify any given enhancer, the more information it contains about the tissue or cell types that it is responsible for.”

    The statistical classifiers can also identify DNA subsequences that might underlie cell type-specific gene activation. In fact, Benner found several hundred new codewords in addition to patterns that have been identified in other studies.
    “Overall, we established a strong and, most importantly, an interpretable model,” says Benner.
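
    One reason a model of this type can be called interpretable is that each learned weight belongs to a specific subsequence, so candidate codewords can be read off by ranking the weights. The sketch below illustrates that step under stated assumptions: the weights are invented numbers, the function names are hypothetical, and merging a k-mer with its reverse complement (both describe the same site on double-stranded DNA) is a common convention assumed here, not a detail taken from the publications.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return kmer.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def top_codewords(weights, n=3):
    """Merge strand-equivalent k-mers and rank them by absolute weight."""
    merged = {}
    for kmer, weight in weights.items():
        key = canonical(kmer)
        merged[key] = merged.get(key, 0.0) + weight
    return sorted(merged.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]

# Invented weights for illustration only; "TATC" is the reverse complement of "GATA".
toy_weights = {"GATA": 1.5, "TATC": 0.5, "CCGG": -0.5, "AAAA": 0.25}
ranked = top_codewords(toy_weights, n=2)
print(ranked)
```

    Summing the weights of strand-equivalent k-mers is a simple heuristic; a positive merged weight marks a subsequence that pushes the prediction toward the class of interest, a negative one pushes it away.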

    “With our advanced methods, the predictions are promising but far from perfect,” says Vingron. “Our results indicate that we might really have only a fragmentary understanding of the actual cell type-specific regulatory code.”
    It is possible that not all the required information is contained in the DNA sequence of enhancers and that part of it is distributed elsewhere in the genome. Some cross-references in the storybook of the genome might still be hidden in other regulatory sequences, such as promoter regions, which lie in close proximity to the gene itself.

    Parts of the project were funded by the Berlin Institute for the Foundations of Learning and Data (BIFOLD) of the German Federal Ministry of Education and Research (BMBF).


    Contact for scientific information:

    Prof. Martin Vingron
    Director, Head of the Department “Computational Molecular Biology”
    Max Planck Institute for Molecular Genetics
    +49 30 8413-1150
    vingron@molgen.mpg.de

    Dr. Philipp Benner
    Guest Scientist at MPIMG
    benner@molgen.mpg.de
    Federal Institute for Materials Research and Testing
    eScience Group
    +49 30 8104-3647
    philipp.benner@bam.de


    Original publication:

    Benner, Philipp. Computing leapfrog regularization paths with applications to large-scale k-mer logistic regression. Journal of Computational Biology 28.6 (2021): 560-569. DOI: https://doi.org/10.1089/cmb.2020.0284

    Benner, Philipp, and Martin Vingron. Quantifying the Tissue-Specific Regulatory Information within Enhancer DNA Sequences. NAR Genomics and Bioinformatics 3.4 (2021). DOI: https://doi.org/10.1093/nargab/lqab095

    Benner, Philipp, and Martin Vingron. ModHMM: A modular supra-Bayesian genome segmentation method. Journal of Computational Biology 27.4 (2020): 442-457. DOI: https://doi.org/10.1007/978-3-030-17083-7_3


    More information:

    https://www.molgen.mpg.de/4457206/ (this press release on the website of the MPI for Molecular Genetics)





     
