09.12.2025 12:30

Speech-to-Expression: Controlling Digital Head Avatars via Audio Signals

Philipp Zapf-Schramm, Press and Public Relations
Max-Planck-Institut für Informatik

    Realistic digital avatars are becoming increasingly relevant, for example in virtual and augmented reality applications, video conferencing, films and computer games, or in medicine. Researchers at the Max Planck Institute (MPI) for Informatics in Saarbrücken, Germany, are now presenting two novel methods at two of the world’s leading computer graphics conferences, SIGGRAPH and SIGGRAPH Asia. These methods enable the generation of photorealistic full-body avatars and allow head avatars to be controlled using only audio tracks.

    Previous methods for generating digital avatars have had significant limitations: The face and body often cannot be controlled independently, clothing sometimes looks unnatural, the renderings are often convincing only from certain perspectives, and facial animations frequently appear sterile and lifeless. With their works “EVA: Expressive Virtual Avatars from Multi-view Videos” and “Audio-Driven Universal Gaussian Head Avatars”, the Max Planck researchers are taking a step toward solving these problems.

    The paper “Audio-Driven Universal Gaussian Head Avatars”, to be presented in December at SIGGRAPH Asia in Hong Kong, describes a method by which photorealistic 3D head avatars can be automatically animated and controlled using only voice recordings. Its foundation is the newly developed Universal Head Avatar Prior (UHAP), a model pre-trained on a large number of video recordings of real people from a publicly available dataset. The model cleanly separates identity (the appearance of a specific person) from expression (facial expressions and movements).
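
    To illustrate the disentanglement at the core of such a prior, the following minimal PyTorch sketch decodes a fixed per-person identity code and a per-frame expression code into a flat parameter vector for a renderable head. All names and dimensions here are hypothetical illustrations; the actual UHAP architecture, which drives a Gaussian-based head representation, is described in the paper.

        import torch
        import torch.nn as nn

        class DisentangledHeadPrior(nn.Module):
            """Toy stand-in for a universal head prior: identity is fixed
            per person, expression varies per frame, the decoder fuses both."""

            def __init__(self, id_dim=256, expr_dim=64, out_dim=1024):
                super().__init__()
                self.decoder = nn.Sequential(
                    nn.Linear(id_dim + expr_dim, 512),
                    nn.ReLU(),
                    nn.Linear(512, out_dim),
                )

            def forward(self, identity, expression):
                # Concatenating the two codes keeps them separately controllable:
                # swap the identity to change the person, keep it to reanimate them.
                return self.decoder(torch.cat([identity, expression], dim=-1))

        prior = DisentangledHeadPrior()
        identity = torch.randn(1, 256)                  # one subject
        expressions = torch.randn(10, 64)               # ten animation frames
        frames = prior(identity.expand(10, -1), expressions)
        print(frames.shape)                             # torch.Size([10, 1024])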

    An audio encoder then translates audio signals directly into the expression space of the digital avatar model. Unlike earlier approaches, it captures not only lip and jaw movements but also fine, audio-dependent changes such as movements inside the mouth or subtle facial expressions. Thanks to the pre-trained prior, highly realistic 3D facial renderings can be generated with significantly less data. “Our goal is to create digital heads that not only synchronize with speech, but also behave in a lifelike way, incorporating subtle details such as eyebrow movements and gaze shifts,” says Kartik Teotia, a doctoral student at Saarland University conducting research at the MPI for Informatics.
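
    How such an audio encoder can be wired is sketched below: a small temporal network turns a window of mel-spectrogram frames into per-frame expression codes of the kind the pre-trained prior consumes. This is a hypothetical sketch; the layer types and sizes are assumptions, not the architecture from the paper.

        import torch
        import torch.nn as nn

        class AudioToExpression(nn.Module):
            def __init__(self, n_mels=80, expr_dim=64):
                super().__init__()
                # 1D convolutions over time capture short-range context such as
                # co-articulation; a GRU aggregates longer-range prosody cues that
                # could drive subtle motions (eyebrows, gaze) beyond lip sync.
                self.conv = nn.Conv1d(n_mels, 128, kernel_size=5, padding=2)
                self.gru = nn.GRU(128, 128, batch_first=True)
                self.head = nn.Linear(128, expr_dim)

            def forward(self, mel):                 # mel: (batch, time, n_mels)
                x = self.conv(mel.transpose(1, 2))  # -> (batch, 128, time)
                x, _ = self.gru(x.transpose(1, 2))  # -> (batch, time, 128)
                return self.head(x)                 # per-frame expression codes

        encoder = AudioToExpression()
        mel = torch.randn(1, 100, 80)               # 100 audio feature frames
        expr = encoder(mel)
        print(expr.shape)                           # torch.Size([1, 100, 64])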

    In addition to faces, the research at the MPI for Informatics also covers methods for generating full-body avatars. The paper “EVA: Expressive Virtual Avatars from Multi-view Videos”, presented in August at the SIGGRAPH conference in Vancouver, describes a novel approach in which motion and appearance are modeled separately. A flexible digital model first captures the body, hands, and face, along with their movements and expressions. A second layer then adds the external appearance, that is, skin, hair, and clothing. “With EVA, we can realistically generate movements and facial expressions independently of one another, and also render them from new viewpoints that were not included in the original recordings,” says Marc Habermann, head of the research group Graphics and Vision for Digital Humans at the MPI for Informatics. For now, the system still requires training with recordings from a lab facility at the Institute, where a person is filmed from more than one hundred camera perspectives simultaneously.
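
    The separation EVA makes can be pictured as two stacked modules, sketched below in hypothetical PyTorch: a motion layer that deforms a template from pose and expression inputs, and an appearance layer that colors the result. The real system works on multi-view video with far richer representations; every name and dimension here is illustrative only.

        import torch
        import torch.nn as nn

        class MotionLayer(nn.Module):
            """Deforms a template: pose and expression in, vertex positions out."""
            def __init__(self, pose_dim=72, expr_dim=50, n_verts=1000):
                super().__init__()
                self.n_verts = n_verts
                self.mlp = nn.Sequential(
                    nn.Linear(pose_dim + expr_dim, 256),
                    nn.ReLU(),
                    nn.Linear(256, n_verts * 3),
                )

            def forward(self, pose, expr):
                out = self.mlp(torch.cat([pose, expr], dim=-1))
                return out.view(-1, self.n_verts, 3)

        class AppearanceLayer(nn.Module):
            """Adds the look on top of the geometry: per-vertex RGB color."""
            def __init__(self):
                super().__init__()
                self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

            def forward(self, verts):
                return torch.sigmoid(self.mlp(verts))   # RGB in [0, 1]

        # Because the layers are separate, motion can be edited (or rendered from
        # a new viewpoint) without touching appearance, and vice versa.
        motion, appearance = MotionLayer(), AppearanceLayer()
        verts = motion(torch.randn(1, 72), torch.randn(1, 50))
        colors = appearance(verts)
        print(verts.shape, colors.shape)    # torch.Size([1, 1000, 3]) twice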

    “With these two works, we are advancing research on realistic digital avatars in a decisive way. Such models could fundamentally change how we communicate, collaborate, or acquire new skills in the future, for example through virtual tutors, extending far beyond computer science,” says Professor Christian Theobalt, Director at the Max Planck Institute for Informatics and head of the Visual Computing and Artificial Intelligence department, where these projects are being developed. Theobalt is also founding director of the Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence (VIA), a strategic research partnership with Google.

    Both of the above-mentioned works have already attracted interest from industry. “EVA: Expressive Virtual Avatars from Multi-view Videos” was developed in collaboration with Google at the Saarbrücken VIA Center. “Audio-Driven Universal Gaussian Head Avatars” was developed with scientific collaboration from Flawless AI, a London-based film technology company recently named one of TIME Magazine’s 100 Most Influential Companies of 2025. Flawless AI’s Visual Dubbing technology, built on foundational research pioneered by Theobalt’s department, enables actors’ lip movements to be precisely adapted for new languages, a breakthrough that is drawing growing attention across Hollywood. In May 2025, the first full-length feature reworked with Visual Dubbing, Watch the Skies, was released in U.S. cinemas.

    Press contact and editor:
    Philipp Zapf-Schramm
    Max Planck Institute for Informatics
    Phone: +49 681 9325 4509
    Email: pzs@mpi-inf.mpg.de


    Scientific contacts:

    Prof. Dr. Christian Theobalt
    Director, Visual Computing and Artificial Intelligence Department
    Max Planck Institute for Informatics
    Email: d6-sek@mpi-inf.mpg.de
    Phone: +49 681 9325 4500

    Dr. Marc Habermann
    Research Group Leader, Graphics and Vision for Digital Humans
    Max Planck Institute for Informatics
    Email: mhaberma@mpi-inf.mpg.de
    Phone: +49 681 9325 4507


    Original publications:

    Kartik Teotia, Helge Rhodin, Mohit Mendiratta, Hyeongwoo Kim, Marc Habermann, and Christian Theobalt. 2025. Audio-Driven Universal Gaussian Head Avatars. In SIGGRAPH Asia 2025 Conference Papers, December 15–18, 2025, Hong Kong. ACM, New York, NY, USA, 16 pages. https://doi.org/10.48550/arXiv.2509.18924

    Hendrik Junkawitsch, Guoxing Sun, Heming Zhu, Christian Theobalt, and Marc Habermann. 2025. EVA: Expressive Virtual Avatars from Multi-view Videos. In Special Interest Group on Computer Graphics and Interactive Techniques Conference Papers (SIGGRAPH Conference Papers ’25), August 10–14, 2025, Vancouver, BC, Canada. ACM, New York, NY, USA, 20 pages. https://doi.org/10.1145/3721238.3730677


    Further information:

    https://www.mpi-inf.mpg.de/de/departments/visual-computing-and-artificial-intell... Department Visual Computing and Artificial Intelligence
    https://gvdh.mpi-inf.mpg.de/index.html Graphics and Vision for Digital Humans group
    https://www.via-center.science/ Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence


    Images

    Dr. Marc Habermann, Kartik Teotia, and Prof. Dr. Christian Theobalt (left to right) standing in the Multi-view Video Studio at the Institute.
    Source: Philipp Zapf-Schramm
    Copyright: Max Planck Institute for Informatics


    Characteristics of this press release:
    Journalists
    Information technology
    transregional
    Research results
    English


     

