idw - Informationsdienst Wissenschaft
Realistic digital avatars are becoming increasingly relevant, for example in virtual and augmented reality applications, video conferencing, films and computer games, or in medicine. Researchers at the Max Planck Institute (MPI) for Informatics in Saarbrücken, Germany, are now presenting two novel methods at two of the world’s leading computer graphics conferences, SIGGRAPH and SIGGRAPH Asia. These methods enable the generation of photorealistic full-body avatars and allow head avatars to be controlled using only audio tracks.
Previous methods for generating digital avatars have had significant limitations: The face and body often cannot be controlled independently, clothing sometimes looks unnatural, the renderings are often convincing only from certain perspectives, and facial animations frequently appear sterile and lifeless. With their works “EVA: Expressive Virtual Avatars from Multi-view Videos” and “Audio-Driven Universal Gaussian Head Avatars”, the Max Planck researchers are taking a step toward solving these problems.
The paper “Audio-Driven Universal Gaussian Head Avatars”, to be presented in December at SIGGRAPH Asia in Hong Kong, describes a method by which photorealistic 3D head avatars can be automatically animated and controlled using only voice recordings. The foundation is the newly developed Universal Head Avatar Prior (UHAP), a model pre-trained on a large number of video recordings of real people from a publicly available dataset. It cleanly separates identity (the appearance of a specific person) from expression (facial movements).
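The identity/expression split described above can be pictured with a toy stand-in. Everything here is illustrative: the dimensions, the linear "decoder", and the function names are made up, whereas the actual UHAP is a learned neural prior. The point is only the interface: a fixed identity code and a time-varying expression code enter through separate pathways, so either can be swapped without touching the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent sizes -- the paper's actual dimensions are not given here.
ID_DIM, EXPR_DIM, PARAM_DIM = 64, 32, 128

# A linear map standing in for the learned prior: identity and expression
# codes enter through separate weight blocks, which is the disentanglement
# idea in miniature.
W_id = rng.standard_normal((PARAM_DIM, ID_DIM)) * 0.1
W_expr = rng.standard_normal((PARAM_DIM, EXPR_DIM)) * 0.1

def decode_avatar(identity_code, expression_code):
    """Combine a fixed identity (who) with a time-varying expression (what)."""
    return W_id @ identity_code + W_expr @ expression_code

# The same identity can be driven by different expressions...
alice = rng.standard_normal(ID_DIM)
smile = rng.standard_normal(EXPR_DIM)
frown = rng.standard_normal(EXPR_DIM)
# ...and the same expression can be transferred to another identity.
bob = rng.standard_normal(ID_DIM)

params_alice_smile = decode_avatar(alice, smile)
params_bob_smile = decode_avatar(bob, smile)
```

In this toy linear model the two pathways never mix, so transferring an expression between identities changes only the identity contribution; a trained prior has to learn an analogous separation from data.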
An audio encoder then translates audio signals directly into the expression parameters of the digital avatar model. Unlike earlier approaches, it takes into account not only lip and jaw movements but also fine, audio-dependent changes such as movements inside the mouth or subtle facial expressions. Using this pre-trained model, highly realistic 3D facial renderings can be generated with significantly less data. “Our goal is to create digital heads that not only synchronize with speech, but also behave in a lifelike way, incorporating subtle details such as eyebrow movements and gaze shifts,” says Kartik Teotia, a doctoral student at Saarland University conducting research at the MPI for Informatics.
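The role of the audio encoder can be sketched as a sliding-window mapping: a short context of audio features goes in, one expression code per frame comes out. All names and sizes below are invented for illustration; the real encoder is a trained neural network, not a fixed linear map.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: WINDOW consecutive frames of AUDIO_FEAT-dimensional
# audio features (e.g. spectrogram frames) map to one EXPR_DIM expression code.
AUDIO_FEAT, WINDOW, EXPR_DIM = 80, 5, 32

# Placeholder for the learned encoder weights.
W = rng.standard_normal((EXPR_DIM, AUDIO_FEAT * WINDOW)) * 0.01

def audio_to_expression(audio_frames):
    """Map a (WINDOW, AUDIO_FEAT) slice of audio features to one expression code."""
    return W @ audio_frames.reshape(-1)

def drive_avatar(audio_track):
    """Slide a window over the track, yielding one expression code per step."""
    codes = []
    for t in range(len(audio_track) - WINDOW + 1):
        codes.append(audio_to_expression(audio_track[t:t + WINDOW]))
    return np.stack(codes)

track = rng.standard_normal((100, AUDIO_FEAT))  # ~100 frames of audio features
expr_sequence = drive_avatar(track)
```

The expression codes produced this way would then be fed into a prior such as the one sketched earlier; using a window rather than a single frame gives the encoder temporal context, which matters for coarticulation effects in speech.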
In addition to faces, research at the MPI for Informatics also covers methods for generating full-body avatars. The paper “EVA: Expressive Virtual Avatars from Multi-view Videos”, published in August at the SIGGRAPH conference in Vancouver, describes a novel approach in which motion and appearance are modeled separately. A flexible digital model first captures the body, hands, and face, along with their movements and expressions. A second layer then adds the external appearance, that is, skin, hair, and clothing. “With EVA, we can realistically generate movements and facial expressions independently of one another, and also render them from new viewpoints that were not included in the original recordings,” says Marc Habermann, head of the research group Graphics and Vision for Digital Humans at the MPI for Informatics. For now, the system still requires training with recordings from a lab facility at the institute, where a person is filmed from more than one hundred camera perspectives simultaneously.
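The two-layer split described above can be illustrated with a deliberately simplified sketch: a motion layer deforms a geometric template, and a separate appearance layer assigns colour independently of the pose. Everything here (the template, the global-rotation "deformation", the colour rule) is a made-up stand-in for EVA's learned components.

```python
import numpy as np

rng = np.random.default_rng(2)

N_VERTS = 500
template = rng.standard_normal((N_VERTS, 3))  # toy template geometry

def motion_layer(pose_params):
    """Deform the template; in EVA this layer covers body, hands, and face.
    Toy deformation: a global rotation about the z-axis by pose_params[0]."""
    angle = pose_params[0]
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return template @ R.T

def appearance_layer(verts, appearance_code):
    """Assign a colour per vertex; in EVA this layer covers skin, hair, clothing.
    The colour depends only on the appearance code, not on the pose."""
    base = np.tanh(appearance_code[:3])
    return np.tile(base, (verts.shape[0], 1))

# Because the layers are separate, pose can change while appearance stays fixed:
code = rng.standard_normal(8)
verts_rest = motion_layer(np.array([0.0]))
verts_posed = motion_layer(np.array([0.5]))
colors_rest = appearance_layer(verts_rest, code)
colors_posed = appearance_layer(verts_posed, code)
```

The separation is what allows the properties quoted above: motion can be regenerated or re-rendered from new viewpoints without re-learning the appearance, and vice versa.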
“With these two works, we are advancing research on realistic digital avatars in a decisive way. Such models could fundamentally change how we communicate, collaborate, or acquire new skills in the future, for example through virtual tutors, extending far beyond computer science,” says Professor Christian Theobalt, Director at the Max Planck Institute for Informatics and head of the Visual Computing and Artificial Intelligence department, where these projects are being developed. Theobalt is also founding director of the Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence (VIA), a strategic research partnership with Google.
Both of the above-mentioned works have already attracted interest from industry. “EVA: Expressive Virtual Avatars from Multi-view Videos” was developed in collaboration with Google at the Saarbrücken VIA Center. “Audio-Driven Universal Gaussian Head Avatars” was developed with scientific collaboration from Flawless AI, a London-based film technology company recently named one of TIME Magazine’s 100 Most Influential Companies of 2025. Flawless AI’s Visual Dubbing technology, built on foundational research pioneered by Theobalt’s department, enables actors’ lip movements to be precisely adapted for new languages, a breakthrough that is drawing growing attention across Hollywood. In May 2025, the first full-length feature reworked with Visual Dubbing, Watch the Skies, was released in U.S. cinemas.
Press contact and editor:
Philipp Zapf-Schramm
Max Planck Institute for Informatics
Phone: +49 681 9325 4509
Email: pzs@mpi-inf.mpg.de
Prof. Dr. Christian Theobalt
Director, Department “Visual Computing and Artificial Intelligence”
Max Planck Institute for Informatics
Email: d6-sek@mpi-inf.mpg.de
Phone: +49 681 9325 4500
Dr. Marc Habermann
Group leader, Group “Graphics and Vision for Digital Humans”
Max Planck Institute for Informatics
Email: mhaberma@mpi-inf.mpg.de
Phone: +49 681 9325 4507
Kartik Teotia, Helge Rhodin, Mohit Mendiratta, Hyeongwoo Kim, Marc Habermann, and Christian Theobalt. 2025. Audio-Driven Universal Gaussian Head Avatars. In SIGGRAPH Asia 2025 Conference Papers, December 15–18, 2025, Hong Kong, Hong Kong. ACM, New York, NY, USA, 16 pages. https://doi.org/10.48550/arXiv.2509.18924
Hendrik Junkawitsch, Guoxing Sun, Heming Zhu, Christian Theobalt, and Marc Habermann. 2025. EVA: Expressive Virtual Avatars from Multi-view Videos. In Special Interest Group on Computer Graphics and Interactive Techniques Conference Papers (SIGGRAPH Conference Papers ’25), August 10–14, 2025, Vancouver, BC, Canada. ACM, New York, NY, USA, 20 pages. https://doi.org/10.1145/3721238.3730677
https://www.mpi-inf.mpg.de/de/departments/visual-computing-and-artificial-intell... Department Visual Computing and Artificial Intelligence
https://gvdh.mpi-inf.mpg.de/index.html Graphics and Vision for Digital Humans group
https://www.via-center.science/ Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence
Dr. Marc Habermann, Kartik Teotia and Prof. Dr. Christian Theobalt (left to right) standing in the Mu ...
Source: Philipp Zapf-Schramm
Copyright: Max Planck Institute for Informatics
Characteristics of this press release:
Journalists
Information technology
Transregional
Research results
English
