Online offerings from universities are mostly limited to playing back videos of lecturers’ presentations, with no opportunity for direct interaction. Scientists at the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI, want to change that with their VoluProf project, in which 32 cameras record the lecturer’s presentation from all angles. The video data is used to generate a photo-realistic animated avatar that appears lifelike through AR glasses and can even answer questions.
Almost all universities of applied sciences and colleges also offer their lectures and courses online. Students can access learning content from anywhere at any time, and the number of participants isn’t limited to the size of the lecture hall. Until now, however, this service has been limited to passive viewing of video recordings.
Now, the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI in Berlin has developed a solution that enables online lectures or courses to be designed in an interactive and customized manner for participants. The lecturer appears in photo-realistic quality as a moving and talking avatar who addresses his or her audience personally, and even responds to questions. “The universities’ online lectures will be taken to a new level in terms of visual, acoustic, and didactic quality,” says Dr. Cornelius Hellge, Head of Multimedia Communications Group and leader of the project.
Photo- and audio-realistic avatar
To generate the avatar, the presenter first stands in a rotunda equipped with stereo microphones and a total of 32 video cameras. The lecturer can then give the talk as usual, moving and gesticulating freely. An animated 3D avatar in photo-realistic quality is created from the video footage of the lecture, including the lecturer’s characteristic movements. Because the footage also captures the volume of the person’s body, the videos are referred to as “volumetric videos,” hence the name of the project: VoluProf (volumetric professor).
The lecturer then provides the lecture notes in text form. On the one hand, the notes serve as the basis for the audio script, in which the text is reproduced in the lecturer’s voice. On the other hand, they serve as the basis for the 3D avatar’s animation, both for the facial expressions that match the text and for the appropriate body movements. The voice, in turn, is synchronized with the lip movements.
In order to participate in the virtual courses, all students need is a smartphone and AR (augmented reality) glasses. With the help of the glasses, the animated 3D avatar is superimposed directly into the real-life environment. For the students, it looks as if the lecturer is standing in front of them in the room. In addition, the participants’ position in the room and their line of vision can be detected by the glasses. This information is transmitted via smartphone to the provider’s server, where the avatar’s image is continuously rendered so that it faces the students at all times, appearing to speak directly to them. The avatar’s movements and the reaction to the participants’ input take place almost in real time; the latency is a maximum of 40 ms. “You get the impression that the lecturer is giving a one-to-one lecture to the student in question,” Hellge explains.
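The idea of turning the avatar toward each viewer can be illustrated with a small geometric sketch. It is not the VoluProf renderer: the function name, the floor-plane coordinates, and the angle convention are all assumptions made for illustration. It only shows how a facing angle could be derived from the viewer position reported by the glasses.

```python
import math


def facing_yaw_degrees(avatar_xz, viewer_xz):
    """Return the yaw (in degrees) that turns the avatar toward the viewer.

    Hypothetical convention: positions are (x, z) pairs on the floor plane,
    0 degrees looks along +z, and positive yaw turns toward +x.
    """
    dx = viewer_xz[0] - avatar_xz[0]
    dz = viewer_xz[1] - avatar_xz[1]
    # atan2 handles all four quadrants, so the viewer may stand anywhere.
    return math.degrees(math.atan2(dx, dz))


# Viewer directly in front of the avatar: no rotation needed.
print(facing_yaw_degrees((0.0, 0.0), (0.0, 2.0)))
# Viewer diagonally to the right: the avatar turns by about 45 degrees.
print(facing_yaw_degrees((0.0, 0.0), (1.0, 1.0)))
```

In a server-side setup like the one described, such an angle would have to be recomputed for every pose update so that the rendered view stays within the stated latency of 40 ms.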
Interaction through questions and dialog
In contrast to traditional, passive online lectures, interaction with the lecturer is possible at any time with VoluProf. For example, a student could ask: “Can you repeat that, please?” or “I didn’t understand that.” These kinds of questions are stored as commands in the avatar’s neural network and then trigger either repetitions or a more detailed explanation by the avatar.
Students can also ask more specific questions. Because conceivable questions can be saved in the lecture text in advance, the virtual professor is able to respond to them. To achieve this, the system’s speech recognition function first converts the spoken question into text. An AI-based chatbot then links the question text with the matching answer text, which is spoken by the virtual professor, including gestures, facial expressions, and synchronized lip movements.
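The step of linking a recognized question to a prepared answer can be sketched in a few lines. This is a deliberately simple stand-in, not the project's AI-based chatbot: it assumes the question has already been converted to text, and it scores prepared questions by word overlap (Jaccard similarity). The function names, the threshold, and the sample entries are hypothetical.

```python
import re


def tokens(text):
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def best_answer(question, prepared, threshold=0.3):
    """Return the answer whose prepared question overlaps most with `question`.

    `prepared` maps prepared question text to answer text. Returns None when
    no prepared question reaches the (hypothetical) similarity threshold.
    """
    asked = tokens(question)
    best_score, best = 0.0, None
    for known_question, answer in prepared.items():
        known = tokens(known_question)
        union = asked | known
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(asked & known) / len(union) if union else 0.0
        if score > best_score:
            best_score, best = score, answer
    return best if best_score >= threshold else None


prepared = {
    "What is a volumetric video?": "A video that also captures the volume of the body.",
    "How many cameras are used?": "The rotunda has 32 video cameras.",
}
print(best_answer("how many cameras are there", prepared))
```

Returning None for unmatched questions leaves room for a fallback, such as asking the student to rephrase.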
Despite the solution’s technical sophistication, it places only low demands on the end user’s equipment. Since computationally intense tasks such as animation and rendering, audio synthesis, or speech recognition take place on the provider’s server, participants only need a standard smartphone that supports at least 4G and a pair of lightweight AR glasses. “We deliberately designed the concept to keep the barrier to entry for students as low as possible,” explains Hellge.
Expertise in graphics, audio, and video codecs
For VoluProf, Fraunhofer HHI has leveraged its long-standing expertise in the areas of computer vision, video work, and machine learning. One team designed the photorealistic representation of people as avatars using volumetric video data. Another team took care of the efficient transmission of video data. Researchers developed a transmission method specifically for this purpose that guarantees low latency, while adapting to different network conditions and enabling smooth motion at reduced resolution even when the connection is poor. Hellge emphasizes the whole project’s innovative approach, which incorporates these technologies: “Photorealistic avatars are already well-known from the movies. But what’s new and unique about VoluProf is that the photo-realistic avatars interact with people and answer questions in real time via online connection.”
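The trade-off described here, keeping motion smooth by lowering resolution when the connection is poor, resembles a bitrate ladder in adaptive streaming. The sketch below shows that general idea only; it is not the transmission method developed at Fraunhofer HHI, and the renditions, bitrates, and safety factor are invented for illustration.

```python
# Hypothetical ladder: (frame height in pixels, required bitrate in kbit/s),
# ordered from highest to lowest quality. All renditions keep the same
# frame rate, so a lower rung trades sharpness for smooth motion.
RENDITIONS = [
    (1080, 8000),
    (720, 4000),
    (480, 1500),
    (240, 500),
]


def pick_rendition(throughput_kbps, safety=0.8):
    """Pick the highest rendition that fits the measured throughput.

    Only a fraction (`safety`) of the measured throughput is budgeted, to
    leave headroom for fluctuations. If nothing fits, the lowest rendition
    is returned rather than stopping playback.
    """
    budget = throughput_kbps * safety
    for height, required_kbps in RENDITIONS:
        if required_kbps <= budget:
            return height
    return RENDITIONS[-1][0]


for measured in (12000, 6000, 1000, 100):
    print(measured, "kbit/s ->", pick_rendition(measured), "p")
```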
Hellge, a Fraunhofer researcher, initiated the project, has taken qualified technology partners on board, and is driving its further development as project manager. Initial trials at the University of Rostock have already taken place. “The feedback has been very positive and there’s a lot of interest in the final implementation,” Hellge is happy to report.
https://www.fraunhofer.de/en/press/research-news/2023/august-2023/voluprof-facil...
The rotunda is equipped with 32 cameras that film the lecturer from all sides.
© Fraunhofer HHI