idw – Informationsdienst Wissenschaft

04/19/2021 15:42

Human voices from the computer - barely distinguishable from the original

Rainer Krauß, Hochschulkommunikation
Hochschule Hof - University of Applied Sciences

    Hof - Computer applications that read texts aloud are already a great help in everyday life, especially for blind or visually impaired people. Drivers, too, have long since become accustomed to the friendly voices of navigation systems, which spare them dangerous distractions. Naturally, the technology also harbors dangers. The Institute for Information Systems at Hof University of Applied Sciences is conducting a study to determine the acceptance of artificially generated voices and is developing its own models for the German market.

    The quality of so-called speech synthesis has improved considerably in recent years. Whereas voices long sounded rather tinny or choppy, that sound is gradually giving way to greater naturalness and unobtrusive speech dynamics. This also makes listening to longer texts more pleasant.

    Rapid improvement in speech quality
    "This has been achieved in international research through the use of deep neural networks. In the English-speaking world in particular, it is already almost impossible to distinguish between a real person and a program," says Prof. Dr. René Peinl, Head of the Institute for Information Systems at Hof University of Applied Sciences. Accordingly, a number of freely available models speak English very naturally, provided they have been trained on sufficient data. Speech generation usually takes place in two stages. First, a so-called mel spectrogram is generated from the text: a representation of how the energy in the speech frequencies changes over time. From this, a vocoder then generates the actual audio signal. Both stages are neural networks that must be trained separately.

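    To make the intermediate representation of the two-stage process more concrete, the following minimal sketch computes a log-mel spectrogram from a synthetic test tone using NumPy. It analyzes existing audio; in speech synthesis, the first network instead predicts such a matrix from text and the vocoder turns it back into a waveform. The sampling rate, FFT size, hop length and number of bands are common choices and purely assumptions here, not values reported by the Hof project, and all function names are illustrative.

```python
import numpy as np

def hz_to_mel(hz):
    """Convert Hz to mel (widely used HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spectrogram(signal, sample_rate, n_fft=1024, hop=256, n_mels=80):
    """Log-mel spectrogram of a 1-D signal, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of every frame, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool the FFT bins into mel bands, then compress with a logarithm.
    mel_power = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_power + 1e-10).T

# Assumed demo input: one second of a 440 Hz sine tone at 22,050 Hz.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)

mel = mel_spectrogram(tone, sample_rate)
print(mel.shape)  # (80, 83): 80 mel bands, 83 time frames

# The band with the most energy should sit close to the tone's frequency.
peak_band = int(mel.mean(axis=1).argmax())
print("strongest mel band:", peak_band)
```

    Each column of the resulting matrix describes one short slice of time and each row one frequency band. The mel scale spaces these bands more densely at low frequencies, roughly in line with how human hearing resolves pitch.
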
    Acceptance on the test bench
    The DAMMIT program at Hof University of Applied Sciences, which focuses on technology transfer between universities and small and medium-sized enterprises for digital transformation, is analyzing how well users accept computer-generated voices. Test subjects are read texts of medium length, for example messages half a screen page long. The steady improvement in speech synthesis quality in recent years makes the technology more convenient and more widely applicable, but it also harbors dangers: machine voices that sound human can of course also be used for fraud or other criminal acts.

    Many possible applications
    Automated read-aloud functions are finding their way into more and more areas of application. Being able to take in information while the eyes have to focus on something else is an invaluable advantage: "Speech synthesis is of course an essential part of accessibility for people with visual impairments. In very practical terms, however, orders can also be read out to forklift drivers, for example, which can be very helpful and save time in their workflow. Or you can have the daily news read aloud in your personal favorite voice. In general, speech synthesis is also an important part of voice-controlled applications such as smart speakers, e.g. Amazon's Alexa," says Prof. Dr. Peinl, explaining some of the possible applications.

    Market demand is growing
    Yet demand for automatically generated but human-sounding voices is probably only just beginning. One example can be found on the campus of Hof University of Applied Sciences at the Einstein 1 start-up center: the start-up company ahearo offers a service that allows people to listen to audio podcasts of content that is otherwise only available as text. Until now, these texts have been recorded by human speakers. "Such a production is of course cost-intensive and also reaches its limits due to the limited availability of professional speakers. The collaboration with Hof University of Applied Sciences therefore opens up completely new possibilities for us," says Johannes Garbarek, founder and CEO of ahearo.

    High speed and low cost
    "For ahearo and other companies looking for a cost-effective and fast way to incorporate high-quality speech synthesis into their products, we are developing a solution for generating German speech from text," says Prof. Dr. Peinl. Freely available, self-generated audio data provided by ahearo is used to train the speech synthesis models in the best possible way. The evaluation is based on objectively measurable parameters as well as on subjective assessments by the test subjects.

    Encouraging interim results
    The results obtained so far are encouraging and give reason to hope that the software will soon be used in practice: "Short sentences are already read out very well by our model. The remaining challenges are pauses and stress in more complex sentences, as well as abbreviations, compound words and proper names," explains researcher Peinl. A small anecdote shows that the computer program sometimes has the same problems as humans: "For example, we have the word 'early summer meningoencephalitis (FSME)' in our test texts. It is no wonder that not only we, but also the computers, have difficulties with such word monstrosities," says Prof. Dr. Peinl.
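
    One common way to approach the abbreviation problem in text-to-speech systems in general is to expand abbreviations before the text reaches the synthesis model. The following minimal sketch illustrates the idea with Python's standard `re` module. The lexicon entries and the function name are hypothetical examples chosen for illustration; the press release does not say whether or how the Hof project handles this step.

```python
import re

# Hypothetical lexicon for illustration only; not taken from the Hof project.
ABBREVIATIONS = {
    "FSME": "Frühsommer-Meningoenzephalitis",
    "z. B.": "zum Beispiel",
    "Prof.": "Professor",
    "Dr.": "Doktor",
}

def expand_abbreviations(text, lexicon=ABBREVIATIONS):
    """Replace every known abbreviation in text with its spoken form."""
    # Try longer keys first so a long abbreviation wins over a shorter
    # one that happens to be contained in it.
    keys = sorted(lexicon, key=len, reverse=True)
    # The lookarounds stop matches from firing inside longer words.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda match: lexicon[match.group(0)], text)

print(expand_abbreviations("Prof. Dr. Peinl warnt vor FSME."))
# Professor Doktor Peinl warnt vor Frühsommer-Meningoenzephalitis.
```

    A lookup table like this only covers known cases. Unknown abbreviations, compound words and proper names remain open problems, as the quote above makes clear.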

    The results of the study, as well as the software developed in the course of the research, will be published and made freely accessible. The project is funded by the ERDF Bavaria 2014-2020 program, by the European Union through the Regional Development Fund, and by the Bavarian State Ministry of Science and the Arts. Another project partner is smartlytic GmbH, a software development and data analysis company based on the Hof University campus.

    Contact for scientific information:

    Prof. Dr. René Peinl

    Master Internet - Web Science

    Hochschule Hof
    Alfons-Goppel-Platz 1
    95028 Hof
    Phone: +49 (0) 9281 / 409 4820



    Criteria of this press release:
    Business and commerce, Journalists, Scientists and scholars, Students, Teachers and pupils, all interested persons
    Economics / business administration, Information technology, Traffic / transport
    transregional, national
    Research projects, Research results

