idw - Informationsdienst Wissenschaft
With recent advances in Artificial Intelligence (AI), synthetic voices have become increasingly prevalent in our everyday soundscape, raising the question: Can AI voices still be distinguished from human voices, and how attractive do they sound? Researchers from the Max Planck Institute for Empirical Aesthetics (MPIEA) in Frankfurt am Main, Germany, and the University of Applied Arts Vienna, Austria, found that, while synthetic voices are often mistaken for human voices, they are perceived as less attractive on average. The study’s results were recently published in the journal Computers in Human Behavior: Artificial Humans.
A total of 75 people took part in the online study. Participants listened to different versions of a sentence spoken by eight voices: four human voices and four artificially generated text-to-speech (TTS) voices. Each voice spoke the sentence with four emotional expressions: neutral, happy, sad, and angry. Participants rated each voice’s attractiveness and indicated how much they would like to interact with it. They also described the emotion they perceived in each case.
First author Camila Bruder of the MPIEA states: “Overall, happy-sounding voices were rated more positively than those perceived as sad or angry, regardless of whether they were human or artificially generated. This suggests that perceived emotion influences the evaluation of all voices similarly, or that AI voices are processed in a similar way to human ones.”
Participants were also asked to classify each voice as human or AI-generated. Human voices were correctly identified 86 percent of the time, whereas AI voices were correctly identified only 55 percent of the time. Misjudgments were most pronounced for AI voices perceived as angry, possibly because participants expected synthetic voices to sound “emotionless.”
Age also played a role in the assessment: older participants had greater difficulty distinguishing between human and AI-generated voices. However, the fact that most participants were “fooled” by the TTS voices indicates significant progress in the expressiveness and naturalness of these systems.
Senior author Pauline Larrouy-Maestri of the MPIEA concludes: “Overall, human voices were perceived as more attractive and socially appealing than synthetic ones. However, there were significant individual differences in the assessment. This result highlights the need for further studies with more nuanced evaluation methods and further exploration of listener diversity to reflect the complexity of human voice perception.”
Max Planck Institute for Empirical Aesthetics
Dr. Camila Bruder: camila.bruder@ae.mpg.de
Pauline Larrouy-Maestri, PhD: plm@ae.mpg.de
Bruder, C., Breda, P., & Larrouy-Maestri, P. (2025). Attractive Synthetic Voices. Computers in Human Behavior: Artificial Humans, 6, Article 100211. https://doi.org/10.1016/j.chbah.2025.100211
Although AI-generated voices are often mistaken for human ones, they are generally perceived as less ...
Copyright: (Illustration: MPIEA / L. Bittner)
Features of this press release:
Journalists, scientists, general public
Information technology, Media and communication sciences, Psychology, Language / Literature
Supra-regional
Research results, Scientific publications
English