idw – Informationsdienst Wissenschaft

12/12/2024 10:30

Graz Language Database Improves Automatic Speech Recognition of Austrian German

Falko Schoklitsch, Communications and Marketing
Technische Universität Graz

    With the “Graz corpus of read and spontaneous speech”, researchers at TU Graz have developed new methods for speech recognition of Austrian German using speech data from 38 people.

    Second-language speakers who come to Austria with a good knowledge of German usually find it difficult to understand the local dialects. Speech recognition systems, too, often fail to decode regional word choices and accented pronunciation. Barbara Schuppler from the Signal Processing and Speech Communication Laboratory at Graz University of Technology (TU Graz), together with researchers from the Know Center and the University of Graz, has investigated the complexity of conversational speech, built a database of conversations in Austrian German and gained new insights into how speech recognition can be improved. The results were recently published in the paper “What’s so complex about conversational speech?” in the journal Computer Speech & Language. The project was funded by the Austrian Science Fund FWF.

    Free-flowing conversations in the recording studio

    One of the main aims of the project was to improve the accuracy of automatic speech recognition (ASR) systems in spontaneous conversations with speakers from Austria. The team focused on the challenges posed by spontaneity, short sentences, overlapping speakers and dialectal accents in everyday conversations. To obtain suitable data, the researchers built the GRASS database (Graz corpus of read and spontaneous speech). It contains recordings of 38 speakers, comprising both read texts and spontaneous conversations in which pairs of people who knew each other well talked freely for an hour in the recording studio without being given a topic. Because the same speakers were recorded in both speaking styles, the research team could rule out speaker identity and recording quality as explanations for differences in ASR performance between read and spontaneous speech.
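    The release does not name the accuracy metric. ASR accuracy is commonly reported as word error rate (WER), so the following Python sketch assumes WER and illustrates why recording the same speakers in both styles helps: each speaker's spontaneous-speech WER is compared with their own read-speech WER, so speaker-specific effects largely cancel. The function names and toy transcripts are hypothetical and are not taken from GRASS or the paper.

```python
from statistics import mean


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: the minimum number of substitutions,
    deletions and insertions that turn the reference into the hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra word inserted
                prev[j - 1] + (r != h),  # substitution (cost 0 if words match)
            )
        prev = curr
    return prev[-1]


def wer(pairs: list[tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) pairs: total word errors divided by
    total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    ref_words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / ref_words


def style_gap(results: dict[str, dict[str, list[tuple[str, str]]]]):
    """Per-speaker WER difference (spontaneous minus read) and its mean.
    Each difference compares a speaker with themselves (paired design)."""
    gaps = {
        speaker: wer(styles["spontaneous"]) - wer(styles["read"])
        for speaker, styles in results.items()
    }
    return gaps, mean(gaps.values())


# Made-up toy data for illustration only, not GRASS recordings or results.
results = {
    "spk01": {
        "read": [("das ist ein test", "das ist ein test")],
        "spontaneous": [("na ja das passt schon", "ja das passt")],
    },
    "spk02": {
        "read": [("heute ist gutes wetter", "heute ist gutes wetter")],
        "spontaneous": [("geh bitte", "geh bitte")],
    },
}
gaps, mean_gap = style_gap(results)
print(gaps)      # {'spk01': 0.4, 'spk02': 0.0}
print(mean_gap)  # 0.2
```

    A paired difference per speaker is used rather than comparing pooled read and spontaneous WERs, because pooling would mix style effects with differences between speakers.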

    Based on the database, the team compared various ASR architectures, including long-established systems based on hidden Markov models (HMMs) and the relatively new transformer-based models. The comparison showed that transformer-based models, such as the Whisper speech recognition system, work very well for longer sentences with a lot of context but struggle with the short, fragmentary sentences that occur frequently in conversations. Traditional HMM-based systems that were explicitly trained on pronunciation variants proved more robust for short sentences and dialectal speech. The researchers therefore want to pursue a hybrid approach that combines the strengths of both architectures. They have already combined a transformer model with a knowledge-based lexicon and a statistical language model and achieved significant improvements.
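    The release does not describe how the transformer, lexicon and language model were combined. One common way to combine such components is to rescore a recogniser's n-best hypotheses; the Python sketch below assumes that approach, with a hypothetical add-k smoothed bigram model standing in for the statistical language model and a plain word list standing in for the knowledge-based lexicon. All weights, penalties and scores are made-up illustration values, not the authors' method.

```python
import math
from collections import Counter


def train_bigram_lm(sentences: list[str], k: float = 1.0):
    """Add-k smoothed bigram language model; returns a log-probability scorer."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        unigram_counts.update(tokens[:-1])  # counts of each history word
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def logprob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(a, b)] + k) / (unigram_counts[a] + k * vocab_size))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return logprob


def rescore(nbest, lm_logprob, lexicon, lm_weight=0.5, oov_penalty=5.0):
    """Re-rank (hypothesis, asr_logprob) pairs, best first, by
    asr_logprob + lm_weight * LM log-prob - oov_penalty * (#words not in lexicon)."""

    def score(item):
        hypothesis, asr_logprob = item
        oov = sum(word not in lexicon for word in hypothesis.split())
        return asr_logprob + lm_weight * lm_logprob(hypothesis) - oov_penalty * oov

    return sorted(nbest, key=score, reverse=True)


# Hypothetical toy data: a tiny text corpus and two recogniser hypotheses
# with made-up log-probabilities.
corpus = ["na ja das passt schon", "geh bitte", "das passt schon"]
lm = train_bigram_lm(corpus)
lexicon = {word for sentence in corpus for word in sentence.split()}
nbest = [("na ja das past schon", -4.0), ("na ja das passt schon", -4.2)]
print(rescore(nbest, lm, lexicon)[0][0])  # na ja das passt schon
```

    Here the recogniser alone slightly prefers the misspelled hypothesis, but the lexicon penalty and the language model reverse the ranking. In practice, weights such as `lm_weight` would be tuned on held-out data.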

    Possible use in medical diagnostics

    The team also analysed how characteristics such as speech rate, intonation and word choice influence the accuracy of speech recognition. These findings can contribute to the development of ASR systems that better understand human speech in all its nuances. The team plans to continue its research in these areas and to incorporate the findings into new, more robust speech recognition systems. Beyond this, the results have promising applications in other fields, particularly medical diagnostics and human-computer interaction. In the future, ASR systems could help detect dementia or epilepsy from speech patterns in spontaneous conversations, or make interaction with social robots more natural.
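    To illustrate how one such characteristic could be related to recognition accuracy, the Python sketch below computes speech rate in words per second and its Pearson correlation with per-utterance WER. The choice of measure and statistic is an assumption, not the project's documented method, and the utterances and error values are hypothetical.

```python
from math import sqrt


def speech_rate(transcript: str, duration_s: float) -> float:
    """Words per second; syllables per second would be a finer-grained measure."""
    return len(transcript.split()) / duration_s


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical utterances: (transcript, duration in seconds, per-utterance WER).
utterances = [
    ("geh bitte", 1.0, 0.5),
    ("na ja das passt schon", 1.0, 0.4),
    ("das ist ein test", 2.0, 0.0),
]
rates = [speech_rate(text, duration) for text, duration, _ in utterances]
errors = [err for _, _, err in utterances]
print(rates)  # [2.0, 5.0, 2.0]
print(pearson(rates, errors))
```

    A correlation on its own does not separate speech rate from other factors such as utterance length, so a real analysis would likely model several characteristics jointly.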

    “Spontaneous speech, especially in dialogue, has completely different characteristics compared to recited or read speech,” says Barbara Schuppler. “By analysing human-human communication in particular, we have gained important insights in our project that also help us technically and open up new areas of application. Together with partners from the PMU Salzburg, Med Uni Graz and Med Uni Vienna, we are already working on follow-up projects to create socially relevant applications based on the foundations we have created in the Austrian Science Fund project.”


    Contact for scientific information:

    Barbara SCHUPPLER
    Ass.Prof. Mag.rer.nat. Dr.
    TU Graz | Signal Processing and Speech Communication Laboratory
    Phone: +43 316 873 4366
    b.schuppler@tugraz.at


    Original publication:

    What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures https://doi.org/10.1016/j.csl.2024.101738


    Images

    Spontaneity, short sentences, overlapping speakers and dialectal colouring cause problems for speech recognition systems.
    andreusK/Adobe Stock


    Criteria of this press release:
    Journalists, all interested persons
    Information technology, Language / literature
    transregional, national
    Research results
    English


     
