How do languages balance the richness of their structures with the need for efficient communication? To investigate, researchers at the Leibniz Institute for the German Language (IDS) in Mannheim, Germany, trained computational language models on more than 6,500 documents in over 2,000 languages. They found that languages that are computationally harder to process compensate for this increased complexity with greater efficiency: more complex languages need fewer symbols to encode the same message. The analyses also reveal that larger language communities tend to use more complex but more efficient languages.
Language models are computer algorithms that learn to process and generate language by analysing large amounts of text. They excel at identifying patterns without relying on predefined rules, making them valuable tools for linguistic research. Importantly, not all models are the same: their internal architectures vary, shaping how they learn and process language. These differences allow researchers to compare languages in new ways and uncover insights into linguistic diversity.
In a new study, researchers at the IDS trained language models on a vast dataset of over 6,500 documents in more than 2,000 languages, covering almost 3 billion words. The texts included religious writings, legal documents, movie subtitles, newspaper articles, and more. The researchers estimated how difficult it is for the computational models to process or produce text, using this as a measure of language complexity. “We trained very different language models on this textual material,” says co-author Sascha Wolfer. “Some simple models only consider the last two words, which limits their ability to capture grammatical patterns over long distances. Others, such as transformers (similar to ChatGPT), use advanced mechanisms to analyse complex dependencies and uncover richer linguistic structures.”
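To make the idea of "processing difficulty" concrete, here is a minimal sketch of a model that, like the simple models Wolfer describes, only looks at the last two words. It is not the study's implementation: the function name `trigram_bits_per_word`, the add-alpha smoothing, and the toy texts are assumptions chosen for illustration. The score is the average number of bits the model needs per word of unseen text, so a higher value means the text is harder for the model to predict.

```python
import math
from collections import Counter


def trigram_bits_per_word(train_tokens, test_tokens, alpha=1.0):
    """Average bits per word of test_tokens under a smoothed trigram model.

    Illustrative assumption: add-alpha smoothing over the joint vocabulary
    of both texts. The study's actual models and estimators may differ.
    """
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    vocab_size = len(set(train_tokens) | set(test_tokens))
    pad = ["<s>", "<s>"]  # start symbols so the first words have a context

    # Count how often each word follows each two-word context.
    trigram_counts = Counter()
    context_counts = Counter()
    padded_train = pad + list(train_tokens)
    for a, b, c in zip(padded_train, padded_train[1:], padded_train[2:]):
        trigram_counts[(a, b, c)] += 1
        context_counts[(a, b)] += 1

    # Sum the surprisal (-log2 probability) of every word in the test text.
    total_bits = 0.0
    padded_test = pad + list(test_tokens)
    for a, b, c in zip(padded_test, padded_test[1:], padded_test[2:]):
        p = (trigram_counts[(a, b, c)] + alpha) / (
            context_counts[(a, b)] + alpha * vocab_size
        )
        total_bits -= math.log2(p)
    return total_bits / len(test_tokens)


# Toy data, invented purely for illustration.
train = "a b a b a b a b".split()
predictable = trigram_bits_per_word(train, "a b a b".split())
surprising = trigram_bits_per_word(train, "b b a a".split())
print(predictable < surprising)  # True: the familiar pattern costs fewer bits
```

Because such a model only sees two words of context, it cannot pick up dependencies that span longer distances, which is the limitation mentioned in the quote above.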
Surprisingly, the results were consistent: despite significant architectural differences, the models produced remarkably similar rankings of language complexity. “If one language is harder to process than another for one model in one corpus, this relationship holds across other models, text types, and even if the model operates on a different symbolic level, e.g. characters instead of words,” explains co-author Peter Meyer. “These findings suggest that the results may not only reflect computational effort but could also offer insights into the intrinsic complexity of human languages.”
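One common way to check whether two models agree on a ranking is a rank correlation. The sketch below computes Spearman's rho for two sets of per-language scores. Whether the study used this particular statistic is an assumption, and the language labels and numbers are hypothetical placeholders, not results from the paper.

```python
def ranks(scores):
    """Map each key to its rank (1 = lowest score). Assumes no tied scores."""
    ordered = sorted(scores, key=scores.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


def spearman_rho(scores_x, scores_y):
    """Spearman rank correlation between two score dictionaries (no ties)."""
    if scores_x.keys() != scores_y.keys():
        raise ValueError("both models must score the same languages")
    n = len(scores_x)
    if n < 2:
        raise ValueError("need at least two languages to compare rankings")
    rank_x, rank_y = ranks(scores_x), ranks(scores_y)
    squared_diffs = sum((rank_x[k] - rank_y[k]) ** 2 for k in scores_x)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical complexity scores from two different models (not real data).
word_model = {"lang_a": 7.1, "lang_b": 6.4, "lang_c": 8.0, "lang_d": 5.9}
char_model = {"lang_a": 2.3, "lang_b": 2.0, "lang_c": 2.9, "lang_d": 1.8}
rho = spearman_rho(word_model, char_model)
print(rho)  # 1.0: both models order the four languages identically
```

A value near 1 means the two models rank the languages in nearly the same order even when their raw scores sit on different scales, as with word-level and character-level models.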
Why, then, would some languages evolve to be more complex, given the increased effort required for processing? A key finding of the study may provide an answer: there is a trade-off between complexity and efficiency. Languages with higher complexity tend to produce shorter texts to convey the same content, reflecting a compensatory mechanism where increased structural intricacy is offset by greater efficiency in communication.
“So maybe the extra effort required to learn a complex language has its benefits,” suggests Alexander Koplenig, lead author of the study. “Once you’ve mastered it, a complex language might offer more options to express yourself, which can make it easier to convey the same idea using fewer symbols. This is relevant because we also show that this trade-off is shaped by the social environments in which languages are used, with larger communities tending to use more complex but more efficient languages.”
One could speculate that in large societies, institutionalised education might enable greater linguistic complexity by providing systematic and formalised language learning, which supports the acquisition and use of intricate linguistic structures. At the same time, the importance of written communication in larger societies may create pressure for shorter messages to reduce the costs of production, storage, and transmission, such as book paper, storage space, or bandwidth. “This combination, education enabling complexity and practical needs driving efficiency, could explain why languages in larger communities evolve the way they do,” Koplenig continues. “Testing this speculative hypothesis is a fascinating direction for future research.”
The Leibniz Institute for the German Language (IDS) is the central extramural institute for research and documentation of the German language in its contemporary usage and in its recent history. It is one of over 90 research and service institutions of the Leibniz Association. For more details see: http://www.ids-mannheim.de, https://bsky.app/profile/idsmannheim.bsky.social, http://www.facebook.com/ids.mannheim, http://www.instagram.com/ids_mannheim/ and http://www.leibniz-gemeinschaft.de.
Contact for scientific information:
Dr. Sascha Wolfer
Leibniz Institute for the German Language
R 5, 6-13
D-68161 Mannheim
Tel.: +49 621 1581-439
Email: wolfer@ids-mannheim.de
Original publication:
Koplenig A., Wolfer S., Rüdiger J.-O., Meyer P. (2025): Human languages trade off complexity against efficiency. PLOS Complex Systems 2(1): e0000032.
https://journals.plos.org/complexsystems/article?id=10.1371/journal.pcsy.0000032