idw – Informationsdienst Wissenschaft

11/27/2025 10:18

Humans and large language models respond surprisingly similarly to confusing program code

Friederike Meyer zu Tittingdorf, Press Office
Saarland University

    Researchers from Saarland University and the Max Planck Institute for Software Systems have shown for the first time that the reactions of humans and large language models (LLMs) to complex or misleading program code align significantly. To do so, they compared the brain activity of study participants with model uncertainty. Building on this, the team developed a data-driven method to automatically detect such confusing areas in code, a promising step toward better AI assistants for software development.

    The team led by Sven Apel, Professor of Software Engineering at Saarland University, and Mariya Toneva, researcher at the Max Planck Institute for Software Systems, investigated how humans and large language models respond to confusing program code. The characteristics of such code, known as atoms of confusion, are well studied: They are short, syntactically correct programming patterns that are misleading for humans and can throw even experienced developers off track.

    To find out whether LLMs and humans stumble over the same places in code, the research team used an interdisciplinary approach: On the one hand, they used data from an earlier study by Apel and colleagues, in which participants read confusing and clean code variants while their brain activity and attention were measured using electroencephalography (EEG) and eye tracking. On the other hand, they analyzed the “confusion” or model uncertainty of LLMs using so-called perplexity values. Perplexity is an established metric for evaluating language models: It quantifies how uncertain a model is when predicting a sequence of text tokens, based on the probabilities the model assigns to those tokens.
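    To make the metric concrete, the following minimal sketch computes perplexity from per-token probabilities using the standard definition (the exponential of the mean negative log-probability). The function names and the probability values are illustrative assumptions, not taken from the study; in practice the probabilities would come from a language model.

```python
import math


def surprisals(token_probs):
    """Per-token surprisal in nats: -ln p(token | preceding tokens)."""
    return [-math.log(p) for p in token_probs]


def perplexity(token_probs):
    """Perplexity of a sequence: exp of the mean per-token surprisal."""
    values = surprisals(token_probs)
    return math.exp(sum(values) / len(values))


# Hypothetical probabilities a model might assign to each token of a snippet.
# These numbers are made up for illustration only.
clean_variant = [0.5, 0.5, 0.5, 0.5]
confusing_variant = [0.5, 0.5, 0.05, 0.5]  # one token is highly unexpected

print(perplexity(clean_variant))
print(perplexity(confusing_variant))
```

    A single unexpected token raises the perplexity of the whole sequence, while the per-token surprisal values show where in the sequence the uncertainty is concentrated.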

    The result: Wherever humans got stuck on code, the LLM also showed increased perplexity. EEG signals from participants—especially the so-called late frontal positivity, which in language research is associated with unexpected sentence endings—rose precisely where the language model’s uncertainty spiked. “We were astounded that the peaks in brain activity and model uncertainty showed significant correlations,” says Youssef Abdelsalam, who was advised by Toneva and Apel and was instrumental in conducting the study as part of his doctoral studies.

    Based on this similarity, the researchers developed a data-driven method that automatically detects and highlights unclear parts of code. In more than 60 percent of cases, the algorithm successfully identified known, manually annotated confusing patterns in the test code and even discovered more than 150 new, previously unrecognized patterns that also coincided with increased brain activity.
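    The press release does not describe how the detection method works in detail. Purely as an illustration of the general idea of flagging spikes in model uncertainty, the sketch below marks tokens whose surprisal lies well above the average of their sequence. The function `flag_spikes`, the threshold rule (mean plus `k` standard deviations), the tokenization, and the surprisal values are all hypothetical and should not be read as the authors' algorithm.

```python
import statistics


def flag_spikes(tokens, surprisal, k=1.5):
    """Return (index, token) pairs whose surprisal exceeds mean + k * stdev.

    This is a simple thresholding heuristic chosen for illustration only.
    """
    if len(tokens) != len(surprisal):
        raise ValueError("tokens and surprisal must have the same length")
    mean = statistics.fmean(surprisal)
    spread = statistics.pstdev(surprisal)  # population standard deviation
    threshold = mean + k * spread
    return [
        (i, tok)
        for i, (tok, s) in enumerate(zip(tokens, surprisal))
        if s > threshold
    ]


# Hypothetical tokens and per-token surprisal values (made up for illustration).
tokens = ["x", "=", "a", "+++", "b", ";"]
surprisal = [0.7, 0.4, 0.9, 5.2, 1.1, 0.3]

print(flag_spikes(tokens, surprisal))  # [(3, '+++')]
```

    A threshold relative to the sequence's own statistics is only one possible design choice; a fixed threshold or a comparison against a clean variant of the same code would be alternatives.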

    “With this work, we are taking a step toward a better understanding of the alignment between humans and machines,” says Max Planck researcher Mariya Toneva. “If we know when and why LLMs and humans stumble in the same places, we can develop tools that make code more understandable and significantly improve human–AI collaboration,” adds Professor Sven Apel.

    Through their project, the researchers are building a bridge between neuroscience, software engineering, and artificial intelligence. The study, currently available as a preprint, has been accepted for publication at the International Conference on Software Engineering (ICSE), one of the world’s leading conferences in the field of software development. The conference will take place in Rio de Janeiro in April 2026. The authors of the study are Youssef Abdelsalam, Norman Peitek, Anna-Maria Maurer, Mariya Toneva, and Sven Apel.

    Editorial contact:
    Philipp Zapf-Schramm
    Saarland Informatics Campus
    Tel: +49 681 9325 4509
    E-Mail: pzs@mpi-klsb.mpg.de


    Contact for scientific information:

    Prof. Dr. Sven Apel
    Chair of Software Engineering
    Saarland University
    Tel.: +49 681 302 57211
    E-mail: apel@cs.uni-saarland.de

    Dr. Mariya Toneva
    Head of the Research Group “Bridging AI and Neuroscience”
    Max Planck Institute for Software Systems
    Tel.: +49 681 9303 9801
    E-mail: mtoneva@mpi-sws.org


    Original publication:

    Preprint: Y. Abdelsalam, N. Peitek, A.-M. Maurer, M. Toneva, S. Apel (2025): “How do Humans and LLMs Process Confusing Code?” arXiv:2508.18547v1 [cs.SE], August 25, 2025.
    https://arxiv.org/abs/2508.18547


    More information:

    https://www.se.cs.uni-saarland.de/ - Chair of Software Engineering
    https://mtoneva.com/index.html - Max Planck research group “Bridging AI and Neuroscience”


    Images

    Sven Apel, professor of computer science at Saarland University
    Source: Oliver Dietze
    Copyright: Universität des Saarlandes

    Mariya Toneva, researcher at Max Planck Institute for Software Systems
    Source: MPI-SWS
    Copyright: MPI-SWS


    Criteria of this press release:
    Business and commerce, Journalists, Scientists and scholars
    Economics / business administration, Information technology, Psychology, Social studies
    transregional, national
    Research results, Scientific conferences
    English


     
