Researchers from Saarland University and the Max Planck Institute for Software Systems have, for the first time, shown that the reactions of humans and large language models (LLMs) to complex or misleading program code significantly align, by comparing brain activity of study participants with model uncertainty. Building on this, the team developed a data-driven method to automatically detect such confusing areas in code — a promising step toward better AI assistants for software development.
The team led by Sven Apel, Professor of Software Engineering at Saarland University, and Mariya Toneva, researcher at the Max Planck Institute for Software Systems, investigated how humans and large language models respond to confusing program code. The characteristics of such code, known as atoms of confusion, are well studied: they are short, syntactically correct programming patterns that mislead human readers and can throw even experienced developers off track.
To find out whether LLMs and humans trip over the same stumbling blocks, the research team took an interdisciplinary approach. On the one hand, they used data from an earlier study by Apel and colleagues, in which participants read confusing and clean code variants while their brain activity and attention were measured using electroencephalography (EEG) and eye tracking. On the other hand, they analyzed the “confusion”, or model uncertainty, of LLMs using so-called perplexity values. Perplexity is an established metric for evaluating language models: it quantifies how uncertain a model is when predicting a sequence of text tokens, based on the probabilities the model assigns to those tokens.
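To make the metric concrete, the following minimal sketch computes perplexity in its standard textbook form, as the exponential of the average negative log-probability of the tokens. The function names and the per-token probabilities are hypothetical and chosen for illustration only; they are not taken from the study, and a real analysis would obtain the probabilities from a language model.

```python
import math


def surprisal(prob):
    """Surprisal (in nats) of one token the model predicted with probability `prob`."""
    return -math.log(prob)


def perplexity(token_probs):
    """Perplexity of a token sequence: exp of the mean per-token surprisal."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    mean_surprisal = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_surprisal)


# Hypothetical probabilities a model might assign to each token of a snippet.
clean_probs = [0.5, 0.5, 0.5, 0.5]       # every token is moderately expected
confusing_probs = [0.5, 0.5, 0.05, 0.5]  # one token is highly unexpected

print(perplexity(clean_probs))
print(perplexity(confusing_probs))
```

In this sketch, a single unexpected token is enough to raise the perplexity of the whole sequence, which is why per-token values are useful for locating where in a snippet the uncertainty arises.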
The result: Wherever humans got stuck on code, the LLM also showed increased perplexity. EEG signals from participants—especially the so-called late frontal positivity, which in language research is associated with unexpected sentence endings—rose precisely where the language model’s uncertainty spiked. “We were astounded that the peaks in brain activity and model uncertainty showed significant correlations,” says Youssef Abdelsalam, who was advised by Toneva and Apel and was instrumental in conducting the study as part of his doctoral studies.
Based on this similarity, the researchers developed a data-driven method that automatically detects and highlights unclear parts of code. In more than 60 percent of cases, the algorithm successfully identified known, manually annotated confusing patterns in the test code. It also discovered more than 150 new, previously unrecognized patterns that likewise coincided with increased brain activity.
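The press release does not describe how the detection method works internally. Purely as an illustration of the general idea, the sketch below flags tokens whose surprisal lies well above the average for a snippet. The threshold rule (mean plus `k` standard deviations), the function name `flag_spikes`, the tokenization and the probabilities are all assumptions made for this example and should not be read as the authors' algorithm.

```python
import math
from statistics import mean, pstdev


def flag_spikes(tokens, probs, k=1.5):
    """Return (index, token) pairs whose surprisal exceeds mean + k * std.

    `probs` holds the (hypothetical) probability a language model assigned
    to each token; the mean + k * std rule is an illustrative assumption.
    """
    surprisals = [-math.log(p) for p in probs]
    mu = mean(surprisals)
    sigma = pstdev(surprisals)
    if sigma == 0:
        return []  # all tokens equally expected, nothing stands out
    threshold = mu + k * sigma
    return [
        (i, tok)
        for i, (tok, s) in enumerate(zip(tokens, surprisals))
        if s > threshold
    ]


# Hypothetical tokenization of `x = a++ + b;` with made-up probabilities.
tokens = ["x", "=", "a", "++", "+", "b", ";"]
probs = [0.6, 0.9, 0.5, 0.02, 0.4, 0.6, 0.9]

print(flag_spikes(tokens, probs))
```

With these made-up numbers, only the `++` token is flagged, since it is the one the hypothetical model found far less likely than its neighbors.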
“With this work, we are taking a step toward a better understanding of the alignment between humans and machines,” says Max Planck researcher Mariya Toneva. “If we know when and why LLMs and humans stumble in the same places, we can develop tools that make code more understandable and significantly improve human–AI collaboration,” adds Professor Sven Apel.
Through their project, the researchers are building a bridge between neuroscience, software engineering, and artificial intelligence. The study, currently published as a preprint, was accepted for publication at the International Conference on Software Engineering (ICSE), one of the world’s leading conferences in the field of software development. The conference will take place in Rio de Janeiro in April 2026. The authors of the study are: Youssef Abdelsalam, Norman Peitek, Anna-Maria Maurer, Mariya Toneva, and Sven Apel.
Editorial contact:
Philipp Zapf-Schramm
Saarland Informatics Campus
Tel: +49 681 9325 4509
E-Mail: pzs@mpi-klsb.mpg.de
Prof. Dr. Sven Apel
Chair of Software Engineering
Saarland University
Tel.: +49 681 302 57211
E-mail: apel@cs.uni-saarland.de
Dr. Mariya Toneva
Head of the Research Group “Bridging AI and Neuroscience”
Max Planck Institute for Software Systems
Tel.: +49 681 9303 9801
E-mail: mtoneva@mpi-sws.org
Preprint: Y. Abdelsalam, N. Peitek, A.-M. Maurer, M. Toneva, S. Apel (2025): “How do Humans and LLMs Process Confusing Code?” arXiv:2508.18547v1 [cs.SE], August 25, 2025.
https://arxiv.org/abs/2508.18547
https://www.se.cs.uni-saarland.de/ - Chair of Software Engineering
https://mtoneva.com/index.html - Max Planck research group “Bridging AI and Neuroscience”
Sven Apel, professor of computer science at Saarland University
Source: Oliver Dietze
Copyright: Universität des Saarlandes
Mariya Toneva, researcher at Max Planck Institute for Software Systems
Source: MPI-SWS
Copyright: MPI-SWS
