idw - Informationsdienst Wissenschaft
Learning scientific writing requires precise feedback, which is a particular challenge in very large courses with many participants. In the Department of Computer Science at TU Darmstadt, the AI assistance system ‘LLMentor’ is therefore being used for the first time to support the assessment of students’ exposés and peer reviews for final theses. The tool reduces the workload for lecturers and promotes consistent feedback – without taking decisions out of the lecturer’s hands.
Academic writing is one of the skills that students find most difficult to learn ‘on the side’. Especially in bachelor's programmes, precise and specific feedback is needed to turn initial drafts into robust exposés for final theses, and to turn peer reviews – feedback from fellow students – into genuinely helpful guidance. In practice, this is extremely challenging: in large courses, a vast number of texts have to be carefully read and evaluated according to uniform criteria every semester, and on top of that, the peer reviews themselves must be assessed.
This is also the case for the instructors of the course ‘Introduction to Scientific Work’ led by Professor Iryna Gurevych and Dr Thomas Arnold at the Department of Computer Science at TU Darmstadt: to ensure quality, the instructors regularly hire and train numerous tutors, but even then it remains difficult to keep the assessment consistent across a large team of teaching assistants. Since the 2025/26 winter semester, the instructors have deployed their new AI-based assistance ‘LLMentor’ to help them assess and formulate feedback on student exposés, which serve as preparation for the students’ final theses.
‘LLMentor’ is not an automatic grading machine but an AI-based decision-support tool: based on transparent assessment rubrics, the system makes suggestions such as preliminary scores for each criterion, brief explanations and feedback. The course instructors can accept, adapt or reject these suggestions; responsibility for assessment and feedback remains entirely with the humans. Before being used in the course, the tool was tested and scientifically evaluated.
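Purely as an illustration of the workflow described here – the actual LLMentor implementation is not public, and all names in this sketch are hypothetical – the suggest-then-review loop could look like this:

```python
from dataclasses import dataclass

# Hypothetical sketch: one rubric-based AI suggestion per assessment criterion.
@dataclass
class Suggestion:
    criterion: str       # e.g. "research question clearly stated"
    score: int           # preliminary score proposed by the AI
    explanation: str     # brief rationale shown to the teaching team
    status: str = "pending"

def review(suggestion, action, adapted_score=None):
    """The teaching team keeps final authority: accept, adapt or reject."""
    assert action in ("accepted", "adapted", "rejected")
    if action == "adapted":
        suggestion.score = adapted_score
    suggestion.status = action
    return suggestion

# Example: the instructor adapts the AI's preliminary score upwards.
s = Suggestion("formal structure", 3, "Outline follows the required template")
review(s, "adapted", adapted_score=4)
```

The point of the design sketched above is that nothing becomes a grade until a human has explicitly acted on it – the AI output is only ever a `pending` proposal.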
LLMentor is integrated into the established CARE framework, which was used in the course even before the introduction of AI. CARE is the central course platform through which the entire process is organised: students submit their exposés, provide peer reviews and receive feedback, while the course instructors provide grading and free-text feedback via the same environment. The novelty is that AI suggestions from LLMentor are now displayed at appropriate points within the existing workflow. The structure of the process remains the same, but the course instructors and teaching assistants receive additional support precisely where the effort and error rate are particularly high in everyday work.
Gurevych, Arnold, and Dennis Zyska from the Ubiquitous Knowledge Processing (UKP) Lab and Professor Florian Müller from the Mobile Human-Computer Interaction group in the Department of Computer Science are responsible for development and scientific support. In the 2025/26 winter semester, the implementation of the project was supported by hessian.AI – the Hessian Centre for Artificial Intelligence.
The Exposía dataset, which Zyska from the UKP Lab has made publicly available, is the scientific foundation of the project. Exposía documents the entire course process, from the draft exposé to comments and reviews of the revised final exposé version, thus enabling systematic evaluations. Additional research was conducted to determine where pre-trained AI models can reliably support grading and assessment and where they cannot.
Put simply, the comparison with human evaluations shows that the AI performs particularly well on rather clear, formal criteria. Agreement decreases for criteria that require deeper content expertise, which is also the area where human graders themselves tend to be inconsistent.
“In teaching, we are dealing with very high student numbers, and at the same time, good feedback and fair, consistent assessment of academic texts are extremely time-consuming,” explains Gurevych. “This applies to the exposés themselves, but also to the peer reviews, i.e. the feedback that students give each other. Every semester, we invest a lot of time in finding, training and coordinating numerous tutors, and yet it is difficult to ensure consistent quality across many teaching assistants. LLMentor aims to support this by providing suggestions that remain transparent and always have to be checked by the teaching team. Our goal is not automation, but rather to reduce the workload and increase consistency so that we can focus more on what teaching is all about: learning through good feedback.”
Prof. Dr. Iryna Gurevych
UKP Lab
iryna.gurevych@tu-darmstadt.de
+49 6151 16-25290
Dennis Zyska, Alla Rozovskaya, Ilia Kuznetsov, Iryna Gurevych: Exposía: Academic Writing Assessment of Exposés and Peer Feedback;
arXiv:2601.06536
https://doi.org/10.48550/arXiv.2601.06536
Criteria of this press release:
Journalists
Information technology, Teaching / education
transregional, national
Scientific Publications, Studies and teaching
English
