idw - Informationsdienst Wissenschaft
Hof – Today, artificial intelligence can describe images, recognize objects, and explain complex relationships. The pace of development is remarkable: so-called vision-language models (VLMs) combine text and image understanding in impressive ways. Yet they struggle with a seemingly simple task: counting. Researchers at the Institute for Information Systems (iisys) at Hof University of Applied Sciences are now working to address this issue.
“Many common models are very good at recognizing what can be seen in an image, but not at reliably determining how many objects there are,” explains Prof. Dr. René Peinl of iisys. Errors become more frequent when an image contains more than four or five objects of the same type.
Why Counting Is So Difficult for AI
The problem runs deeper than it may appear at first glance. Humans grasp small quantities intuitively but have to count larger ones actively. Many AI models lack this explicit counting step. In addition, existing training data is often unsuitable. “Some datasets are too simple and only encourage pattern recognition; others are too complex or flawed, for example due to occluded objects or ambiguous questions,” says institute director Prof. Peinl. As a result, models tend to “guess” or rely on learned expectations, sometimes producing surprisingly wrong results.
The Solution from Hof: An Artificial Dataset
To tackle this problem in a targeted way, iisys has developed the SITUATE dataset. Instead of using real photographs, the researchers generate artificial 3D scenes with clearly defined properties. “We wanted to create an environment in which we can precisely control what happens in the image—and what does not,” says Prof. Dr. René Peinl. These scenes contain geometric objects such as cubes, spheres, or cylinders, with clearly defined positions (e.g., “to the left of the table”), allowing for targeted questions—for instance about color, quantity, or location. This creates a training environment that is not based on chance, but specifically designed to develop certain capabilities.
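The idea of a fully controlled scene can be illustrated with a small sketch. It is not the SITUATE generator: it only builds a symbolic scene description and one counting question from it, and leaves out the 3D rendering entirely. The shape, color, and position vocabularies and the names `SceneObject`, `generate_scene`, and `counting_question` are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical vocabularies; the real dataset may use different values.
SHAPES = ["cube", "sphere", "cylinder"]
COLORS = ["red", "green", "blue", "yellow"]
POSITIONS = ["on the table", "to the left of the table", "to the right of the table"]


@dataclass(frozen=True)
class SceneObject:
    shape: str
    color: str
    position: str


def generate_scene(num_objects: int, rng: random.Random) -> list[SceneObject]:
    """Sample a scene in which every object's properties are known exactly."""
    return [
        SceneObject(rng.choice(SHAPES), rng.choice(COLORS), rng.choice(POSITIONS))
        for _ in range(num_objects)
    ]


def counting_question(scene: list[SceneObject], shape: str, position: str) -> tuple[str, int]:
    """Build a counting question and derive its answer from the scene itself."""
    question = f"How many {shape}s are {position}?"
    answer = sum(1 for obj in scene if obj.shape == shape and obj.position == position)
    return question, answer


# A fixed seed makes the scene reproducible.
rng = random.Random(0)
scene = generate_scene(8, rng)
question, answer = counting_question(scene, "cube", "on the table")
print(question, answer)
```

Because the answer is computed from the scene description rather than annotated by hand, the ground truth cannot be ambiguous or wrong, which is the kind of control the researchers describe.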
Learning Through Structure Rather Than Chance
A key aspect of the project is how the AI learns to count. In addition to simple answers, the researchers use detailed explanations in which the AI describes step by step what it sees and how it counts. For example: “There are two objects on the table and three next to it—so five in total.” This so-called “chain-of-thought” approach proves effective—at least for larger numbers. “We see that models improve significantly on more complex counting tasks through this structured approach,” says Peinl. However, this method also has limitations: for small numbers, the AI tends to “hallucinate” additional objects in order to stay consistent with its own reasoning.
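A step-by-step answer of this kind can be derived mechanically from known per-region counts. The following is a minimal sketch under that assumption; the function name `chain_of_thought_answer` and the exact wording of the output are hypothetical and not taken from the SITUATE dataset.

```python
def chain_of_thought_answer(counts_by_position: dict[str, int]) -> str:
    """Turn per-region object counts into a stepwise counting explanation."""
    # Skip empty regions so the explanation never mentions objects that are not there.
    nonzero = {pos: n for pos, n in counts_by_position.items() if n > 0}
    if not nonzero:
        return "There are no objects, so 0 in total."

    steps = [f"{n} {'object' if n == 1 else 'objects'} {pos}" for pos, n in nonzero.items()]
    if len(steps) <= 2:
        listing = " and ".join(steps)
    else:
        listing = ", ".join(steps[:-1]) + ", and " + steps[-1]

    # Agree the verb with the first count mentioned.
    verb = "is" if next(iter(nonzero.values())) == 1 else "are"
    total = sum(nonzero.values())
    return f"There {verb} {listing}, so {total} in total."


example = chain_of_thought_answer({"on the table": 2, "next to the table": 3})
print(example)
# There are 2 objects on the table and 3 objects next to the table, so 5 in total.
```

Leaving out empty regions is a deliberate choice in this sketch: the explanation only mentions what is actually in the scene, so the text and the final count cannot disagree.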
Better Results—and New Insights
The experiments clearly show that AI models trained with SITUATE generalize better. “A combination of different datasets yields the best results in our tests. But we also see that the type of training strongly influences how the AI ‘thinks.’ What’s particularly interesting is that the models display behavioral patterns reminiscent of humans. Small quantities are quickly recognized, while larger ones require structured strategies,” says Prof. Peinl. At the same time, it becomes clear that AI often does not develop a true concept of numbers, but instead learns visual patterns.
Implications for the Future of AI
The research from Hof also demonstrates that progress in artificial intelligence depends not only on ever-larger models but above all on better data and well-designed training methods. “Our dataset shows that it is possible to work specifically on the weaknesses of models, and that synthetic (i.e., computer-generated) data is not inherently inferior,” emphasizes Peinl.
A Building Block for More Reliable AI Systems
In industry, medicine, and logistics, many applications depend on AI that not only recognizes objects but also counts them accurately and interprets them correctly. With SITUATE, iisys at Hof University of Applied Sciences is making an important contribution to improving exactly these capabilities. Following the success of the initial tests, a second, much more diverse dataset is currently being developed to enable more nuanced counting strategies.
Prof. Dr. René Peinl,
Institut für Informationssysteme der Hochschule Hof (iisys)
Criteria of this press release:
Business and commerce, Journalists, Scientists and scholars, Students, Teachers and pupils, all interested persons
Economics / business administration, Information technology, Media and communication sciences
transregional, national
Research projects, Research results
English
