idw – Informationsdienst Wissenschaft


02/05/2026 10:03

Artificial Intelligence Accelerates Access to Insect Collections

Dr. Gesine Steiner, Press Office
Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung

    Researchers at the Museum für Naturkunde Berlin, together with data scientists, have developed a new method to largely automate the extraction of label information from digitized insect specimens. The pipeline, named ELIE, uses artificial intelligence to reliably detect and process printed labels. This significantly reduces the time-consuming manual transcription work and represents an important advance for the digitization of natural history collections worldwide.

    With more than one million described species, insects represent the most diverse group of living organisms on Earth. Natural history collections worldwide house around 500 million insect specimens collected over the past three centuries. Each specimen carries labels containing essential information such as collection locality, date, and collector. These data form a crucial foundation for research in taxonomy, evolutionary biology, and ecology.
    Despite the availability of high-throughput digitization workflows for collection objects, the transcription of label information is still largely performed manually. Researchers at the Museum für Naturkunde Berlin, working closely with experts in digitization and data science, have now developed a new pipeline that substantially simplifies and accelerates this process.

    The pipeline, ELIE (“Entomological Label Information Extraction”), automates several steps of label processing. Using image analysis and machine learning techniques, ELIE detects individual labels in digital images, aligns them, and classifies them as either printed or handwritten. Printed labels are automatically processed using optical character recognition, while handwritten information is separated for targeted manual transcription. In addition, the system groups identical or highly similar labels, ensuring that recurring information only needs to be reviewed once.
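    The grouping step can be illustrated with a minimal sketch. It is not ELIE's actual implementation, which the release does not describe in detail. It assumes that label texts have already been produced by optical character recognition, and it uses a simple greedy comparison from the Python standard library. The function names, the similarity threshold of 0.9, and the example label texts are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences vanish."""
    return " ".join(text.lower().split())


def group_similar_labels(texts, threshold=0.9):
    """Greedily group label texts by string similarity.

    Each text joins the first group whose representative (the group's
    first member) is at least `threshold` similar; otherwise it starts
    a new group. Returns a list of groups of indices into `texts`.
    The threshold is an assumed value, not one taken from the paper.
    """
    groups = []           # each group is a list of indices into `texts`
    representatives = []  # normalized text of each group's first member
    for index, text in enumerate(texts):
        candidate = normalize(text)
        for group, representative in zip(groups, representatives):
            similarity = SequenceMatcher(None, representative, candidate).ratio()
            if similarity >= threshold:
                group.append(index)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([index])
            representatives.append(candidate)
    return groups


# Invented example transcriptions, including a spacing difference and an
# OCR confusion of "ll" with "11".
labels = [
    "Berlin, Grunewald 12.V.1931 leg. Mueller",
    "Berlin,  Grunewald 12.V.1931 leg. Mueller",
    "Berlin, Grunewald 12.V.1931 leg. Mue11er",
    "Kamerun, Buea 3.XI.1910 leg. Schmidt",
]
groups = group_similar_labels(labels)
print(groups)
```

    With these invented inputs, the first three transcriptions fall into one group and the fourth forms its own, so a reviewer would check two texts instead of four. The more often the same label text recurs in a collection, the larger this saving becomes.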

    “With ELIE, we address one of the major bottlenecks in the digitization of entomological collections,” says Margot Belot, data manager at the Museum für Naturkunde Berlin. “Automating the transcription of printed labels significantly relieves researchers and curators and allows us to make our collections available for scientific use more quickly and systematically.”

    The pipeline was tested on several datasets, including 26,000 of the label images from the 650,000 insect specimens digitized at the Museum für Naturkunde Berlin (MfN) between 2022 and 2023 using a high-speed conveyor-based imaging system developed by the company Picturae. The results show that, depending on the degree of label redundancy, information can be extracted automatically from up to nearly 90 percent of printed labels. Further tests with datasets from the Smithsonian National Museum of Natural History in Washington, D.C., and the Museum of Comparative Zoology at Harvard University demonstrate that ELIE can be reliably applied to previously unseen collections.

    The results have been published in the journal Methods in Ecology and Evolution. The researchers see ELIE as an important building block for the future digitization of natural history collections and as a contribution to making these unique archives of biodiversity more accessible for research.


    Original publication:

    https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210x.70235



    Criteria of this press release:
    Journalists
    Biology, Information technology
    transregional, national
    Research results, Scientific Publications
    English

