idw – Informationsdienst Wissenschaft

05.02.2026 10:03

Artificial Intelligence Accelerates Access to Insect Collections

Dr. Gesine Steiner, Press Office
Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung

    Researchers at the Museum für Naturkunde Berlin, together with data scientists, have developed a new method to largely automate the extraction of label information from digitized insect specimens. The pipeline, named ELIE, uses artificial intelligence to reliably detect and process printed labels. This significantly reduces the time-consuming manual transcription work and represents an important advance for the digitization of natural history collections worldwide.

    With more than one million described species, insects represent the most diverse group of living organisms on Earth. Natural history collections worldwide house around 500 million insect specimens collected over the past three centuries. Each specimen carries labels containing essential information such as collection locality, date, and collector. These data form a crucial foundation for research in taxonomy, evolutionary biology, and ecology.
    Despite the availability of high-throughput digitization workflows for collection objects, the transcription of label information is still largely performed manually. Researchers at the Museum für Naturkunde Berlin, working closely with experts in digitization and data science, have now developed a new pipeline that substantially simplifies and accelerates this process.

    The pipeline, ELIE (“Entomological Label Information Extraction”), automates several steps of label processing. Using image analysis and machine learning techniques, ELIE detects individual labels in digital images, aligns them, and classifies them as either printed or handwritten. Printed labels are automatically processed using optical character recognition, while handwritten information is separated for targeted manual transcription. In addition, the system groups identical or highly similar labels, ensuring that recurring information only needs to be reviewed once.
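
The grouping step described above can be illustrated with a minimal sketch. It is not ELIE's actual implementation: the similarity measure (Python's standard `difflib.SequenceMatcher`), the threshold of 0.9, the function names, and the sample label texts are all assumptions made up for illustration. The idea is only that near-identical transcriptions end up in one group, so a person reviews one representative instead of every label.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def group_similar_labels(texts, threshold=0.9):
    """Greedy grouping: put each text into the first group whose
    representative (its first member) is at least `threshold` similar,
    otherwise start a new group. The threshold is a hypothetical value."""
    groups = []
    for text in texts:
        for group in groups:
            ratio = SequenceMatcher(
                None, normalize(group[0]), normalize(text)
            ).ratio()
            if ratio >= threshold:
                group.append(text)
                break
        else:
            # No existing group was similar enough.
            groups.append([text])
    return groups


# Invented label transcriptions, not taken from any real collection.
labels = [
    "Berlin, Grunewald 12.V.1931 leg. Schmidt",
    "Berlin,  Grunewald 12.V.1931 leg. Schmidt",  # extra space
    "Berlin, Grunewald 12.V.1931 leg. Schmitt",   # one-character OCR slip
    "Brasil, Manaus 3.II.1975 leg. Costa",
]

groups = group_similar_labels(labels)
print(f"{len(labels)} labels, {len(groups)} groups to review")
```

In this toy input the first three texts fall into one group and the fourth forms its own, so only two representatives would need checking. A greedy pass like this depends on input order and compares each text against every group, so a real pipeline working on hundreds of thousands of labels would likely need a more scalable clustering approach.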

    “With ELIE, we address one of the major bottlenecks in the digitization of entomological collections,” says Margot Belot, data manager at the Museum für Naturkunde Berlin. “Automating the transcription of printed labels significantly relieves researchers and curators and allows us to make our collections available for scientific use more quickly and systematically.”

    The pipeline was tested on several datasets, including 26,000 of the label images from the 650,000 insect specimens digitized at the Museum für Naturkunde Berlin (MfN) between 2022 and 2023 using a high-speed, conveyor-based imaging system developed by the company Picturae. The results show that, depending on the degree of label redundancy, information can be extracted automatically from up to almost 90 percent of printed labels. Further tests with datasets from the Smithsonian National Museum of Natural History in Washington, D.C., and the Museum of Comparative Zoology at Harvard University demonstrate that ELIE can be reliably applied to previously unseen collections.

    The results have been published in the journal Methods in Ecology and Evolution. The researchers see ELIE as an important building block for the future digitization of natural history collections and as a contribution to making these unique archives of biodiversity more accessible for research.


    Original publication:

    https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210x.70235



    Features of this press release:
    Journalists
    Biology, information technology
    Transregional
    Research results, scientific publications
    English


     
