idw – Informationsdienst Wissenschaft

16.05.2024 14:15

8.8 million GPU hours for multilingual LLMs: Breakthrough for generative AI research in Germany and Europe

Katrin Berkler, Press and Public Relations
Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS

    The Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS and the NLU group of AI Sweden have jointly been granted computing capacity on the new MareNostrum 5 supercomputer at the Barcelona Supercomputing Center. It is one of the largest allocations granted by the European High Performance Computing Joint Undertaking (EuroHPC JU) for the development of European large language models (LLMs) on the EuroHPC infrastructure. The partners will start training the first multilingual models at the end of May 2024. The “EuroLingua-GPT” project will run for one year. Large European multilingual open source models are thus now in sight.

    The allocation, approved via a EuroHPC “Extreme Scale Access” call, comprises 8.8 million GPU hours on H100 chips and has been available since May 2024. “These computing capacities are a milestone for Germany and Europe. The models trained with them will massively accelerate the use of generative AI in companies and give both business and science a boost – GenAI 'made in Europe' is thus becoming a reality,” says Dr. Joachim Köhler, head of the NetMedia department at Fraunhofer IAIS. With the new computing capacity, small models in the range of 7 to 34 billion parameters and large models with up to 180 billion parameters can be trained from scratch.

    One model family, all European languages – Fraunhofer IAIS and AI Sweden combine their expertise

    The new EuroLingua models are based on a training dataset covering 45 European languages, dialects and codes, including the 24 official languages of the European Union. This gives significant weight to European languages and values; multilingual large language models are still rare. Training will start at the end of May 2024, and the first joint models are expected to be published in the coming months.

    Project leader Dr. Nicolas Flores-Herr, team leader Conversational AI at Fraunhofer IAIS, says: “The goal of our collaboration with AI Sweden is to train a family of large language models from scratch that will be published open source.” Magnus Sahlgren, head of Research NLU at AI Sweden, adds: “Both the public and private sectors in the EU are asking for open, powerful language models trained for European languages. This is one way to meet that need.”

    The models developed on the EuroHPC infrastructure are intended to serve two purposes: as generalist foundation models that support research and science, and, for example in joint transfer projects, as the basis for specialized models for specific sectors or areas of application, for productive use in companies or public administration.

    To achieve this, the two organizations are pooling their expertise: Fraunhofer IAIS and AI Sweden's NLU group are two of the leading LLM labs in Europe, with proven expertise and years of experience in developing LLMs. For example, Fraunhofer is leading the OpenGPT-X consortium project funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK), in which large European multilingual open source models are also being developed. The NLU group at AI Sweden has developed the GPT-SW3 LLM for the Scandinavian languages. The two teams are also working together on other open source community projects. EuroLingua-GPT is one of three major ongoing EU projects on language models in which Fraunhofer IAIS and AI Sweden collaborate. The other two are TrustLLM and Deploy AI.


    Further information:

    http://www.iais.fraunhofer.de Website Fraunhofer IAIS
    http://www.ai.se/en Website AI Sweden
    http://www.bsc.es/ Website Barcelona Supercomputing Center


    Images

    8.8 million hours of computational capacity for Fraunhofer IAIS and AI Sweden on the new high-performance computer MareNostrum 5 at the Barcelona Supercomputing Center.
    Barcelona Supercomputing Center
    By courtesy of Barcelona Supercomputing Center - www.bsc.es

    The new EuroLingua models are based on a training dataset consisting of 45 European languages, dialects and codes, including the 24 official European languages.
    Fraunhofer IAIS


    Attachment
    Press Release Fraunhofer IAIS EuroLingua

    Characteristics of this press release:
    Journalists, business representatives, scientists
    Electrical engineering, information technology
    transregional, national
    Research projects, cooperation
    English


     
