idw – Informationsdienst Wissenschaft

05/16/2024 14:15

8.8 million GPU hours for multilingual LLMs: Breakthrough for generative AI research in Germany and Europe

Katrin Berkler, Press and Public Relations
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

    The Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS and the NLU group of AI Sweden have jointly been granted computing capacity on the new high-performance computer MareNostrum 5 at the Barcelona Supercomputing Center. It is one of the largest allocations granted by the European High Performance Computing Joint Undertaking (EuroHPC JU) for the development of European large language models (LLMs) on the EuroHPC infrastructure. At the end of May 2024, the partners will start training the first multilingual models. The “EuroLingua-GPT” project will run for one year. This brings large European multilingual open source models within reach.

    The allocation, approved via a EuroHPC “Extreme Scale Access” call, comprises 8.8 million GPU hours on H100 chips and has been available since May 2024. “These computing capacities are a milestone for Germany and Europe. The models trained with it will massively accelerate the use of generative AI in companies and give both business and science a boost – GenAI 'made in Europe' is thus becoming a reality,” says Dr. Joachim Köhler, head of the NetMedia department at Fraunhofer IAIS. With the new computing capacity, small models in the range of 7 to 34 billion parameters and large models with up to 180 billion parameters can be trained from scratch.
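
    To give a rough sense of scale for these figures, the following back-of-envelope sketch converts the 8.8 million GPU hours into a compute budget and asks how many training tokens that budget would cover for each model size mentioned above. Only the GPU-hour figure and the parameter counts come from this press release. The per-GPU peak throughput, the sustained utilization and the use of the common rule of thumb that training a dense transformer costs about 6 × parameters × tokens floating-point operations are all assumptions chosen for illustration, not values reported by the project.

```python
# Back-of-envelope sketch. Only GPU_HOURS and the model sizes come from the
# press release; every other constant is an illustrative assumption.
GPU_HOURS = 8.8e6              # allocation stated in the press release
ASSUMED_PEAK_FLOPS = 1.0e15    # hypothetical round figure per GPU, not a measured value
ASSUMED_UTILIZATION = 0.4      # assumed fraction of peak sustained during training


def total_training_flops(gpu_hours,
                         peak_flops=ASSUMED_PEAK_FLOPS,
                         utilization=ASSUMED_UTILIZATION):
    """Total floating-point operations available under the assumptions above."""
    seconds = gpu_hours * 3600
    return seconds * peak_flops * utilization


def trainable_tokens(params, budget_flops):
    """Tokens affordable for a model with `params` parameters.

    Uses the rule of thumb: training cost ~ 6 * parameters * tokens.
    """
    return budget_flops / (6 * params)


if __name__ == "__main__":
    budget = total_training_flops(GPU_HOURS)
    print(f"Assumed compute budget: {budget:.3e} FLOPs")
    for params in (7e9, 34e9, 180e9):  # model sizes named in the press release
        tokens = trainable_tokens(params, budget)
        print(f"{params / 1e9:.0f}B parameters: roughly "
              f"{tokens / 1e12:.1f} trillion tokens "
              f"if the entire budget went to this one model")
```

    In practice the budget is shared across a whole model family plus experiments and evaluation, so each model would receive only a fraction of these totals. Changing the two assumed constants shifts every result proportionally.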

    One model family, all European languages – Fraunhofer IAIS and AI Sweden combine their expertise

    The new EuroLingua models are based on a training dataset covering 45 European languages, dialects and codes, including the 24 official European languages. This gives significant weight to European languages and values, which matters because multilingual large language models are still rare. Training will start at the end of May 2024, and the first joint models are expected to be published in the coming months.

    Project leader Dr. Nicolas Flores-Herr, team leader for Conversational AI at Fraunhofer IAIS, says: “The goal of our collaboration with AI Sweden is to train a family of large language models from scratch that will be published open source.” Magnus Sahlgren, head of Research NLU at AI Sweden, adds: “Both the public and private sectors in the EU are asking for open, powerful language models trained for European languages. This is one way to meet that need.”

    The models developed on the EuroHPC infrastructure are intended to serve two purposes. First, they are to act as general-purpose foundation models that support research and science. Second, for example through joint transfer projects, they are to provide the basis for specialized models for specific sectors or areas of application, ready for productive use in companies or public administration.

    To achieve this, the two organizations are pooling their expertise: Fraunhofer IAIS and AI Sweden's NLU group are two of the leading LLM labs in Europe, with proven expertise and years of experience in developing LLMs. For example, Fraunhofer is leading the OpenGPT-X consortium project funded by the Federal Ministry for Economic Affairs and Climate Action (BMWK), in which large European multilingual open source models are also being developed. The NLU group at AI Sweden has developed the GPT-SW3 LLM for the Scandinavian languages. The two teams are also working together on other open source community projects. EuroLingua-GPT is one of three major ongoing EU projects on language models in which Fraunhofer IAIS and AI Sweden collaborate. The other two are TrustLLM and Deploy AI.


    More information:

    http://www.iais.fraunhofer.de Website Fraunhofer IAIS
    http://www.ai.se/en Website AI Sweden
    http://www.bsc.es/ Website Barcelona Supercomputing Center


    Images

    8.8 million hours of computational capacity for Fraunhofer IAIS and AI Sweden on the new high-performance computer MareNostrum 5 at the Barcelona Supercomputing Center.
    Barcelona Supercomputing Center
    By courtesy of Barcelona Supercomputing Center - www.bsc.es

    The new EuroLingua models are based on a training dataset consisting of 45 European languages, dialects and codes, including the 24 official European languages.
    Fraunhofer IAIS


    Attachment
    Press Release Fraunhofer IAIS EuroLingua

    Criteria of this press release:
    Business and commerce, Journalists, Scientists and scholars
    Electrical engineering, Information technology
    transregional, national
    Cooperation agreements, Research projects
    English


     

