idw – Informationsdienst Wissenschaft

10/22/2024 12:00

NBER Study Confirms Strong Performance of PaECTER Model for Patent Analysis

Dr. Myriam Rion Pressestelle
Max-Planck-Institut für Innovation und Wettbewerb

    PaECTER is a deep learning semantic similarity model developed at the Max Planck Institute for Innovation and Competition. Its goal is to identify similar patents and publications based on their text content. Semantic similarity search is especially relevant to prior art search, for both inventors and patent examiners. Most existing tools are not scalable, use outdated methods, or are domain-specific, and they are often not open source. PaECTER outperforms all openly available models in the patent domain and performs well in the scientific domain. This has now been confirmed by a new study published by the National Bureau of Economic Research.

    A recent study published by the National Bureau of Economic Research (NBER) has confirmed the strong performance of PaECTER, a patent analysis model developed by a team of researchers at the Max Planck Institute for Innovation and Competition. The model came out on top in a comparison with other models in tasks critical to patent examination and innovation research.

    Developed by Mainak Ghosh, Sebastian Erhardt, Michael E. Rose, Erik Buunk, and Dietmar Harhoff, PaECTER (Patent-Level Representation Learning Using Citation-Informed Transformers) uses advanced transformer-based machine learning techniques fine-tuned with patent citation data. The model is specifically designed to address the complex challenges of patent text analysis and provides significant improvements in the identification and categorization of similar patents, making it highly valuable for both patent examiners and innovation researchers.

    The new NBER working paper “Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection” rigorously compares PaECTER with other Natural Language Processing (NLP) models. The authors Ina Ganguli (University of Massachusetts Amherst), Jeffrey Lin (Federal Reserve Bank of Philadelphia), Vitaly Meursault (Federal Reserve Bank of Philadelphia), and Nicholas Reynolds (University of Essex) assessed the models’ performances in patent interference tasks, where multiple inventors claim similar inventions.

    The study concluded that PaECTER significantly reduces false positives and improves efficiency compared to traditional models like TF-IDF (Term Frequency-Inverse Document Frequency). The study also highlighted PaECTER’s capabilities when compared with other modern models such as GTE and S-BERT (General Text Embeddings and Sentence-BERT, both methods that represent texts as numerical vectors capturing semantic information about words or entire sentences). While PaECTER performed exceptionally well in expert-driven tasks like interference identification, it also held its own in broader patent classification tasks, further reinforcing its versatility.
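
    To illustrate why a term-based baseline such as TF-IDF can miss related texts, the following minimal Python sketch scores three short example sentences by TF-IDF cosine similarity. It is not the implementation used in the NBER study: the whitespace tokenizer, the smoothed IDF formula, and the example sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        vectors.append({t: (f / total) * idf[t] for t, f in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example texts, not taken from any real patent.
docs = [
    "a fastening device for joining two panels",
    "a fastening device for joining metal panels",
    "apparatus that connects one pair of boards",
]
vecs = tfidf_vectors(docs)
sim_same_words = cosine(vecs[0], vecs[1])   # texts share most terms
sim_paraphrase = cosine(vecs[0], vecs[2])   # similar idea, no shared terms
```

    The third sentence describes much the same idea as the first but shares no words with it, so its TF-IDF similarity is zero. Embedding models such as those compared in the study are designed to place paraphrases like these close together.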

    “We are pleased that PaECTER’s performance has been validated by the NBER study, which shows its strengths in patent similarity analysis and confirms its role as a reliable tool for those working in the field of innovation and intellectual property,” says Mainak Ghosh, one of PaECTER’s developers. “This independent validation further strengthens its relevance in the field of patent examination.”

    The PaECTER model is available for use on the Hugging Face platform, making it accessible to researchers, policymakers, and patent professionals worldwide. Its robust performance, as demonstrated by the NBER study, underscores its value in improving the way patent data is processed, contributing to more accurate and efficient analysis of patent innovations over time. As of today, PaECTER has been downloaded more than 1.4 million times.
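
    As a rough sketch of how the published model might be used for similarity search, the Python example below ranks candidate texts by cosine similarity to a query. The model identifier comes from the Hugging Face link in this release. Loading it through the sentence-transformers library is an assumption, and the helper names `embed_with_paecter`, `rank_by_similarity`, and `demo` are hypothetical.

```python
import math

# Model identifier taken from the Hugging Face URL in this release.
MODEL_ID = "mpi-inno-comp/paecter"


def cosine(u, v):
    """Cosine similarity between two dense vectors given as sequences."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def embed_with_paecter(texts):
    """Embed texts with PaECTER.

    Assumption: the model can be loaded with sentence-transformers.
    The import is local so the ranking helpers work without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(texts).tolist()


def demo():
    """Rank two hypothetical candidate texts against a query (needs network)."""
    query = "a fastening device for joining two panels"
    candidates = [
        "apparatus that connects one pair of boards",
        "a method for brewing coffee under pressure",
    ]
    vectors = embed_with_paecter([query] + candidates)
    return rank_by_similarity(vectors[0], vectors[1:])
```

    Calling `demo()` downloads the model on first use and requires the sentence-transformers package; the ranking helpers themselves depend only on the standard library.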

    ABOUT THE MAX PLANCK INSTITUTE FOR INNOVATION AND COMPETITION

    The Max Planck Institute for Innovation and Competition is committed to fundamental legal and economic research on processes of innovation and competition and their regulation. Our research focuses on the incentives, determinants and implications of innovation. With an outstanding international team of scholars and excellent scientific and administrative infrastructure including our renowned library, we host academics from all over the world and actively promote young researchers. We inform and guide legal and economic discourse on an impartial basis. As an independent research institution, we provide evidence-based research results to academia, policymakers, the private sector as well as the general public.

    Website of the Max Planck Institute for Innovation and Competition: https://www.ip.mpg.de/en/


    Contact for scientific information:

    Sebastian Erhardt, M.Sc.
    Research Fellow
    https://www.ip.mpg.de/en/persons/erhardt-sebastian.html


    Original publication:

    Ghosh, Mainak; Erhardt, Sebastian; Rose, Michael; Buunk, Erik; Harhoff, Dietmar (2024). PaECTER: Patent-Level Representation Learning Using Citation-Informed Transformers, arXiv preprint 2402.19411. Available at https://arxiv.org/abs/2402.19411

    PaECTER on Hugging Face: https://huggingface.co/mpi-inno-comp/paecter

    Ganguli, Ina; Lin, Jeffrey; Meursault, Vitaly; Reynolds, Nicholas F. (2024). Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection (No. w32934). National Bureau of Economic Research. Available at https://www.nber.org/papers/w32934



    Criteria of this press release:
    Business and commerce, Journalists, Scientists and scholars
    Economics / business administration, Law, Politics
    transregional, national
    Research results, Transfer of Science or Research
    English


     
