idw – Informationsdienst Wissenschaft

03/07/2016 14:56

Big Data for Text: Next-Generation Text Understanding and Analysis

Friederike Meyer zu Tittingdorf, Press Office, Universität des Saarlandes

    News portals and social media are rich information sources, for example for predicting stock market trends. Today, numerous service providers let users search large text collections by feeding their search engines descriptive keywords. Keywords tend to be highly ambiguous, though, and quickly expose the limits of current search technologies. Computer scientists from Saarbrücken have developed a novel text analysis technology that uses artificial intelligence to considerably improve searching very large text collections.

    Beyond search, this technology also assists authors in researching and even in writing texts by automatically providing background information and suggesting links to relevant web sites. Ambiverse, a spin-off company from the Max Planck Institute for Informatics in Saarbrücken, will be presenting this novel technology during Cebit 2016 in Hannover from 14 to 18 March at Saarland’s research booth.

    In the age of business smartphones and enterprise chatrooms, most information in companies is distributed not by the spoken word but through e-mails, databases, and internal news portals. “According to a survey by the market analyst Gartner, a mere quarter of all companies use automatic methods to analyze their textual information. By 2021, Gartner predicts, 65 per cent will do so. This is because the amount of data inside companies keeps growing, and it therefore becomes more and more costly to structure it and to search it successfully,” says Johannes Hoffart, a researcher at the Max Planck Institute for Informatics and founder of Ambiverse. His team has developed a novel text analysis technology for analyzing huge amounts of text, in which massive computing power and artificial intelligence (AI) are continuously “thinking along” in the background.

    “For analyzing texts, we rely on extremely large knowledge graphs built from freely available sources such as Wikipedia or large media portals on the web. These graphs can be augmented with domain- or company-specific knowledge, such as product catalogs or customer correspondence,” says Hoffart. Complex algorithms then screen the texts further and analyze them with linguistic tools. “Our software then assigns companies and areas of business to their corresponding categories, which allows us to gather valuable insights into how well one’s own products are positioned in the market compared with those of competitors,” he explains. A particular challenge is that product and company names are anything but unique: they tend to have completely different meanings in different contexts, which makes them highly ambiguous.

    “Our technology helps to map words and phrases to the correct real-world objects, thereby resolving ambiguities automatically,” explains the computer scientist. “Paris”, for example, stands for the City of Light and French capital, but also for a figure from Greek mythology or a party girl with German ancestors who is mentioned millions of times, always depending on context. “Efficiently searching huge text collections is only possible if the different meanings of a name or a concept are correctly resolved,” says Hoffart. The smart search engine developed by his team continuously learns and improves over time, and also automatically associates new text entries with matching categories. “These algorithms are hence attractive for companies that analyze online media or social networks to measure the degree of brand awareness for a product or the success of a marketing campaign,” Hoffart adds.
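
    The idea of resolving a name such as “Paris” from its context can be illustrated with a minimal sketch. It is not Ambiverse's implementation: the tiny knowledge base, its keyword sets, and the function names tokenize and disambiguate are all hypothetical, and simple keyword overlap stands in for the far richer knowledge graphs and algorithms described above.

```python
import re

# Hypothetical toy knowledge base (illustrative assumption, not real data):
# each candidate entity for a name is described by a few context keywords.
KNOWLEDGE_BASE = {
    "Paris": {
        "Paris (French capital)": {"france", "capital", "city", "seine", "louvre"},
        "Paris (Greek mythology)": {"troy", "helen", "trojan", "myth", "prince"},
        "Paris (celebrity)": {"celebrity", "party", "heiress", "hotel"},
    }
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(name, sentence, kb=KNOWLEDGE_BASE):
    """Return the candidate entity whose keywords overlap most with the
    sentence, or None if the name is unknown or no keyword matches."""
    candidates = kb.get(name, {})
    if not candidates:
        return None
    context = tokenize(sentence)
    # Score each candidate by the number of keywords found in the context.
    scores = {entity: len(keywords & context)
              for entity, keywords in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("Paris", "Paris abducted Helen and started the Trojan war"))
    print(disambiguate("Paris", "The Louvre is a museum in Paris, the capital of France"))
```

    Passing a different kb argument, for instance one extended with entries from a product catalog, mirrors in miniature how such a resource could be augmented with company-specific knowledge.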

    At Cebit, Ambiverse will also present a smart authoring platform that assists authors in researching and writing texts. Users who enter texts are automatically provided with background information, for example company-internal guidelines and manuals or web links. “Relevant concepts are linked automatically and links for further research are shown,” says the computer scientist.

    Visitors to the Ambiverse Cebit booth (hall 6, booth 28) will also have the opportunity to compete against the novel AI technology in a question-answering game. Ambiverse is funded by the German Federal Ministry for Economic Affairs through an EXIST Transfer of Research grant.

    Press contact:

    Dr. Johannes Hoffart
    Ambiverse
    Max-Planck-Institut für Informatik
    Tel +49 681 9325-5024
    Fax +49 681 9325-5099
    johannes@ambiverse.com
    www.ambiverse.com


    More information:

    http://www.ambiverse.com



    Criteria of this press release:
    Business and commerce, Journalists, Scientists and scholars
    Information technology, Language / literature, Media and communication sciences
    transregional, national
    Transfer of Science or Research
    English


     
