Heidelberg, 31 October 2017 – SourceData from EMBO is an award-winning open platform that allows researchers and publishers to share figures and their underlying data in a machine-readable, searchable format, making research papers discoverable based on their data content. As highlighted in today’s paper in Nature Methods (Liechti et al., 2017), SourceData offers a novel method to describe research data and a suite of tools to generate, validate and use this information, giving scientists an efficient way to find and re-use published results.
“In the biological sciences, most of the data produced by researchers is published in the form of figures. Figures are the heart of a scientific paper. However, the search tools used to find published papers are usually limited to keyword-based text searches that exclude figure contents,” SourceData project leader Thomas Lemberger of EMBO explains. Because there is no consistent method for representing figures in a searchable form, relevant data can be missing from search results.
With SourceData, a machine-readable description of each figure is generated and stored in a structured database. The biological entities represented in the figure, such as genes, proteins or molecules, are linked to standardized taxonomies to avoid naming ambiguity. This means that each occurrence of a given biological entity in a figure or result set can be quickly found within the SourceData database. SourceData also stores the direction of the relationship between entities, that is, which were manipulated and which were observed, allowing very specific searches based on the experimental design.
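To illustrate the idea, here is a minimal sketch of how such a figure description might be represented and searched. It is not the SourceData schema or API: the class names, the `find_panels` function, the entity labels and the `ID:0001`-style identifiers are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Role(Enum):
    MANIPULATED = "manipulated"  # entity perturbed in the experiment
    OBSERVED = "observed"        # entity measured as the readout


@dataclass(frozen=True)
class Entity:
    label: str       # name as written in the figure (may vary between papers)
    identifier: str  # hypothetical stand-in for a standardized identifier
    role: Role


@dataclass(frozen=True)
class Panel:
    paper: str
    figure: str
    entities: Tuple[Entity, ...]


def find_panels(panels: List[Panel], manipulated_id: str, observed_id: str) -> List[Panel]:
    """Return panels where one entity was manipulated and another was observed."""
    hits = []
    for panel in panels:
        manipulated = {e.identifier for e in panel.entities if e.role is Role.MANIPULATED}
        observed = {e.identifier for e in panel.entities if e.role is Role.OBSERVED}
        if manipulated_id in manipulated and observed_id in observed:
            hits.append(panel)
    return hits


# Invented example records: the same entity appears under different labels,
# but shares one identifier, so naming differences do not affect the search.
PANELS = [
    Panel("paper-A", "Fig 1B", (
        Entity("GeneX", "ID:0001", Role.MANIPULATED),
        Entity("ProteinY", "ID:0002", Role.OBSERVED),
    )),
    Panel("paper-B", "Fig 3A", (
        Entity("protein Y", "ID:0002", Role.MANIPULATED),
        Entity("gene-x", "ID:0001", Role.OBSERVED),
    )),
    Panel("paper-C", "Fig 2", (
        Entity("gene X", "ID:0001", Role.MANIPULATED),
        Entity("MoleculeZ", "ID:0003", Role.OBSERVED),
    )),
]

if __name__ == "__main__":
    for panel in find_panels(PANELS, manipulated_id="ID:0001", observed_id="ID:0002"):
        print(panel.paper, panel.figure)
```

In this sketch, a keyword search for the two entities would return both paper-A and paper-B, whereas the role-aware query returns only the panel in which the first entity was manipulated and the second was observed.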
Paper co-author Robin Liechti from the SIB Swiss Institute of Bioinformatics (SIB) explains: “SourceData links figures to other related figures across papers and journals to build a searchable knowledge graph, which is quality-controlled by expert curators. Readers of scientific articles can use this to find the data they need in a much more efficient way.”
SourceData provides a suite of applications, including SmartFigures, enhanced figures containing links to related results and data that can be embedded in online publications; DataSearch, a search engine that finds published figures based on their data content; and MetaFig, a curation interface that offers computer-assisted importing of new figures into the SourceData format.
The SourceData platform is currently in active development, with EMBO and SIB engaging with academic publishers to establish an open and effective standard for the discovery and reuse of figures and data.
Liechti, R, Götz, L, George, N, El-Gebali, S, Chasapi, A, Crespo, I, Xenarios, I & Lemberger, T (2017). SourceData: a semantic platform for curating and searching figures. Nature Methods. DOI: 10.1038/nmeth.4471