Scientific Treatments and Supplementary Indices Revolutionize the Reuse of Research Data

EOSC Symposium 2022


Enrichment of published literature at scale

The automated generation of scientific treatments through analysis of literature has been pioneered by Plazi and SIBiLS in the life sciences and by ENS de Lyon in the social sciences and humanities. This radically improves accessibility of scientific facts – even material published in copyrighted literature – enabling the creation of new open FAIR digital annotation object resources.

Plazi’s TreatmentBank and Biodiversity Literature Repository (BLR) are now among the principal suppliers of taxonomic treatments from existing publications to the Global Biodiversity Information Facility (GBIF). BLR is a key community of Zenodo, the global catch-all repository for research data. Plazi and SIBiLS infrastructures form important components of the BiCIKL Horizon 2020 project, the Biodiversity Community Integrated Knowledge Library.


Distributed Research Data Tape Library

Institutional Guarantee (IG) solutions being developed by the DataFutures’ hasdai partnership with CERN break through cost barriers which prevented operation of sustainable data resources. IG sweeps away the prohibitive costs of retaining local IT departments and proprietary alternatives, which have limited retrieval facilities and difficult-to-forecast egress costs.

The Distributed Research Data Tape Library provides a new community-governed and community-operated infrastructure based on international standards. The tape library provides automated server components, distributed among hosting institutions, plus consortium-operated remote management. Use of industry-standard LTO technology enables secure copies of corpora to be stored at multiple locations for decades at low cost, and automatically copied to new media as part of the LTO roadmap.


SIBiLS Supplementary Data Index

Key information in medical literature, particularly that relating to genomic variants, is often not reported in the full text of publications, but only in supplementary material such as imagery and tables.

Annotating and searching of this supplementary data enables significant improvement in the volume of documents retrieved for a variant – reducing by about 70% the number of variants for which no match is found in the scientific literature. Supplementary data now represents a paramount source of information for curating variants of unknown significance.


Swiss Institute of Bioinformatics Literature Services (SIBiLS) provides indices of biological literature, allowing fully customizable search for semantically-enriched content, based on biomedical entities from a growing set of standardized and legacy vocabularies. Its services are used extensively for curation of genes and gene products, by delivering customized literature triage engines to curation teams.

Plazi is a global leader in liberation of data in biodiversity publications as machine-readable open FAIR digital objects. Plazi is the single largest provider of data sets to Zenodo and to the Global Biodiversity Information Facility (GBIF), with over 800,000 taxonomic treatments and 480,000 figures from approximately 56,000 articles liberated into the public domain.

DataFutures GmbH is a not-for-profit company which leads development of annotation and long-term preservation infrastructure in the InvenioRDM consortium. The hasdai partnership of European and U.S. institutions, which operates a network of repositories and tape libraries, is managed by DataFutures and governed under a memorandum with CERN.


Past Events


International Conference: "War, Trade and the Divisive Power of Citizenship"

Event Website Europainstitut Basel


We are excited to co-host the conference War, Trade and The Divisive Power of Citizenship at Europainstitut, University of Basel.

Panels include "Next-Generation Digital Resource for Historians", with Christian Futter, Martin Grandjean, Isabella Loehr and Ina Serif, and "Sharing Documents, Securing Preservation", with Donat Agosti, Michael Hildreth, Neil Jeffries, Tom Lamberty, Christiane Sibille and Tim Smith, led by Prof. Peter Cornwell.


download Conference Programme PDF


Digital Edition Website - Divisive Power of Citizenship and Loyalty - the French case


Researchers' Tools for Knowledge Preservation

RDA Event


Download Materials: Bell | Cornwell | Duerr | Hildreth | Holm | Jefferies



RDA IG-PTTP and Frankfurter BuchMesse #20: Preserving and Annotating Publishers' Data

Frankfurter Buchmesse #20



16th of October 2020 15:00-16:00 CEST

Presenters: Peter Cornwell (ENS-Lyon, University of Westminster, Data Futures), José Gonzalez (CERN), Tom Lamberty (Merve Verlag)

The second seminar of this series, co-organized with the Frankfurt Book Fair, addresses digital preservation solutions for the publishing enterprise. Presenting experience from the 2015-2020 open access project of Merve Verlag—winner of the 2020 German Publishers' Prize—these talks proceed from creation of a homogeneous digital corpus, to adoption of APIs for future-proof internet access and platforms for scholarly research, and long-term preservation. The seminar focusses on delivering the full book content of the publisher catalogue—three talks will present this activity:

i) creation of technology-agnostic digital editions;

ii) open access and support of the research community and

iii) strategies for future-proof accessibility and long-term preservation.

The preservation trajectory also addresses digital capture and accessibility for historic information about the publisher, such as author and rights correspondence, launches, events and archives—although this will be addressed in a subsequent event.

Tom Lamberty, managing director of Merve, will present strategies for creating a publishing data resource. These range from digitization of out-of-print books to conversion of current, digitally originated publications using tailored re-delivery workflows. The resulting digital corpus supports conventional print and distribution, print-on-demand, and a digital edition which can supply both open reading access and the research community.

Peter Cornwell, research fellow at ENS-Lyon and director of Data Futures GmbH, addresses development of multi-function online access using the International Image Interoperability Framework (IIIF). Automated production of IIIF services from publishers' digital editions supports not only existing and future electronic reading applications, but also new research platforms generating preservable Web Annotation Data Model collections, which can be output directly to repositories.
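As a rough illustration of the kind of data such research platforms might output, the sketch below builds a Web Annotation Data Model collection whose single annotation targets a rectangular region of a page image. It is a minimal sketch only, not the workflow used by the presenters: the example.org URIs, the function names and the sample comment are hypothetical, and the target is assumed to be an IIIF canvas URI.

```python
import json


def make_annotation(anno_id, canvas_uri, xywh, comment):
    """Build one annotation (as a dict) commenting on a region of a page image.

    canvas_uri is assumed to identify an IIIF canvas; xywh is a tuple of
    (x, y, width, height) in pixels on that canvas.
    """
    x, y, w, h = xywh
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        "target": {
            "source": canvas_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


def make_collection(collection_id, label, annotations):
    """Wrap annotations in an AnnotationCollection with one embedded page."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": collection_id,
        "type": "AnnotationCollection",
        "label": label,
        "total": len(annotations),
        "first": {
            # Hypothetical page URI derived from the collection URI.
            "id": f"{collection_id}/page1",
            "type": "AnnotationPage",
            "items": annotations,
        },
    }


if __name__ == "__main__":
    anno = make_annotation(
        "https://example.org/anno/1",                # hypothetical
        "https://example.org/iiif/book1/canvas/p1",  # hypothetical canvas
        (10, 20, 300, 40),
        "Marginal note in the first edition.",
    )
    collection = make_collection(
        "https://example.org/collections/book1", "Notes on book 1", [anno]
    )
    # The JSON text is what a platform could deposit in a repository.
    print(json.dumps(collection, indent=2))
```

Serializing the collection as plain JSON keeps it independent of any one reading application, which is the property that makes such annotation sets candidates for deposit alongside the digital edition.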

José Gonzalez, head of repository technologies at CERN, charts the history as well as current developments surrounding long-term access and preservation technologies for research data in the physical and life sciences. Since the 1960s, increasing data volumes and enormous international research investment have driven continuous preservation efforts focussing on software engineering, and CERN has become a prominent developer and user of reliable data repositories. Its technology now underpins the Zenodo global catch-all repository for research data, and the forthcoming release of InvenioRDM gives new communities such as the publishing sector radical new data distribution and preservation opportunities.

This seminar, which is part of the 20th Frankfurt Book Fair programme, is intended as an introduction to the creation and deployment of new data resources from existing publisher data and, especially, to the long-term operation and maintenance aspects that ensure preservation of investment in such activities. It is organized jointly with the Preservation Tools, Technologies and Policies (PTTP) Group of the Research Data Alliance (RDA). A series of more specific seminars, shaped according to participant feedback, is planned for early 2021 as part of RDA's ongoing program. An update on this seminar will be included in the annual RDA Plenary 16 meeting, November 9th-16th.