ELIXIR talk – session: ELIXIR – Data resources.
Abstract
The core mission of ELIXIR is to build a stable and sustainable infrastructure for biological information across Europe. The data resources within this infrastructure range from archival and depositional databases (containing experimental and computationally derived data, from DNA to protein sequences) to highly dynamic knowledge bases (which aggregate, process and visualize research data, often adding layers of information through manual curation by highly qualified scientific personnel). ELIXIR aims to ensure that these data resources are available long-term and are life-cycle managed so that they continue to form a reliable foundation for life sciences projects. Of the more than 1000 European data resources, only a small fraction had institutional support and long-term funding commitments. Because the mid- and long-term survival of many crucial bioinformatics resources is not guaranteed, the foundations of academic and industrial life sciences activities are threatened, the risk of losing an immense wealth of biological and medical information increases, and the investments made in generating these data may be squandered. Through their Service Delivery Plans (in the case of EMBL-EBI, the ‘Work Programme’), ELIXIR Nodes define a set of services and data resources offered to the research community: the ELIXIR Services. These resources form the backbone of the life sciences data infrastructure. The ELIXIR Core Data Resources are defined as a set of European data resources that are of fundamental importance to the broad life sciences community and to the long-term preservation of biological data. They provide complete collections of generic value to the life sciences, are considered an authority in their field with respect to one or more characteristics, and show high levels of scientific quality and service. ELIXIR Core Data Resources are therefore widely applicable and widely used.
Core Data Resources tend to be well known within the life sciences community and are also recognized by key stakeholders such as funders and journals. ELIXIR Core Data Resources are well maintained, with a professional service delivery plan based on well-established life-cycle management processes and well-understood dependencies. The ELIXIR Core Data Resources coexist with a broader range of databases with diverse motivations, which often specialise in a particular scientific topic. Establishing the portfolio of ELIXIR Core Data Resources and ELIXIR Services is a key priority for ELIXIR and publicly marks the transition towards a cohesive infrastructure.
Indicators measuring quality and impact
Starting from the definition of “ELIXIR Core Data Resources”, a number of such resources were identified. This initial “seed list” of Core Data Resources was then used to identify key indicators and to further refine the definition. The indicators are grouped in five categories:
(1) Scientific focus and quality of science
(2) Community served by the resource
(3) Quality of service
(4) Legal and funding infrastructure, and governance
(5) Impact and translational stories
(1) Scientific focus and quality of science
This category covers the inherent scientific quality of the data and metadata, and their uniqueness and comprehensiveness. It also includes benchmarking against other resources and whether the resource is an authority in its field.
(2) Community
This category reflects the size and measured demand of the communities served by the resource, as captured by web statistics, user reach and international use. In addition, certain resources play a foundational role for derived services and data-driven research: their data are distributed to many other resources and/or services that rely on their existence.
(3) Quality of service
Service levels and reliability can be quantified with specific technical indicators such as the uptime of the resource, general response time, the availability and periodic execution of meaningful automated tests, the availability and response time of user support, related training, and other user services.
(4) Legal and funding infrastructure, and governance
A viable resource has a suitable legal framework (clear terms of use, licensing, and data security). A resource’s commitment to longevity is shown by its institutional support, its funding schemes and its long(er)-term financial stability. Core Data Resources have demonstrated that they can manage transitions between different funding sources (episodes). A strong governance structure includes an international, independent Scientific Advisory Board (SAB), which allows community input and provides a permanent oversight framework.
(5) Impact and translational stories
Impact evaluation attempts to answer definitively whether the resource is meeting its objective of fulfilling a specific need of the scientific community. Translational stories describe the role of the resource in accelerating science and are therefore a very important indicator.
Practical implementation
ELIXIR Core Data Resources form the centre of ELIXIR’s sustainability strategy and science policy actions. The key indicators collected for these bioinformatics resources, in particular the impact and translational stories, will be used to make the case to funders. This information will in turn help funders to articulate the impact that Core Data Resources make. Key indicators for Core Data Resources, in particular those around user policies and procedures, will serve as flagships of excellence and best practice to support capacity building within the ELIXIR Community.
Such best practice may be extended to interoperability: concept naming, identifier resolution, identifier mappings, and data identity provision and protection. The ELIXIR Core Data Resources, especially the knowledge bases, can function as “concept authorities” within and beyond ELIXIR, with a clear role in standardizing the community’s understanding of a given biological concept. The key indicators will inform life-cycle management by identifying trends and supporting decision-making around a given resource. This matters not only for the teams managing the resources but also for identifying Emerging Services that may evolve into Core Data Resources. As new resources are listed in the ELIXIR Node Service Delivery Plans, the indicators and capacity building around the Core Data Resources will support the growth of Emerging Services as they mature. Through the ELIXIR-EXCELERATE Node Capacity Building and Communities of Practice and Training Programme work packages, the framework for life-cycle management will be put into practice, supporting the Emerging Services and strengthening the ELIXIR infrastructure by creating a stairway to excellence.
Authors
Christine Durinx, SIB Swiss Institute of Bioinformatics, Switzerland
Jo McEntyre, EMBL-EBI, United Kingdom
Ron Appel, SIB Swiss Institute of Bioinformatics, Switzerland
Rolf Apweiler, EMBL-EBI, United Kingdom
Mary Barlow, EMBL-EBI, United Kingdom
Niklas Blomberg, ELIXIR, United Kingdom
Chuck Cook, EMBL-EBI, United Kingdom
Elisabeth Gasteiger, SIB Swiss Institute of Bioinformatics, Switzerland
Vassilios Ioannidis, SIB Swiss Institute of Bioinformatics, Switzerland
Jee-Hyub Kim, EMBL-EBI, United Kingdom
Rodrigo Lopez, EMBL-EBI, United Kingdom
Nicole Redaschi, SIB Swiss Institute of Bioinformatics, Switzerland
Heinz Stockinger, SIB Swiss Institute of Bioinformatics, Switzerland
Daniel Teixeira, SIB Swiss Institute of Bioinformatics, Switzerland
Alfonso Valencia, CNIO and INB-ISCIII, Spain
