Presentations: Integrations for Research Data Management
Time: Wednesday, 05/June/2024, 11:00 - 12:30
Session Chair: Katie Mika, Harvard University
Location: Drottningporten 1
200
Presentations
Making Software FAIR: A machine-assisted workflow for the research software lifecycle
Petr Knoth (1), Laurent Romary (2), Patrice Lopez (3), Roberto Di Cosmo (2), Pavel Smrz (4), Tomasz Umerle (5), Melissa Harrison (6), Alain Monteil (2), Matteo Cancellieri (1), David Pride (1)
(1) CORE, The Open University, United Kingdom; (2) Inria; (3) Science Miner; (4) Brno University of Technology; (5) Polish Academy of Sciences; (6) European Institute of Bioinformatics
A key issue hindering the discoverability, attribution and reusability of open research software is that its existence often remains hidden within the manuscripts of research papers. For these resources to become first-class bibliographic records, they first need to be identified and subsequently registered with persistent identifiers (PIDs) to be made FAIR (Findable, Accessible, Interoperable and Reusable). To this day, much open research software fails to meet the FAIR principles, and software resources are mostly not explicitly linked from the manuscripts that introduced or used them.
SoFAIR is a 2-year international project (2024-2025) which proposes a solution to the above problem realised over the content available through the global network of open repositories. SoFAIR will extend the capabilities of widely used open scholarly infrastructures (CORE, Software Heritage, HAL) and tools (GROBID) operated by the consortium partners, delivering and deploying an effective solution for the management of the research software lifecycle, including: 1) ML-assisted identification of research software assets from within the manuscripts of scholarly papers, 2) validation of the identified assets by authors, 3) registration of software assets with PIDs and their archival.
Real-World Benchmarks for FAIR Data Repositories: Meeting the Needs for Modern Open Data
Arran Griffith (1), Maria Esteva (2), Dan Field (1)
(1) Fedora; (2) Texas Advanced Computing Center, University of Texas at Austin
Data repositories are fundamental infrastructure in the open science ecosystem; however, traditional repository systems now face the challenge of keeping pace with the exponentially increasing scale of modern research data production. Currently, there is limited understanding of how an implementation of Fedora 6.x in a High Performance Computing (HPC) environment may influence the data scalability and functional efficiency of the repository.
This presentation will provide an in-depth look at the ongoing collaboration between the Fedora program team and the data-intensive computing and cloud developers at the Texas Advanced Computing Center (TACC) to address the performance and scalability limits of Fedora 6.x in an HPC environment. The results of this collaboration will give both Fedora users and the repository community at large a better understanding of the scalability of a repository environment and how to assess it systematically. These performance metrics will allow data repository technical, curatorial and administrative staff to understand how to optimize their infrastructure to meet the demand for managing and accessing large open data.
Resolving Linked Data: Are we all doing the same?
Mateusz Żółtak, Martina Trognitz
Austrian Academy of Sciences, Austria
Linked Data is a widely used standard that helps in providing FAIR data and metadata. We present a more detailed analysis of what the requirements for Linked Data are and how they are technically implemented in various repositories and authority services. The comparison, which focuses on the usability of the responses that services return when resolving a requested URI, shows notable differences. The results call for the formulation of an updated set of Linked Data principles to enhance (meta)data findability, accessibility and interoperability.
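The abstract does not say which resolution mechanisms were compared. As an illustration only, one common way for a service to decide what to return for a requested URI is HTTP content negotiation on the Accept header. The sketch below is a simplified, hypothetical version of that decision: the `SUPPORTED` list and both function names are assumptions, and the matching ignores the media-type specificity rules of full HTTP negotiation.

```python
# Hypothetical list of serialisations a resolver might offer, in order
# of the service's own preference. Not taken from any surveyed service.
SUPPORTED = ["text/html", "text/turtle", "application/ld+json", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q value."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header, supported=SUPPORTED):
    """Pick the serialisation to return, or None if nothing matches."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return supported[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "application/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in supported:
            return media_type
    return None


print(negotiate("text/turtle, text/html;q=0.5"))
print(negotiate("image/png"))
```

A `None` result corresponds to the case where a service cannot satisfy the client's request; how services behave in that case is one of the places where responses can differ in usability.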
Five ways RO-Crate data packages are important for repositories
Peter Sefton (1), Stian Soiland-Reyes (2)
(1) University of Queensland, Australia; (2) The University of Manchester, UK
Research Object Crate (RO-Crate) is a linked data metadata packaging standard which has been widely adopted in research contexts. In this presentation we will briefly explain what RO-Crate is and how it is being adopted worldwide, then go on to list ways that RO-Crate is growing in importance in the repository world:
- Upload of complex multi-file objects: RO-Crate is compatible with any general-purpose repository that can accept a zip file (with some coding, repository services can do more with RO-Crates)
- Download of well-described data objects complete with metadata from a repository, rather than just a zip or file with no metadata
- Using RO-Crate metadata reduces the amount of customisation that is required in repository software, as ALL the metadata is described using the same simple, self-documenting linked-data structures, so generic display templates can be used
- Sufficiently well-described RO-Crates can be used to make data FAIR-compliant, aiding in Findability, Accessibility, Interoperability and Reusability thanks to standardised metadata and mature tooling
- And if you’re looking for a sustainable repository solution, there are tools which can run a repository from a set of static files on a storage service, in line with the ideas put forward by Suleman in the closing keynote for OR2023
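As a rough illustration of the linked-data structures mentioned in the list above, the following sketch builds a minimal `ro-crate-metadata.json` in the shape described by the RO-Crate 1.1 specification, using only the Python standard library. The function name, the example file names and the example values are hypothetical, and a real crate would normally carry more properties (a license, for example) than this sketch does.

```python
import json


def build_ro_crate_metadata(name, description, date_published, files):
    """Build the JSON-LD for a minimal RO-Crate 1.1 metadata file.

    `files` is a list of (relative_path, label) tuples; each becomes a
    File entity that the root Dataset references through hasPart.
    """
    file_entities = [
        {"@id": path, "@type": "File", "name": label}
        for path, label in files
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: identifies this file and points at the root.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root data entity: describes the package as a whole.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "hasPart": [{"@id": e["@id"]} for e in file_entities],
            },
            *file_entities,
        ],
    }


# Hypothetical example values, for illustration only.
crate = build_ro_crate_metadata(
    name="Example survey data",
    description="Two files packaged as one described object.",
    date_published="2024-06-05",
    files=[("data/results.csv", "Survey results"), ("README.txt", "Readme")],
)
print(json.dumps(crate, indent=2))
```

Because every entity is a flat JSON object with an `@id` and an `@type`, a repository can list or render the contents by walking `@graph` rather than relying on a per-collection schema, which is the point made about generic display templates.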