Presentations - Software: repositories, metadata and citations
Time: Tuesday, 17 June 2025, 11:00 - 12:30
Location: Griffin Auditorium
Presentations
Exploring global academic repositories for software.
Domhnall Carlin
Queen's University Belfast, United Kingdom
This study investigates how often research software appears as an academic output in international institutional repositories (IRs). Previous work [1] analysed 182 academic IRs from 157 UK universities and found only a limited number of records for research software in these repositories. Additionally, many IRs are unable to list software as an independent research output because of constraints in the underlying Research Information System (RIS) platforms. In the present study, data from OpenDOAR, a directory of global Open Access Repositories, were used to conduct similar analyses on international IRs, in what is believed to be the first census of its kind. A total of 4,970 repositories across 125 countries were reviewed for the inclusion of software, along with associated metadata that could indicate relevant factors. The findings suggest significant potential for straightforward technical enhancements to RIS platforms that would allow software to be recognised and recorded as a distinct research output, including linking IRs with more development-friendly repositories. We explore the implications of these results, particularly the evident lack of acknowledgment of software as a discrete output of the research process. Lastly, we examine the dedicated software platforms that underpin academic repositories and how widely each is used.
A Repository for Preserving and Managing Running Applications
Raman Ganguly
University of Vienna, Austria
Software preservation poses a significant challenge in modern data management, since data often relies on specific software for interpretation and use. Unlike relatively stable data, software operates within an ever-evolving technological landscape of hardware, operating systems, and libraries, which makes long-term preservation particularly difficult. Current software development cycles typically last between 3 and 5 years, which can render tools obsolete and jeopardize the accessibility and reproducibility of research that depends on them. Even preserving source code demands considerable effort to recreate functional environments, including the necessary dependencies and configurations.
To address these challenges, we propose a repository for managing and preserving running applications using containerization technologies such as Docker and Kubernetes. This approach encapsulates software with its dependencies, creating portable and consistent environments that ensure long-term usability. A proof of concept developed at the University of Vienna demonstrates the feasibility of this method, enabling applications to run securely in isolated "sandboxes" while maintaining their operational context.
Enhancing Metadata Workflows and Long-Term Preservation of Research Outputs: Adopting FAIR Principles with Dagstuhl Publishing
Saadet Bozaci, Michael Didas, Michael Wagner
Schloss Dagstuhl Leibniz-Zentrum für Informatik GmbH, Germany
In its final report, the EOSC working group on Scholarly infrastructures for Research Software (SIRS) identifies archiving, referencing, describing, and crediting research software as critical pillars for open science and long-term sustainability. Based on these recommendations, Dagstuhl Publishing has enhanced its workflows to improve the publication, preservation, and accessibility of scholarly outputs, particularly supplementary materials such as research software.
The extended submission system (DSUB) now enables authors to submit research software alongside their articles. Long-term preservation of the software is ensured by archiving it on Software Heritage. The two tools used for this, SWH Archive Client and SWH Deposit Client, are available as open source. Metadata is fetched automatically from platforms such as GitHub or from CITATION.cff files, and then manually curated to ensure completeness. A demo is available at http://faircore4eosc.dagstuhl.de/.
Furthermore, the collected metadata is published in a collection on the DROPS repository, which connects articles with their supplementary materials: https://drops.dagstuhl.de/entities/collection/supplementary-materials. This submission process follows FAIR principles and maintains minimum metadata standards.
This presentation will highlight Dagstuhl's solutions, including automated metadata workflows using CodeMeta JSON and manual curation, showcasing a scalable model for software preservation and repository management. We will also explore lessons learned and future directions to advance open science and sustainability.
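As a rough illustration of the kind of automated step described above, the following Python sketch maps an already-parsed CITATION.cff record to a minimal CodeMeta JSON record and reports which fields a curator would still need to fill in. It is not Dagstuhl's implementation: the function names `cff_to_codemeta` and `missing_fields`, the choice of "required" fields, the sample record, and the assumption that the license is a single SPDX identifier are all hypothetical.

```python
import json


def cff_to_codemeta(cff: dict) -> dict:
    """Map a parsed CITATION.cff dictionary to a minimal CodeMeta record."""
    authors = []
    for person in cff.get("authors", []):
        entry = {"@type": "Person"}
        if "given-names" in person:
            entry["givenName"] = person["given-names"]
        if "family-names" in person:
            entry["familyName"] = person["family-names"]
        if "orcid" in person:
            entry["@id"] = person["orcid"]
        authors.append(entry)

    record = {
        "@context": "https://w3id.org/codemeta/3.0",
        "@type": "SoftwareSourceCode",
        "name": cff.get("title"),
        "version": cff.get("version"),
        "codeRepository": cff.get("repository-code"),
        "identifier": cff.get("doi"),
        "author": authors,
    }
    # Assumption: the license is given as one SPDX identifier string.
    if isinstance(cff.get("license"), str):
        record["license"] = "https://spdx.org/licenses/" + cff["license"]

    # Drop empty values so that gaps are visible to a human curator.
    return {k: v for k, v in record.items() if v not in (None, "", [])}


def missing_fields(record: dict,
                   required=("name", "author", "codeRepository", "license")):
    """Return the (hypothetical) required fields absent from the record."""
    return [field for field in required if field not in record]


# Hypothetical sample input, as it might look after parsing a CITATION.cff file.
sample = {
    "title": "example-tool",
    "version": "1.2.0",
    "license": "MIT",
    "authors": [{"given-names": "Ada", "family-names": "Example"}],
}

codemeta = cff_to_codemeta(sample)
print(json.dumps(codemeta, indent=2))
print("Needs manual curation:", missing_fields(codemeta))
```

Separating the automatic mapping from the completeness check mirrors the split between automated fetching and manual curation that the abstract describes.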
Promoting Visibility into Collections through Object Analysis
Eric Lopatin
California Digital Library, United States of America
At CDL we work with librarians and archivists from across the ten University of California campuses. More specifically, as the team that maintains and augments our digital preservation repository, Merritt, we assist a variety of depositors with their digital preservation initiatives.
Though most depositors are thoroughly familiar with the file formats and metadata of their content when it is ingested, the preservation of a collection will outlast any one group of individuals. A key challenge we face over time is therefore to give current and future staff the means to analyze the contents of their collections thoroughly and at scale.
This presentation will discuss a solution the Merritt team at CDL created to address the need to analyze the metadata, digital object structure and file types in the numerous collections being preserved in the repository. The solution leverages Amazon OpenSearch and its data visualization tools. We’ve found that OpenSearch has allowed us to create a rich map of relationships across elements of object data. This in turn gives depositors the knowledge they need to make informed preservation decisions about the metadata and structure of the objects in their collections and about file format sustainability.
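To make the idea of analyzing file types across collections more concrete, here is a minimal Python sketch, not the Merritt team's actual implementation. It builds a nested terms aggregation in the OpenSearch query DSL that counts files per format within each collection, and includes a small local function that computes the same tally over in-memory records. The field names `collection` and `mime_type`, the function names, and the sample data are assumptions made for illustration.

```python
import json
from collections import Counter, defaultdict


def format_aggregation_query(collection_field="collection",
                             format_field="mime_type",
                             top_n=20):
    """Build a query body: file counts per format, grouped by collection."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "aggs": {
            "by_collection": {
                "terms": {"field": collection_field, "size": top_n},
                "aggs": {
                    "by_format": {
                        "terms": {"field": format_field, "size": top_n}
                    }
                },
            }
        },
    }


def count_formats_locally(files):
    """Compute the same per-collection format tally over a list of dicts."""
    counts = defaultdict(Counter)
    for record in files:
        counts[record["collection"]][record["mime_type"]] += 1
    return {name: dict(tally) for name, tally in counts.items()}


# Hypothetical file records standing in for indexed repository data.
sample_files = [
    {"collection": "maps", "mime_type": "image/tiff"},
    {"collection": "maps", "mime_type": "image/tiff"},
    {"collection": "maps", "mime_type": "application/pdf"},
    {"collection": "theses", "mime_type": "application/pdf"},
]

print(json.dumps(format_aggregation_query(), indent=2))
print(count_formats_locally(sample_files))
```

The query body could be sent to a search endpoint by whatever client a team already uses; the local function only illustrates what the aggregation buckets would contain.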