Conference Agenda: Open Repositories 2025

Overview and details of the sessions of this conference.

Session Overview
Session: Developer Track Session 1
Time: Monday, 16 June 2025, 11:00am – 12:30pm
Location: C119 & 121 (Classrooms)


Presentations

A year of Hybrid ML/AI cataloging aid in Archipelago Commons: The state, the lessons and probable future(s) explained through a real production implementation

Diego Alberto Pino Navarro, Allison Sherrick

Metropolitan New York Library Council, United States of America

During OR2024 we showcased our first release of a public ML/AI image similarity and semantic search prototype integrated into Archipelago Commons (1.4.0), a first among OSS repository systems. We also gave an introduction to the challenges, concerns, and hows of our development, including a not-so-brief 101 on ML inference.

A year of research later, we want to share the evolved state of the solution and what will be available in 1.5.0, using a specific community prototype as a showcase: the Revs Institute.

Through this project, we will explain how we support field-specific human knowledge and responsible decision-making to enrich unknowns (media/metadata) through transfer learning from well-cataloged and known Digital Objects, without giving up control to unattended inference.

We will also demonstrate how datasets for re-training and refining ML models are produced and reinjected through the act of assisted cataloging, allowing a cyclic, self-feeding ML system. Finally, and most importantly, we will share a closing and honest discussion of the expectations, limitations, and carbon-footprint effects of ML in production: the scope of ML/AI from the perspective of our and our community's ethical and system-design goals.
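
For readers who want to picture the pattern the abstract describes, the following minimal Python sketch (not Archipelago Commons code; every embedding, identifier, and field name is made up) illustrates the general idea of suggesting metadata for an uncataloged object from its nearest well-cataloged neighbours, with a human cataloger making the final call:

    # Illustrative sketch only -- not Archipelago Commons code. It mimics the idea of
    # proposing metadata for an uncataloged item from its most similar well-cataloged
    # neighbours, leaving the final decision to a human cataloger.
    import numpy as np

    # Hypothetical precomputed embeddings and metadata for well-cataloged objects.
    cataloged = {
        "obj:001": (np.array([0.90, 0.10, 0.30]), {"subject": "race cars", "decade": "1960s"}),
        "obj:002": (np.array([0.20, 0.80, 0.50]), {"subject": "portraits", "decade": "1940s"}),
        "obj:003": (np.array([0.85, 0.20, 0.35]), {"subject": "race cars", "decade": "1970s"}),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def suggest_metadata(unknown_vec, k=2):
        """Return the k most similar cataloged objects as metadata suggestions."""
        ranked = sorted(
            ((cosine(unknown_vec, vec), oid, meta) for oid, (vec, meta) in cataloged.items()),
            key=lambda t: t[0],
            reverse=True,
        )
        return ranked[:k]  # suggestions only; nothing is written without review

    # Embedding of a new, uncataloged digital object (placeholder values).
    print(suggest_metadata(np.array([0.88, 0.15, 0.32])))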



Deploying DataFed for Scientific Data Management: Lessons Learned

Theodore Samuel Beers¹, Joshua Agar¹, Chad Peiper², Jane Greenberg³, Chirayu Patel¹

¹College of Engineering, Drexel University; ²College of Computing & Informatics, Drexel University; ³Metadata Research Center, Drexel University

This Developer Presentation shares our experience using DataFed, a novel scientific data management system, in research projects at Drexel University. We will address three main questions: What is DataFed, and why might Open Repositories participants find it useful? What challenges have we faced in deploying DataFed endpoints, and how did we overcome those? What should researchers know about the technical requirements and process of implementing DataFed in the university setting?

DataFed, developed on an open-source basis at Oak Ridge National Laboratory (ORNL), is a distributed system for managing research data and metadata. It simplifies handling large datasets in alignment with the FAIR principles, providing researchers with scalable storage, a customizable metadata infrastructure, and optional federated sharing capabilities.
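
As a rough illustration of what that researcher-facing workflow can look like, the sketch below uses DataFed's Python client (datafed.CommandLib); the project ID, metadata fields, and file path are placeholders, and the exact calls should be checked against the DataFed documentation rather than taken from here:

    # Sketch of a minimal DataFed workflow; identifiers and paths are placeholders,
    # and method usage should be verified against the DataFed Python client docs.
    import json
    from datafed.CommandLib import API

    df_api = API()
    df_api.setContext("p/example_project")  # hypothetical DataFed project ID

    # Create a record with structured (JSON) metadata in the project's root collection.
    reply = df_api.dataCreate(
        "XRD scan 42",
        metadata=json.dumps({"instrument": "Bruker D8", "temperature_K": 300}),
    )
    record_id = reply[0].data[0].id  # response parsing follows the client's examples

    # Transfer the raw file into the record's managed storage (via Globus).
    df_api.dataPut(record_id, "/path/to/scan42.raw", wait=True)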

Deploying a DataFed repository presents certain challenges: for example, it requires a server with a public static IP address and open ports for ingress, which institutional network administrators may be reluctant to allow. Additionally, DataFed depends on Globus Connect Server for secure data transfer, introducing further complexities.

Despite such obstacles, we have found that the benefits of DataFed more than justify the effort. We are also collaborating with ORNL to streamline the deployment process for future users.



Renovation and enhancement of statistics pages in DSpace 7

Zhongda Zhang¹, Le Yang²

¹University of Oklahoma, United States of America; ²University of Oregon, United States of America

This proposal outlines the renovation and enhancement of the statistics pages in DSpace 7 to improve their functionality, usability, and impact. Current limitations, such as a text-only presentation that lacks the information users are most interested in, hinder their effectiveness. The project aims to introduce modernized interfaces, advanced data visualization, real-time analytics, and enhanced filtering options. By addressing these issues, the renovated statistics pages will empower administrators, researchers, and contributors with actionable insights, improve user engagement, and align with institutional goals. The proposed enhancements will make DSpace a more user-friendly digital repository platform with more effective usage statistics.



Putting your middleware on steroids with DSpace 7+ REST API

Bram Luyten

Atmire, Belgium

For many institutions, highly customized DSpace code has long been necessary to handle complex integration requirements—a practice that often results in technical debt and impedes upgrades to DSpace installations. Since DSpace 7, however, the decoupling of the front end (DSpace Angular) and back end (DSpace REST) has fundamentally changed the way integrations can be managed. By “consuming its own dog food,” the DSpace community now channels all DSpace UI operations through the comprehensive and well-documented DSpace 7 REST API.

In this proposal, we present a real-world implementation that demonstrates how data from a third-party business process can be seamlessly ingested into DSpace based on robust business rules—without any modifications to the core DSpace code. Instead, we leverage simple Bash scripts that interface with the DSpace 7 REST API, showcasing a sustainable approach to complex integrations that avoids the pitfalls of traditional code customization. We will discuss the design, implementation, and impact of this solution, highlighting best practices and lessons learned to help others optimize their own DSpace integrations.
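
The presenters work with Bash scripts; purely to illustrate the same pattern in compact form, the sketch below uses Python to authenticate against a DSpace 7 REST backend and create an item in a collection. The base URL, credentials, collection UUID, and metadata are placeholders, and the CSRF/login details should be verified against the DSpace 7 REST contract:

    # Sketch of scripted ingest through the DSpace 7 REST API; URL, credentials,
    # collection UUID, and metadata values are placeholders.
    import requests

    BASE = "https://repo.example.org/server/api"  # hypothetical DSpace 7 backend

    session = requests.Session()

    # Any GET against the API root sets the CSRF cookie (DSPACE-XSRF-COOKIE).
    session.get(BASE)
    csrf = session.cookies.get("DSPACE-XSRF-COOKIE")

    # Log in; the JWT comes back in the Authorization response header.
    login = session.post(
        f"{BASE}/authn/login",
        data={"user": "ingest@example.org", "password": "changeme"},
        headers={"X-XSRF-TOKEN": csrf},
    )
    token = login.headers["Authorization"]
    csrf = session.cookies.get("DSPACE-XSRF-COOKIE", csrf)  # token may rotate on login

    # Create an archived item in a target collection with minimal metadata.
    item = {
        "name": "Record from business process",
        "metadata": {"dc.title": [{"value": "Record from business process"}]},
        "inArchive": True,
        "discoverable": True,
        "withdrawn": False,
        "type": "item",
    }
    resp = session.post(
        f"{BASE}/core/items",
        params={"owningCollection": "11111111-2222-3333-4444-555555555555"},
        json=item,
        headers={"Authorization": token, "X-XSRF-TOKEN": csrf},
    )
    resp.raise_for_status()
    print("Created item", resp.json()["uuid"])

Because every call goes through the documented REST contract, nothing in the DSpace code base itself has to change for an integration of this kind.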



Jetstream2 and Cloud-Based Dev Tools for Data Curation Training

Seth Erickson

University of California, Santa Barbara

Research data curators draw on a wide range of software tools to manage and preserve repository submissions. There is no singular software “stack” for data curation. However, for the purposes of facilitating workshops on data curation practices, a shared environment with commonly used tools and resources is needed. As part of an ongoing project with the Data Curation Network (DCN), the author evaluated the feasibility of using cloud-based developer environments for conducting workshops on specialized data curation topics. This presentation provides an overview and demonstration of an exploratory infrastructure for data curation workshops running on Jetstream2, a computing resource available to US-based researchers and educators through the National Science Foundation (NSF). Coder (an open-source, self-hosted web service) is used to provision cloud-based environments for workshop participants. The infrastructure is defined using Terraform to facilitate configuration and deployment for each workshop; the code is available on GitHub (https://github.com/srerickson/js2-coder).



 