Conference Agenda

Overview and details of the sessions of this conference.

 
 
Session Overview
Session: Developer Track Session 2
Time: Wednesday, 05/June/2024, 09:00 - 10:30

Session Chair: Kathryn Cassidy, Trinity College Dublin
Location: Drottningporten 2

Presentations

Experiential Learning and Technical Debt

Clinton Graham

University of Pittsburgh, United States of America

Experimental student projects provide a low-barrier opportunity for research universities both to support student learning and to pilot novel presentations of institutional research and data. This is particularly advantageous when students from a minority community are enabled to tell the stories of that community's research data. The unresolved risk, however, is the ongoing stewardship of this work, as such an experimental or pilot project represents inherited technical debt after the students have graduated.

This presentation will describe one such student endeavor as a case study. At the University of Pittsburgh, the University Library System and the School of Computing and Information Science partnered to create a query and visualization tool highlighting a distinctive-collections deposit within the University’s institutional repository: transcriptions of a substantial Chinese village gazetteer collection. The presenter will reflect on the successes and challenges of this project and will invite conversation on the similar technical management of experimental and pilot student projects that highlight institutional repositories’ research and datasets.



Crate-O - a drop-in linked data metadata editor for RO-Crate (and other) linked data in repositories and beyond

Peter Sefton, Alvin Sebastian, Moises Sacal Bonequi, Rosanna Smith

University of Queensland, Australia

Research Object Crate (RO-Crate) is a metadata packaging standard that has been widely adopted in research contexts over the last few years; it debuted at Open Repositories with a workshop in 2019.

In this presentation we will show a new tool, Crate-O, that implements the specification, allowing data to be described at multiple scales.

Crate-O is a browser-based tool which can work in a wide variety of environments – as an online metadata editor, for describing local data on a user’s hard disk, and as a way of describing entire collections of data, whether legacy or contemporary, either with an easy-to-use GUI or in bulk via spreadsheets.
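As background for readers unfamiliar with the format, the sketch below shows the shape of a minimal RO-Crate metadata file (`ro-crate-metadata.json`) of the kind a tool like Crate-O edits, assuming the RO-Crate 1.1 context; the dataset name and file paths are illustrative placeholders, not taken from the presentation.

```python
import json

# Minimal sketch of an RO-Crate metadata file (ro-crate-metadata.json).
# The dataset name and file paths are illustrative placeholders.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            # Metadata file descriptor: points at the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: describes the crate as a whole.
            "@id": "./",
            "@type": "Dataset",
            "name": "Example research dataset",
            "hasPart": [{"@id": "data/observations.csv"}],
        },
        {
            # A data entity contained in the crate.
            "@id": "data/observations.csv",
            "@type": "File",
            "name": "Observation data",
        },
    ],
}

print(json.dumps(crate, indent=2))
```

Because the format is plain JSON-LD in a single file, an editor like Crate-O can sit on top of any storage that can hold files, which is what makes the "drop-in" usage pattern possible.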



Bringing computation and reproducibility to a data repository using Binder

Cheng-Jen Lee, Tyng-Ruey Chuang

Academia Sinica, Taiwan

Binder is an online service that allows users to create and share executable computing environments from datasets in data repositories such as GitHub, Zenodo, or Dataverse. In this talk, we will share our experience customizing and deploying a Binder service at the depositar, a research data repository built on top of CKAN.
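For context, Binder instances address source repositories through a common URL pattern (`/v2/<provider>/<spec>`). The sketch below builds launch URLs in that style against the public mybinder.org host; the repository name and DOI are placeholders, and a self-hosted deployment such as the one described here would use its own hostname.

```python
# Sketch of Binder launch URLs, which follow the public pattern
# /v2/<provider>/<spec>. The repository name and DOI are placeholders.
BINDER = "https://mybinder.org/v2"

def binder_url(provider: str, spec: str) -> str:
    """Build a Binder launch URL for a provider/spec pair."""
    return f"{BINDER}/{provider}/{spec}"

# GitHub repositories are addressed as gh/<owner>/<repo>/<ref>.
gh = binder_url("gh", "example-org/example-repo/HEAD")

# Zenodo deposits are addressed by DOI (placeholder DOI below).
zen = binder_url("zenodo", "10.5281/zenodo.0000000")

print(gh)
print(zen)
```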



Translating Large Datasets for Reproducible Science in an Open Repository

Jake Alan Rosenberg, Vanessa Gonzalez, Sarah Gray, Maria Esteva

Texas Advanced Computing Center, United States of America

The DesignSafe cyberinfrastructure provides comprehensive support for the entire natural hazards research data life cycle. As a science gateway, it equips researchers with tools to manage large datasets and analyze them using high-performance computing (HPC) resources, then curate and publish the results. The CoreTrustSeal-certified Data Depot Repository (DDR) uniquely addresses HPC and archival needs through a knowledge graph-based data curation architecture. Natural hazards research methods are modeled as categories and metadata, guiding researchers to create consistent and complete datasets with clear technical and administrative provenance. The repository architecture supports reuse within the HPC ecosystem by standardizing the formats of large datasets, enhancing their discoverability. DesignSafe employs NetworkX for constructing and manipulating knowledge graphs, ensuring their preservation through mappings to directory structures in replicated storage, alongside container hierarchies in the Fedora repository software. This developer-focused presentation delves into the architecture and implementation details of DesignSafe's knowledge graph repository. The content will be relevant to developers and repository administrators interested in using open repositories to support high-performance computing and reproducible science.
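To illustrate the general idea of mapping a knowledge graph onto a directory structure (not DesignSafe's actual schema, which is richer), here is a minimal NetworkX sketch with hypothetical category and file names:

```python
import networkx as nx

# Sketch: a dataset modeled as a small knowledge graph, with a directory
# layout derived from the graph. Category and file names are hypothetical.
G = nx.DiGraph()

# Research-method categories form a hierarchy under the project root.
G.add_edge("PRJ-0001", "model_config", rel="hasCategory")
G.add_edge("PRJ-0001", "simulation_output", rel="hasCategory")
G.add_edge("model_config", "inputs/mesh.h5", rel="hasFile")
G.add_edge("simulation_output", "results/run_01.csv", rel="hasFile")

def to_path(graph, leaf, root="PRJ-0001"):
    """Map a graph node to a directory path by walking down from the root."""
    parts = nx.shortest_path(graph, root, leaf)
    return "/".join(parts)

print(to_path(G, "inputs/mesh.h5"))
# PRJ-0001/model_config/inputs/mesh.h5
```

Deriving paths from the graph (rather than the reverse) is what lets the same structure be preserved both in replicated storage and as container hierarchies in Fedora.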



Using Fedora 6 to architect future-proof and easily maintained repositories

Dustin Slater

University of Texas at Austin, United States of America

At the University of Texas at Austin Libraries we have several Digital Asset Management Systems (DAMS) performing a variety of functions, most of which were built in Islandora 7. With the Drupal 7 end of life, we pursued rebuilding each of these platforms, which proved complicated and labor-intensive. We sought another approach, one that would allow us both to modularize the infrastructure, moving away from a monolithic implementation, and to adopt the Oxford Common File Layout (OCFL) as our storage layer to avoid future data migrations. Fedora 6 allowed us to rethink how to solve our DAMS challenges by providing OCFL and an API to access the data. With this as our foundation, we embarked on rebuilding our Archive of the Indigenous Languages of Latin America portal using a selection of open-source technologies. The result is a trilingual portal that is attractive, flexible, and sustainable. We are excited to have completed this project during the International Decade of Indigenous Languages and look forward to seeing how communities all over the world will continue to use the content to reclaim their languages.
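For readers unfamiliar with OCFL, the sketch below shows the shape of a minimal OCFL-style inventory for a single object version, assuming OCFL 1.1; the object id, timestamp, and file names are made up, and real inventories also carry fixity blocks and per-version message/user metadata. Because this layout is plain files on disk, the content remains readable even if the repository software above it is replaced.

```python
import hashlib
import json

# Sketch: a minimal OCFL-style inventory for one object version.
# The object id, timestamp, and file names are made up.
content = b"hello, archive"
digest = hashlib.sha512(content).hexdigest()

inventory = {
    "id": "ark:/99999/example-object",
    "type": "https://ocfl.io/1.1/spec/#inventory",
    "digestAlgorithm": "sha512",
    "head": "v1",
    # Manifest maps content digests to paths under the object root.
    "manifest": {digest: ["v1/content/file.txt"]},
    "versions": {
        "v1": {
            "created": "2024-06-05T09:00:00Z",
            # State maps digests to the logical paths in this version.
            "state": {digest: ["file.txt"]},
        }
    },
}

print(json.dumps(inventory, indent=2))
```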



Building a repository of data science and machine learning applications by leveraging container images and Kubernetes

Arnold Kochari

SciLifeLab, Uppsala University, Uppsala, Sweden

As an increasing number of research projects produce machine learning models and data science applications, a need arises for long-term hosting of these research outputs. To meet this need, we built a custom repository focused specifically on sharing machine learning models and data science applications (SciLifeLab Serve, https://serve.scilifelab.se); it is currently available to life science researchers affiliated with Swedish research institutions. Our repository is essentially a hosting platform where the submitted models and data science applications become available for inference over an API endpoint or for interaction through a graphical web interface. We built a dedicated user interface (a Django application) on top of a Kubernetes cluster, using the Kubernetes API to control deployments. Each submission to our repository either is a Docker image or is turned into one based on user input; this image then runs in the Kubernetes cluster. This approach allows for easy scalability, efficient resource utilization, and quick deployment and updates.
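The deployment model described above can be illustrated with a minimal sketch that renders a Kubernetes Deployment manifest for a submitted container image as a plain dictionary (rather than through the Kubernetes client library, and not SciLifeLab Serve's actual code); the app name and image reference are illustrative.

```python
# Sketch: rendering a Kubernetes Deployment manifest for a user-submitted
# container image. App name and image reference are illustrative.
def deployment_manifest(app_name: str, image: str, port: int = 8080) -> dict:
    """Return a minimal apps/v1 Deployment serving one app container."""
    labels = {"app": app_name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_name, "labels": labels},
        "spec": {
            "replicas": 1,
            # Selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": app_name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }]
                },
            },
        },
    }

manifest = deployment_manifest(
    "demo-model", "registry.example.org/demo-model:latest"
)
print(manifest["kind"])
```

Generating one such manifest per submission is what makes each hosted model an independently scalable, updatable unit in the cluster.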

In this presentation we will demonstrate the service and discuss its notable features and tech stack, as well as some of the challenges we have encountered and our solutions so far. The code behind the platform is open source, allowing anyone to launch their own instance.