Conference Agenda

Session Overview
Session: Presentations: AI and Repositories
Time: Tuesday, 17 June 2025, 13:30-15:00

Location: Griffin Auditorium


Presentations

A Data Curation, Interrogation, and Access System for the Texas Robotics DataVerse

Xingru Zhou, Zhiyun Deng, Yao-Cheng Chan, Sadanand Modak, Maria Esteva

University of Texas at Austin, United States of America

Research in robot autonomy and human-robot interaction draws on multidisciplinary perspectives, including engineering, computer science, and the social sciences. Datasets derived from robotics studies are large, multimodal, and uniquely structured. Open repositories are increasingly publishing robotics datasets; however, the lack of domain-specific metadata standards hinders their understanding and usability. Additionally, current repository infrastructure does not support interrogating and comparing multiple datasets. This work introduces a system that leverages curated metadata from robotics datasets published in the Dataverse-based Texas Data Repository to enable context-aware access through natural language interaction. A robotics-specific data model was implemented as a knowledge graph within the Texas Advanced Computing Center's open infrastructure. Descriptive and structural metadata from curated datasets in the Texas Data Repository are automatically harvested, mapped to the data model, and integrated into the knowledge graph. The datasets' metadata, selected data files, the knowledge graph schema, and related publications are used to train a ChatGPT-based chatbot, enabling users to query and retrieve data from the repository in natural language. The presentation demonstrates the system's design, implementation, and evaluation, showing its potential to improve the interoperability and accessibility of open datasets. With further research, the system can be applied to datasets from other domains in open repositories.
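
The harvest, map, and integrate steps described above can be pictured with a small Python sketch. It is not the authors' implementation: the metadata field names, the `rob:` predicates, and the helper functions (`map_record`, `integrate`, `datasets_with`) are hypothetical, and a plain set of triples stands in for the knowledge graph hosted at TACC.

```python
# Hypothetical sketch: map harvested repository metadata onto a small robotics
# data model and store the result as (subject, predicate, object) triples.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from harvested metadata fields to data-model predicates.
FIELD_TO_PREDICATE: Dict[str, str] = {
    "title": "rob:hasTitle",
    "robotPlatform": "rob:usesPlatform",
    "sensors": "rob:recordsModality",
    "fileFormats": "rob:hasFileFormat",
}


def map_record(record: Dict[str, object]) -> List[Triple]:
    """Turn one harvested metadata record into triples for the graph."""
    subject = f"dataset:{record['id']}"
    triples: List[Triple] = [(subject, "rdf:type", "rob:RoboticsDataset")]
    for field, predicate in FIELD_TO_PREDICATE.items():
        value = record.get(field)
        if value is None:
            continue  # field was not curated for this dataset
        values = value if isinstance(value, list) else [value]
        triples.extend((subject, predicate, str(v)) for v in values)
    return triples


def integrate(graph: Set[Triple], records: List[Dict[str, object]]) -> Set[Triple]:
    """Add mapped triples to the graph; using a set makes re-harvesting idempotent."""
    for record in records:
        graph.update(map_record(record))
    return graph


def datasets_with(graph: Set[Triple], predicate: str, obj: str) -> List[str]:
    """Simple graph lookup: which datasets carry the given predicate/object pair?"""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


if __name__ == "__main__":
    harvested = [
        {"id": "A", "title": "Hallway navigation runs", "sensors": ["lidar", "rgb"]},
        {"id": "B", "title": "Handover study", "sensors": ["rgb"], "robotPlatform": "arm"},
    ]
    kg = integrate(set(), harvested)
    print(datasets_with(kg, "rob:recordsModality", "lidar"))
```

Keeping the mapping in a single table means a new curated field only needs one new entry, which is one plausible way to let a domain data model evolve without changing the harvesting code.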



Can this robot query my Linked Data Store? Exploring retrieval-augmented models for Repository Search & Discovery

Kirsta Stapelfeldt, Bennett Steinburg, Natkeeran Ledchumykanthan, Kyle Huynh

University of Toronto

The University of Toronto Scarborough Library's Digital Scholarship Unit has been developing multiformat and multilingual digital collections for over ten years, with a focus on post-custodial and non-extractive models. This presentation explores the integration of linked data and AI-powered chat-based search, using one specific project as a use case. The Dragomans project, which explores the role of diplomatic interpreter-translators in the Ottoman Empire, provides a multilingual dataset with linked data that allows us to assess the effectiveness of various retrieval-augmented generation methods when implementing chat-based search. This research offers insights that are applicable across various knowledge domains.

The limitations of generative artificial intelligence in handling structured data, complex reasoning, and contextual understanding are well described. We compare various retrieval-augmented models with direct SPARQL/Cypher queries, which allows us to evaluate each approach's effectiveness as well as its resource implications. Our findings shed new light on the landscape of generative AI, offering guidance for future repository infrastructure development and contributing to the broader discourse on integrating AI and linked data in digital repositories.
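
To make the comparison concrete, the following Python sketch contrasts an exact triple-pattern lookup (analogous to a SPARQL basic graph pattern) with a naive retrieval step of the kind used in retrieval-augmented generation. Everything here is an illustrative assumption: the triples are invented rather than taken from the Dragomans dataset, simple word overlap stands in for embedding-based retrieval, and the language-model call is omitted.

```python
# Hypothetical sketch contrasting two ways of answering a question over linked data:
# (1) an exact triple-pattern match, analogous to a SPARQL basic graph pattern;
# (2) naive retrieval that ranks triples by word overlap and packs the top hits
#     into a prompt for a language model (the model call itself is not shown).
import re
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Toy, invented triples; not drawn from the Dragomans dataset.
TRIPLES: List[Triple] = [
    ("ex:PersonA", "ex:role", "dragoman"),
    ("ex:PersonA", "ex:servedEmbassy", "ex:EmbassyX"),
    ("ex:PersonA", "ex:language", "Ottoman Turkish"),
    ("ex:PersonB", "ex:role", "dragoman"),
    ("ex:PersonB", "ex:language", "Italian"),
]


def match(pattern: Pattern) -> List[Triple]:
    """Exact pattern match; None acts as a variable, like ?x in SPARQL."""
    return [t for t in TRIPLES
            if all(p is None or p == v for p, v in zip(pattern, t))]


def tokens(text: str) -> Set[str]:
    """Lower-cased alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, k: int = 2) -> List[Triple]:
    """Rank triples by word overlap with the question and keep the top k."""
    q = tokens(question)
    ranked = sorted(TRIPLES, key=lambda t: len(q & tokens(" ".join(t))), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Assemble retrieved triples as context for a (not shown) LLM call."""
    context = "\n".join(" ".join(t) for t in retrieve(question))
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
```

With these toy triples, `match((None, "ex:language", "Italian"))` returns exactly one triple, while `retrieve("Which dragoman spoke Italian?")` returns the two `ex:role` triples because of how ties in word overlap are broken, a small illustration of why retrieval quality matters when weighing the two approaches.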



Repository of the Future: A Hybrid Approach of Human Expertise and AI-Driven Data Enrichment

Omorodion Okuonghae

University of Wisconsin-Milwaukee

The continuous advancement and growing complexity of digital repositories necessitate a forward-thinking approach that balances scalability and contextual accuracy. This proposal introduces a repository of the future that fluidly combines human expertise and artificial intelligence (AI) innovation to produce smarter, enriched data for knowledge advancement. By leveraging AI-enabled semantic web technologies such as Wikibase, librarians could change the way repositories are designed, used, and interacted with. In this combined approach, humans and AI would each execute the tasks they are best suited to, for the overall good of the repository and humanity. For instance, while AI could handle tasks such as metadata generation, data cleaning, and resource recommendation, human oversight could ensure ethical standards, data integrity, cultural sensitivity, and trustworthiness. The presentation will detail a pilot Wikibase project at Glorious Vision University Library, where this hybrid approach has enhanced the library's repository functionality and accessibility. Attendees will gain insights into practical workflows, as well as the opportunities and challenges of adopting such a model to meet the evolving needs of the global community.



Integrating Machine Actionable Data Management and Sharing Plans (maDMSP) into Campus-Based Open Research and Repository Workflows: A Case Study from the University of Colorado Boulder

Aditya N. Ranganath, Andrew M. Johnson

University of Colorado Boulder, United States of America

Data Management and Sharing Plans (DMSPs) are now a well-established part of the modern scholarly ecosystem, as evidenced by the fact that they are a required component of grant applications at major funding agencies. Despite their widespread use, DMSPs are limited in that they are generally written solely for human audiences and are rarely machine-readable or machine-actionable. As a result, while DMSPs communicate valuable data management and project information to human stakeholders, they are effectively "black boxes" to the digital systems embedded in the infrastructure of Open Science, including data repositories. This presentation explores an effort to better understand, develop, and implement "next-generation" DMSPs, known as Machine-Actionable Data Management and Sharing Plans (maDMSPs), at the University of Colorado Boulder, as part of a multi-institution grant led by the California Digital Library (CDL) and the Association of Research Libraries (ARL) and funded by the Institute of Museum and Library Services (IMLS). We discuss local use cases for maDMSPs, our project activities, and the challenges we encountered in integrating maDMSPs into campus workflows. We also explore the implications of maDMSPs for the University of Colorado's institutional repository, and for open repositories more generally.
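
As an illustration of what "machine-actionable" can mean for a repository, the Python sketch below reads a plan serialized as JSON and lists the datasets slated for deposit with a given host. The field names loosely follow the RDA DMP Common Standard, but the plan is invented and deliberately minimal (it omits fields the standard requires, such as identifiers and contacts), the host name "Campus Repository" and the helper `planned_deposits` are hypothetical, and nothing here reflects the CU Boulder project's actual workflows.

```python
# Hypothetical sketch: read a machine-actionable DMSP serialized as JSON and list
# the datasets the plan says will be deposited with a given host. Field names
# loosely follow the RDA DMP Common Standard; the plan itself is invented and
# omits fields the standard requires (identifiers, contacts, dates).
import json
from typing import Any, Dict, List

PLAN_JSON = """
{
  "dmp": {
    "title": "Example project DMSP",
    "dataset": [
      {"title": "Survey responses",
       "distribution": [{"host": {"title": "Campus Repository",
                                  "url": "https://repository.example.org"}}]},
      {"title": "Simulation outputs",
       "distribution": [{"host": {"title": "External Archive",
                                  "url": "https://archive.example.org"}}]}
    ]
  }
}
"""


def planned_deposits(plan: Dict[str, Any], host_title: str) -> List[str]:
    """Titles of datasets whose planned distribution names the given host."""
    titles: List[str] = []
    for dataset in plan["dmp"].get("dataset", []):
        for dist in dataset.get("distribution", []):
            if dist.get("host", {}).get("title") == host_title:
                titles.append(dataset["title"])
                break  # count each dataset once, even with several distributions
    return titles


if __name__ == "__main__":
    plan = json.loads(PLAN_JSON)
    print(planned_deposits(plan, "Campus Repository"))
```

In a workflow along these lines, a repository could anticipate incoming deposits as soon as a plan is registered, rather than learning about them at submission time.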



We Need to Chat: A presentation about real-world AI use cases

David Schober (1), Carolyn Caizzi (2), Elizabeth Roke (3)

(1) Northwestern University Libraries, United States of America; (2) Harvard University; (3) Emory University

Recent advances in artificial intelligence and large language models (LLMs) have created new opportunities for enhancing digital repositories and discovery. This panel presents three implementations that have moved beyond theoretical exploration to AI solutions that solve real-world problems. Northwestern University Libraries developed a discovery interface using Retrieval Augmented Generation (RAG) that enables natural language querying of IIIF-based digital collections, leading to an IMLS-funded open-source solution. Emory University Libraries implemented a hybrid intelligence approach for metadata enhancement, leveraging AWS infrastructure and LLMs to improve description while maintaining human oversight and addressing historical biases. Harvard Library's Collections Explorer project employs semantic search and generative AI to revolutionize special collections discovery, building on insights from their "Talk with HOLLIS" pilot and collaboration with Mozilla.ai. Through these case studies, the panel will examine practical implementation strategies, architectural decisions, and lessons learned while deploying AI in repository environments. Discussion will address how these technologies can meet evolving user expectations while maintaining institutional values and responsibilities. The session will provide attendees with concrete examples of successfully operationalizing AI tools for discovery and metadata generation in library contexts.



 
Conference: Open Repositories 2025