Session Chair: Cecilia Granell, Chalmers University of Technology
Location: Drottningporten 3
200
Presentations
Creating a better balance: the need for tools and practices to combat AI harvests and resource flooding in repository environments
Allison Kelly Sherrick, Diego Alberto Pino Navarro
Metropolitan New York Library Council, United States of America
We have entered a new era in which the internet is fueled by massive AI and ML applications, which has significant consequences for the field of open repositories. Sustaining the resources required to provide normal, quality repository interactions alongside unregulated consumption of data and resources by AI-powered bots is increasingly tenuous and difficult, and it challenges our well-established ideas of what openness means. Our Digital Services Team at METRO has been implementing a multifaceted approach to combating the rise of AI bot harvest waves, which have been escalating over the past year. This approach includes multiple behind-the-scenes DevOps and code-based tactics, and requires substantial machine and human resources to maintain and consistently scale up as AI bots become more sophisticated and well-resourced. We propose the implementation of standardized “no-AI” or “regulated AI” use licenses and realistic DevOps practices that could be applied in open repository environments across the globe. We cannot expect the commercial AI/ML data-mining industry to regulate itself, and we need to build consensus in our own community to combat this significant challenge.
What are the characteristic community smells influencing the sustainability of open-source repository software communities?
In this presentation, we will summarize some emerging methods used to characterize the problems (i.e., “community smells”) that open-source software communities face and that affect their sustainability. We will describe the type of information that can be extracted from community analysis tools that analyze GitHub repositories, and present the results of running one such diagnostic tool on several open repository projects: DSpace, Dataverse, EPrints, Islandora, Samvera, Archivematica, and OJS. The motivating objective for this work is to understand whether libraries can use such tools to better understand the open-source communities that they rely on and, ultimately, to use that understanding to help address existing issues or to make strategic decisions about adoption.
Automatic detection of duplicate records in institutional repositories
Matteo Cancellieri, Anton Zhuk, Valerii Budko, Ekaterine Chxaidze, Viktoriia Pavlenko, Petr Knoth
Open University, United Kingdom
The prevalence of multiple copies of articles in repositories presents a significant challenge in maintaining the integrity and clarity of the research graph. Issues such as processing errors and lack of communication between co-authors contribute to the existence of duplicate and near-duplicate records. The CORE Dashboard Versions and Duplicates module was developed to address this issue by providing a tool to identify versions and duplicates within repositories. The system facilitates side-by-side comparison and labelling of versions, and of exact duplicates for removal.
This presentation will report on the experience of, and collective feedback from, repository managers, and give an update on the efforts to integrate duplicate and near-duplicate matching into the deposit workflow.