Content-update Signaling and Alerting Protocol (CUSAP)
Craig Van Dyck¹, Patrick Hargitt²
¹Solutions Spectrum, LLC; ²Atypon
The Content-update Signaling and Alerting Protocol (CUSAP) is a project of the International Association of STM Publishers (STM). CUSAP aims to develop a protocol and common service to actively signal and alert repositories and other stakeholders about updates or amendments to published scholarly content, for example errata, name changes, corrections, retractions, newer versions, or expressions of concern. No such service is currently available. The context is that the growth of Open Access is expected to further amplify the proliferation of copies of publications across (potentially many) different platforms, calling for additional measures to steward the use of the correct scholarly literature, i.e. the Version of Record (VoR), and to prevent the use of out-of-date, amended, or retracted content. The project has conducted outreach to potential users and developed a metadata package and specifications for an initial demonstrator system.
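The abstract does not publish the CUSAP metadata schema. Purely as an illustrative sketch of the kind of signal such a service might exchange, the field names, update types, and JSON structure below are assumptions, not the project's specification:

```python
# Illustrative sketch only: field names and structure are assumptions,
# not the published CUSAP metadata package.
import json
from dataclasses import dataclass, asdict
from enum import Enum


class UpdateType(str, Enum):
    ERRATUM = "erratum"
    CORRECTION = "correction"
    RETRACTION = "retraction"
    NEW_VERSION = "new_version"
    EXPRESSION_OF_CONCERN = "expression_of_concern"
    NAME_CHANGE = "name_change"


@dataclass
class UpdateSignal:
    affected_doi: str        # DOI of the previously published item
    update_type: UpdateType  # kind of amendment being signalled
    vor_url: str             # where the current Version of Record resides
    issued: str              # ISO 8601 date the amendment was issued

    def to_json(self) -> str:
        payload = asdict(self)
        payload["update_type"] = self.update_type.value
        return json.dumps(payload)


# Example: a publisher signalling a retraction to subscribed repositories.
signal = UpdateSignal(
    affected_doi="10.1234/example.5678",
    update_type=UpdateType.RETRACTION,
    vor_url="https://publisher.example/articles/5678",
    issued="2024-05-01",
)
print(signal.to_json())
```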
SDG-Classify: Automating the classification of research outputs into UN SDGs
Suchetha Nambanoor Kunnath, Matteo Cancellieri, Petr Knoth
CORE, The Open University, United Kingdom
This paper presents SDG-Classify, a novel AI model for multi-label classification of research papers based on the UN Sustainable Development Goals (SDGs), along with its integration into the CORE Dashboard, helping Higher Education Institutions (HEIs) better understand how the content held in their repositories contributes to the SDGs. Using a few-shot, two-stage contrastive learning approach, the method generates contextual embeddings from publication metadata, including titles and abstracts, to train a classification head, leveraging an out-of-domain (OOD) multi-label SDG dataset from news articles. While the two-stage fine-tuned model performs effectively in OOD settings, incorporating additional context through label descriptions significantly enhances the model’s performance in the in-domain evaluations. Additionally, integrating SDG-Classify into the CORE Dashboard streamlines the monitoring of SDG contributions for HEIs and supports research managers in targeted resource allocation and impact-driven decision-making.
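The SDG-Classify architecture itself is not reproduced here. As a simplified sketch of the general pipeline shape the abstract describes (a sentence encoder producing embeddings from titles and abstracts, with a multi-label classification head trained on top), assuming the sentence-transformers and scikit-learn libraries and omitting the contrastive fine-tuning stage and label-description enrichment:

```python
# Simplified sketch of an embedding + multi-label head pipeline; the actual
# SDG-Classify model, training data, and label set are not reproduced here.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Stand-in pretrained encoder; the real system uses a contrastively
# fine-tuned model (assumption for illustration only).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def embed(texts):
    """Encode title+abstract strings into dense contextual embeddings."""
    return encoder.encode(texts, convert_to_numpy=True)

# Toy training data: texts plus a binary indicator matrix over three SDG labels
# (SDG3 health, SDG7 energy, SDG13 climate) chosen purely for illustration.
texts = [
    "Solar microgrids for rural electrification",
    "Vaccination coverage and child mortality trends",
]
y = np.array([[0, 1, 1],   # SDG7, SDG13
              [1, 0, 0]])  # SDG3

# Train a multi-label classification head on the (frozen) embeddings.
head = OneVsRestClassifier(LogisticRegression(max_iter=1000))
head.fit(embed(texts), y)

# Inference on a new record.
print(head.predict(embed(["Carbon capture policy for heavy industry"])))
```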
Identifying and extracting Data Access Statements from full-text academic articles
Matteo Cancellieri, David Pride, Petr Knoth
Open University, United Kingdom
A Data Access Statement (DAS) is a formal declaration detailing how and where the underlying research data associated with a publication can be accessed. It promotes transparency, reproducibility, and compliance with funder and publisher data-sharing requirements. Funders and initiatives such as Plan S, the European Union, UKRI, and the NIH emphasise the inclusion of a DAS in publications, underscoring its growing importance.
A DAS enhances research by increasing transparency, discoverability, and data quality, clarifying access protocols, and elevating datasets to first-class research outputs. However, the repository community faces challenges in managing and curating DAS as a standard metadata component: manual DAS curation remains labour-intensive and time-consuming, hindering efficient data-sharing practices.
CORE has co-designed with the repository community a module that uses machine learning to identify and extract DAS from full-text articles. This tool facilitates the automated encoding, curation, and validation of DAS within metadata, reducing manual workload and improving metadata quality. This integration aligns with CORE's objective to enhance repository services by providing enriched metadata and supporting compliance with funder requirements. By streamlining DAS management and expanding metadata frameworks, CORE contributes to a more accessible and interconnected scholarly ecosystem, fostering data discoverability and reuse.
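CORE's module is machine-learning-based and is not reproduced here. The following is only a simplified rule-based illustration of how a Data Availability-style section can be located in full text; the heading patterns and example article are assumptions for demonstration:

```python
# Simplified heuristic sketch; CORE's actual DAS module uses machine learning.
# The heading patterns below are illustrative assumptions.
import re

DAS_HEADING = re.compile(
    r"^(data (availability|access) statement|availability of data( and materials)?)\s*:?\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def extract_das(full_text: str):
    """Return the paragraph following a Data Availability-style heading, if any."""
    match = DAS_HEADING.search(full_text)
    if not match:
        return None
    # Take the text after the heading up to the next blank line.
    remainder = full_text[match.end():].lstrip("\n")
    paragraph = remainder.split("\n\n", 1)[0].strip()
    return paragraph or None

# Toy article text with a placeholder DOI, for illustration only.
article = """Results were consistent across cohorts.

Data Availability Statement
The datasets generated during this study are available in the Zenodo
repository at https://doi.org/10.5281/zenodo.0000000.

Acknowledgements
We thank the reviewers."""

print(extract_das(article))
```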
Laying the Groundwork for the Future: Creating Tools to Better Harness Metadata and Data Packages
Peyton Carolynn Tvrdy
National Transportation Library, United States of America
In this repository showdown, I plan to demonstrate various tools I have created to enhance our metadata and lay the groundwork for our repository’s future capabilities. The presentation will cover the following tools: DOI Parser Version 2.0 for DataCite, DCAT-US Version 1.1 Generator, and CSV to Markdown Bulk README Template Converter. The DataCite DOI parser has directly led to more accurate and complete metadata for all our repository’s DOIs, utilizing the power of persistent identifiers to create robust linked data. The DCAT-US generator gives researchers the tools they need to produce these required metadata files for their data packages. Lastly, the README generator allows researchers, librarians, and catalogers to easily create documentation for their datasets. Together, these tools have significantly improved our DOI metadata and DCAT-US files, and have greatly increased the speed and accuracy with which we create and manage DOIs, metadata files, and README files. They are publicly available for other repositories and users to adapt and repurpose, and they demonstrate that problems and improvements can be tackled one issue at a time, even if full repository integration of these tools lies far in the future.
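The column layout and template used by the Library's converter are not given in the abstract. As a minimal sketch of the CSV-to-Markdown README idea, assuming hypothetical column names (title, description, contact) and a hypothetical template:

```python
# Minimal sketch of a CSV-to-Markdown README converter; column names and the
# README template are assumptions, not the National Transportation Library's
# actual format.
import csv
from pathlib import Path

TEMPLATE = """# {title}

## Description
{description}

## Contact
{contact}
"""

def generate_readmes(csv_path: str, out_dir: str) -> None:
    """Write one README file per dataset row in the metadata CSV."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for i, row in enumerate(csv.DictReader(handle)):
            readme = TEMPLATE.format(
                title=row.get("title", "Untitled dataset"),
                description=row.get("description", ""),
                contact=row.get("contact", ""),
            )
            (out / f"README_{i:03d}.md").write_text(readme, encoding="utf-8")

# Example usage with a hypothetical metadata export:
# generate_readmes("dataset_metadata.csv", "readmes/")
```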