A Large-Scale Reference Matching for Records in Japanese Institutional Repositories using Crossref REST API
Chifumi Nishioka, Jun-ichi Onami, Kazutsuna Yamaji
National Institute of Informatics, Japan
Persistent identifiers such as DOIs are essential for scholarly records, for example for discovering related resources and citations. Repository managers and scholars are therefore expected to enter a DOI when they submit a manuscript (e.g., an author’s accepted manuscript (AAM)) to a repository. However, DOIs are sometimes missing from repository records. In this work, we perform reference matching for journal articles and conference papers in CiNii Research to identify their DOIs. CiNii Research is the Japanese national discovery platform for scholarly information, and it contains many records that originate from institutional repositories and lack a DOI. We conducted reference matching for 110,477 records using the Crossref REST API. We observed that the Crossref REST API returns a truly matching record more than 95% of the time when the score exceeds 120. Crossref records that scored high but were judged not to be a match include, for example, authors’ responses during peer review. If the Crossref REST API returns multiple records with high scores, mismatches may be avoided by referring to the record type (e.g., review) and other non-bibliographic information. This work helps repository managers who attempt to perform reference matching.
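The matching rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the public Crossref REST API endpoint `https://api.crossref.org/works` with the `query.bibliographic` and `rows` parameters, and the `score`, `type` and `DOI` fields of each returned item. The function names, the set of excluded types and the choice to take the highest-scoring remaining candidate are illustrative assumptions; only the threshold of 120 comes from the abstract.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"
SCORE_THRESHOLD = 120  # threshold reported in the abstract
# Assumption: record types treated as non-matches even when they score high.
EXCLUDED_TYPES = {"peer-review"}


def build_query_url(citation, rows=5):
    """Build a Crossref query URL for a free-text bibliographic string."""
    params = {"query.bibliographic": citation, "rows": rows}
    return CROSSREF_WORKS + "?" + urlencode(params)


def pick_match(response, threshold=SCORE_THRESHOLD):
    """Return the DOI of the best candidate above the threshold, or None.

    `response` is the decoded JSON body of a Crossref /works query.
    """
    items = response.get("message", {}).get("items", [])
    candidates = [
        item for item in items
        if item.get("score", 0) > threshold
        and item.get("type") not in EXCLUDED_TYPES
    ]
    if not candidates:
        return None
    # Of the remaining candidates, keep the one with the highest score.
    return max(candidates, key=lambda item: item["score"])["DOI"]
```

In use, the URL from `build_query_url` would be fetched with any HTTP client and the decoded JSON passed to `pick_match`. Records for which `None` is returned would be left without a DOI or sent for manual review.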
URNs as persistent identifiers for repositories: resolution services and user networks
Jyrki Ilva, Emma Pietarila, Ulriika Vihervalli
National Library of Finland, Finland
This presentation explores the use of Uniform Resource Names (URNs), specifically URN:NBNs, as persistent identifiers (PIDs) in the repository context. URNs are a standards-based identifier system that has been around since the 1990s. The National Library of Finland (NLF) has been providing a national URN resolution service at urn.fi since 2007. The ease of implementation and cost-free nature of URNs have led to the adoption of URN-based persistent addresses at all Finnish institutional repositories.
URN:NBN identifiers are currently in active use in 13 European countries, with the national library usually providing services and coordination. However, as there is no organization overseeing the use of URNs on a global level, there has been little international cooperation in this area, and the profile of URN:NBNs remains relatively low compared to some of the other PID systems, especially DOIs.
As part of the EU-funded FAIRCORE4EOSC project, NLF has started to contact and interview the other national URN service providers. There are plans to foster collaboration by organising a URN:NBN webinar and producing a URN:NBN landscape study in 2024. The Meta Resolver service, which is being developed within the FAIRCORE4EOSC project, may also provide steps towards improved interoperability in PID resolution, including, importantly, URN:NBNs.
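As a rough illustration of how a URN:NBN maps to a resolvable address, the following Python sketch splits an identifier into its parts and prefixes it with the urn.fi resolver mentioned above. The regular expression is a simplified reading of the URN:NBN syntax (country code, optional sub-namespaces, hyphen, national string) and is an assumption rather than a full validator. The function names and the sample identifier in the usage note are hypothetical.

```python
import re

# Simplified URN:NBN pattern (assumption): urn:nbn:<cc>[:<sub>...]-<string>
URN_NBN_PATTERN = re.compile(
    r"^urn:nbn:(?P<country>[a-z]{2})(?P<sub>(?::[a-z0-9]+)*)-(?P<nbn>\S+)$",
    re.IGNORECASE,
)
RESOLVER_BASE = "https://urn.fi/"  # national resolver named in the abstract


def parse_urn_nbn(urn):
    """Split a URN:NBN into country code, sub-namespaces and NBN string."""
    match = URN_NBN_PATTERN.match(urn.strip())
    if match is None:
        raise ValueError(f"not a URN:NBN: {urn!r}")
    subnamespaces = [s for s in match.group("sub").split(":") if s]
    return {
        "country": match.group("country").lower(),
        "subnamespaces": subnamespaces,
        "nbn": match.group("nbn"),
    }


def resolver_url(urn):
    """Return the resolver address for a syntactically valid URN:NBN."""
    parse_urn_nbn(urn)  # raises ValueError if the identifier is malformed
    return RESOLVER_BASE + urn.strip()
```

For example, `resolver_url("URN:NBN:fi:example-12345")` would produce an address under `https://urn.fi/`. Whether that address resolves depends on the identifier actually being registered.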
Data Repository Integration Strategies with the Aid of Persistent Identifiers in the BrCris Project
Washington Luís Ribeiro de Carvalho Segundo1, Thiago Magela Rodrigues Dias2, Patricia da Silva Neubert3, Fábio Lorensi do Canto3
1Brazilian Institute of Information in Science and Technology (IBICT), Brazil; 2Centro Federal de Educação Tecnológica de Minas Gerais (CEFET-MG), Brazil; 3Universidade Federal de Santa Catarina (UFSC), Brazil
Curating the data that is collected, integrated and analyzed in the BrCris project requires a strategy for generating identifiers. Such identifiers are important because all collected data is mapped to previously identified entities, each of which must be identified uniquely throughout the processing pipeline, especially during data disambiguation and deduplication. To this end, a computational library written in Python was proposed that is responsible for generating BrCris Identifiers, which are created to pre-disambiguate the data and avoid duplicate entities in the set to be analyzed. For each data set to be analyzed, a strategy for generating identifiers was considered. This strategy aims to use as little of the extracted information as possible while still generating, with a satisfactory level of confidence, unique identifiers that will be used in several later stages.
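The abstract does not describe how BrCris Identifiers are actually built. The following Python sketch only illustrates the general idea of deriving a deterministic identifier from a small number of normalized fields, so that trivially different spellings of the same entity collapse to one key. The function names, the normalization steps, the use of SHA-1 and the identifier layout are all assumptions made for illustration.

```python
import hashlib
import unicodedata


def normalize(text):
    """Strip accents, fold case and collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(no_accents.casefold().split())


def make_identifier(entity_type, *fields):
    """Derive a deterministic identifier from a few normalized fields.

    Hypothetical scheme: hash the entity type and the normalized fields,
    then keep the first 16 hexadecimal digits of the digest.
    """
    key = "|".join([entity_type] + [normalize(field) for field in fields])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return f"{entity_type}_{digest}"
```

With this scheme, `make_identifier("person", "José  da Silva", "UFSC")` and `make_identifier("person", "jose da silva", "ufsc")` yield the same identifier, so the two records would be treated as one entity before any later disambiguation step. Using fewer fields raises the risk that distinct entities collide, which is the trade-off the abstract alludes to.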
It already knows! A prototype PID-optimised workflow for a repository
Rory McNicholl, Will Fyson, Eleanor Dumbill
CoSector, University of London, United Kingdom
Persistent Identifiers (PIDs) are a key part of scholarly communications infrastructure, and ensuring that they can be generated and used to represent all parts of the research process has been a long-term goal of the repository community. In the UK, a recent national PID strategy has sought to distil knowledge and best practice in this area and establish a spine of key PID services that will ease the flow of information between research-related services and organisations within the UK, while also enhancing access to UK research from around the world. Taking a lead from the UK PID strategy, we explore its potential through a practical application within an institutional repository. Is it possible to realise the benefits of efficiency, automation, data quality and system integration that the strategy promises?