Conference Agenda

Session Overview
Session
BigDS 3: Workshop on Big (and Small) Data in Science and Humanities 3
Time: Tuesday, 04/Mar/2025, 1:30pm - 3:00pm

Session Chair: Birgitta König-Ries, University of Jena
Location: WE5/04.004

60 people

Session Abstract

We have scheduled 20 minutes for each presentation, including the discussion.
The workshop will end with a general discussion on the topic and concluding remarks.


Presentations

ADISS: Authority Data Integration Search System

Leon Fruth, Tobias Gradl, Andreas Henrich

University of Bamberg, Germany

This paper introduces ADISS, a generic search system designed to integrate heterogeneous authority file providers. Authority data is used to unambiguously identify entities such as persons, places, and organizations. Because no single data provider offers both the quantity and the quality of data required, combined access to multiple datasets is often needed to support real-world use cases. In the context of Digital Humanities, this combination improves the resolution of ambiguities in data curation processes. Our work is mainly motivated by two projects that require semi-automatic retrieval as well as user-centered search scenarios for different authority file providers. Instead of using multiple existing endpoints to access the various datasets, we gather the heterogeneous data and make it accessible via integrated query and result models. In this paper, we present our highly configurable search API, which offers a diverse range of search and filtering options. We show that, owing to its generic and highly configurable nature, our system is adaptable and reusable for a diverse set of use cases, and we conclude the paper with ideas for further steps and improvements.

Fruth-ADISS Authority Data Integration Search System-260_a.pdf
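The ADISS abstract describes gathering heterogeneous authority data into integrated query and result models. As a rough illustration only, the sketch below assumes two hypothetical providers with different field names (`id`/`name`/`type` versus `uri`/`label`/`kind`), maps both onto one hypothetical `AuthorityRecord` shape, and runs a single name search with an entity-type filter over the combined index. None of these names come from ADISS itself; the abstract does not specify its models.

```python
from dataclasses import dataclass, field


# Hypothetical unified result model; the actual ADISS query and result
# models are not specified in the abstract.
@dataclass
class AuthorityRecord:
    provider: str
    identifier: str
    preferred_name: str
    entity_type: str  # "person", "place", or "organization"
    variant_names: list[str] = field(default_factory=list)


def from_provider_a(raw: dict) -> AuthorityRecord:
    """Hypothetical adapter for a provider exposing id/name/type/alt fields."""
    return AuthorityRecord("A", raw["id"], raw["name"], raw["type"], raw.get("alt", []))


def from_provider_b(raw: dict) -> AuthorityRecord:
    """Hypothetical adapter for a provider exposing uri/label/kind/aliases fields."""
    return AuthorityRecord("B", raw["uri"], raw["label"], raw["kind"].lower(), raw.get("aliases", []))


def search(records, text, entity_type=None):
    """Case-insensitive substring match on preferred and variant names,
    optionally restricted to one entity type."""
    needle = text.casefold()
    return [
        rec
        for rec in records
        if (entity_type is None or rec.entity_type == entity_type)
        and any(needle in name.casefold() for name in [rec.preferred_name, *rec.variant_names])
    ]


# Records from both (hypothetical) providers end up in one index with one shape.
index = [
    from_provider_a({"id": "a1", "name": "Bamberg", "type": "place"}),
    from_provider_b({"uri": "b42", "label": "Bamberg", "kind": "Place", "aliases": ["Stadt Bamberg"]}),
    from_provider_b({"uri": "b7", "label": "University of Bamberg", "kind": "Organization"}),
]

for hit in search(index, "bamberg", entity_type="place"):
    print(hit.provider, hit.identifier, hit.preferred_name)
```

Keeping one small adapter per provider means a new source only needs a new mapping function, while the search and filter logic works on the shared model unchanged.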


Historic to FAIR: Leveraging LLMs for Historic Term Identification and Standardization

Jan Felix Marten Fillies1,2, Maximilian Teich3, Naouel Karam1,2, Adrian Paschke1,2,4, Malte Rehbein3

1Institut für Angewandte Informatik (InfAI), Germany; 2Freie Universität Berlin, Germany; 3Chair of Computational Humanities, University of Passau, Germany; 4Fraunhofer FOKUS, Berlin, Germany

As society and scientific research progress, so does the language used to describe concepts, species, and objects. With the amount of historical data available online constantly growing, the need for it to be Findable, Accessible, Interoperable, and Reusable (FAIR) has become increasingly apparent. This study tackles the challenge of identifying historical common species names and historical scientific names in a historic biodiversity text. The research further identifies five challenges when working with historical common names: changes in spelling, the creation of new terms, the shift from broad historical common names to more specific modern ones, the reverse shift from specific historical names to broader modern ones, and the renaming of historical common names. The research investigates the use of a large language model, GPT-4, to aid in this entity detection process and to address the identified challenges. The findings demonstrate that, given only a small amount of context, the large language model can effectively identify historical common species names and scientific names. On a test dataset, the LLM correctly detected 92% of the mentioned historical common names and 98% of the scientific terms. For four of the five challenges of historical common names, the LLM was able to provide meaningful input. It was also demonstrated that the LLM can match historical common names to their modern-day counterparts, showing an embedded understanding of the evolution of biodiversity terminology. These results highlight the LLM's potential for making such data more findable, accessible, interoperable, and reusable.

Fillies-Historic to FAIR-228_a.pdf


Investigating Zero-shot Topic Labelling of Scientific Papers Using LLMs

Jens Bruchertseifer1, Patrick Neises2, Maria Hinzmann1, Ralf Schenkel1, Christof Schöch1

1Trier University, Germany; 2Schloss Dagstuhl LZI dblp group, Germany

In this paper, we focus on the problem of adding content labels from a given vocabulary to scientific publications using LLMs. After a short overview of the current state of the work, we present a first implementation of a zero-shot classification pipeline. This implementation is already built with a focus on extensibility and customizability, so that it can easily be used for different datasets and use cases in the future. We select a subset of the DBLP Discovery Dataset and execute our pipeline on it. We then discuss the results, propose a comparison with a second dataset, the STTCL journal from the humanities, and present the challenges this comparison entails. Both datasets comply with the FAIR data principles. Finally, we outline our plans for the next steps.

Bruchertseifer-Investigating Zero-shot Topic Labelling of Scientific Papers Using LLMs-235_a.pdf
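The zero-shot labelling abstract describes a pipeline that assigns labels from a fixed vocabulary and is meant to be easy to adapt to new datasets. The sketch below is one possible shape for such a pipeline, not the authors' implementation: the prompt wording, the one-label-per-line answer format, and the helper names (`build_prompt`, `parse_labels`, `label_papers`, `fake_llm`) are all hypothetical. The model is passed in as a plain prompt-to-text function, and `fake_llm` is a stand-in that returns a fixed answer so the data flow can be run without any real LLM.

```python
from typing import Callable


def build_prompt(title: str, abstract: str, vocabulary: list[str]) -> str:
    """Zero-shot prompt: the model sees the label vocabulary but no labelled examples."""
    labels = "\n".join(f"- {label}" for label in vocabulary)
    return (
        "Assign one or more topic labels from the list below to the paper.\n"
        "Answer with the chosen labels only, one per line.\n"
        f"Labels:\n{labels}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n"
    )


def parse_labels(response: str, vocabulary: list[str]) -> list[str]:
    """Keep only answers that match the vocabulary (case-insensitive), without duplicates."""
    allowed = {label.casefold(): label for label in vocabulary}
    chosen: list[str] = []
    for line in response.splitlines():
        # Strip bullet markers such as "- " or "* " before matching.
        key = line.strip().lstrip("-* ").strip().casefold()
        if key in allowed and allowed[key] not in chosen:
            chosen.append(allowed[key])
    return chosen


def label_papers(papers: dict[str, tuple[str, str]], vocabulary: list[str],
                 llm: Callable[[str], str]) -> dict[str, list[str]]:
    """Run the pipeline; `llm` is any prompt -> text function, so the model is swappable."""
    return {
        paper_id: parse_labels(llm(build_prompt(title, abstract, vocabulary)), vocabulary)
        for paper_id, (title, abstract) in papers.items()
    }


# Hypothetical stand-in for a real model call, used only to show the data flow.
def fake_llm(prompt: str) -> str:
    return "Databases\n- information retrieval\nQuantum Chemistry\nDatabases"


vocabulary = ["Databases", "Information Retrieval", "Machine Learning"]
papers = {"p1": ("Example title", "Example abstract.")}
print(label_papers(papers, vocabulary, fake_llm))
```

Constraining the parsed output to the given vocabulary discards labels the model invents (here "Quantum Chemistry") and duplicate answers, and swapping `vocabulary` or `llm` is all that changes between datasets in this sketch.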


 
Conference: BTW 2025 Bamberg