21st Conference on Database Systems for
Business, Technology and Web (BTW 2025)
March 3 - 7, 2025 | Bamberg, Germany
Conference Agenda
Overview and details of the sessions of this conference.
Session Overview

Session
BigDS 3: Workshop on Big (and Small) Data in Science and Humanities 3
Session Abstract
We have scheduled 20 minutes for each presentation, including the discussion.

Presentations
ADISS: Authority Data Integration Search System
University of Bamberg, Germany

This paper introduces ADISS, a generic search system designed to integrate heterogeneous authority file providers. Authority data is used to unambiguously identify entities such as persons, places, and organizations. Since no single data provider offers both the quantity and the quality of data required, combined access to multiple datasets is often needed to support real-world use cases. In the context of the Digital Humanities, this combination improves the resolution of ambiguities in data curation processes. Our work is mainly motivated by two projects that require semi-automatic retrieval as well as user-centered search scenarios across different authority file providers. Instead of using multiple existing endpoints to access the various datasets, we gather the heterogeneous data and make it accessible via integrated query and result models. In this paper, we present our highly configurable search API, which offers a diverse range of search and filtering options. We show that, owing to its generic and highly configurable nature, our system is adaptable and reusable for a diverse set of use cases, and we conclude the paper with ideas for further steps and improvements.
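The integrated query and result models described in the abstract could be sketched roughly as below. All class and function names here are hypothetical illustrations, not the actual ADISS API: one provider-agnostic query fans out to adapter objects wrapping each heterogeneous authority source, and their results are merged into a common record shape.

```python
from dataclasses import dataclass

@dataclass
class AuthorityQuery:
    # Hypothetical integrated query model: one query is fanned out to all providers.
    term: str
    entity_type: str = "person"   # e.g. "person", "place", "organization"
    limit: int = 10

@dataclass
class AuthorityRecord:
    # Hypothetical integrated result model: a provider-agnostic record shape.
    provider: str
    identifier: str
    label: str

class Provider:
    """Adapter interface that each heterogeneous data source implements."""
    name = "base"

    def search(self, query: AuthorityQuery) -> list[AuthorityRecord]:
        raise NotImplementedError

class InMemoryProvider(Provider):
    # Stand-in for a real authority file harvested into local storage.
    def __init__(self, name: str, records: list[tuple[str, str]]):
        self.name = name
        self._records = records

    def search(self, query: AuthorityQuery) -> list[AuthorityRecord]:
        hits = [AuthorityRecord(self.name, ident, label)
                for ident, label in self._records
                if query.term.lower() in label.lower()]
        return hits[: query.limit]

def integrated_search(providers: list[Provider],
                      query: AuthorityQuery) -> list[AuthorityRecord]:
    """Fan the same query out to every provider and merge the results."""
    results: list[AuthorityRecord] = []
    for provider in providers:
        results.extend(provider.search(query))
    return results
```

Real adapters would translate `AuthorityQuery` into each provider's native endpoint or index query; the point of the sketch is only the single shared query/result shape.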
Historic to FAIR: Leveraging LLMs for Historic Term Identification and Standardization
1Institut für Angewandte Informatik (InfAI), Germany; 2Freie Universität Berlin, Germany; 3Chair of Computational Humanities, University of Passau, Germany; 4Fraunhofer FOKUS, Berlin, Germany

As society and scientific research progress, so does the language used to describe concepts, species, and objects. With the amount of historical data available online constantly growing, the need for it to be Findable, Accessible, Interoperable, and Reusable (FAIR) has become increasingly apparent. This study tackles the challenge of identifying historical common species names and historical scientific names in historic biodiversity texts. The research further identifies five challenges when working with historical common names: changes in spelling, the creation of new terms, the shift from broad historical common names to more specific modern ones, the reverse shift from specific historical names to broader modern ones, and the renaming of historical common names. The research investigates the use of a large language model, GPT-4, to aid in the aforementioned entity detection process and to solve the identified challenges. The findings demonstrate that, given a small amount of context, the large language model can effectively identify the historical common species names and scientific names. On a test dataset, the LLM achieved a 92% success rate in accurately detecting the mentioned historical common names. Furthermore, 98% of the scientific terms were correctly identified. For four out of the five challenges of historical common names, the LLM was able to provide meaningful input. It was demonstrated that the LLM can match the historical common names to their modern-day counterparts, showing an embedded understanding of the evolution of biodiversity terminology. These results emphasize the potential of LLMs for making the data more findable, accessible, interoperable, and reusable.
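The small-context detection-and-matching step described above might look roughly like the following sketch. The prompt wording, function names, and `historic -> modern` output format are my assumptions for illustration, not the paper's actual prompts; the model is passed in as any `str -> str` callable so a GPT-4 client (or a test stub) can be plugged in.

```python
def build_prompt(sentence: str, context: str = "") -> str:
    """Assemble a small-context prompt asking the model to mark historical
    common names and scientific names (illustrative wording, not the paper's)."""
    return (
        "Identify all historical common species names and scientific names "
        "in the sentence below. For each historical common name, give its "
        "modern-day counterpart if one exists.\n"
        f"Context: {context}\n"
        f"Sentence: {sentence}\n"
        "Answer as 'historic -> modern' pairs, one per line."
    )

def normalize_terms(sentence: str, llm, context: str = "") -> dict[str, str]:
    """Call an LLM (any callable str -> str) and parse its answer into a
    mapping from historical name to modern-day counterpart."""
    reply = llm(build_prompt(sentence, context))
    pairs: dict[str, str] = {}
    for line in reply.splitlines():
        if "->" in line:
            historic, modern = (part.strip() for part in line.split("->", 1))
            pairs[historic] = modern
    return pairs
```

Keeping the model behind a plain callable makes the parsing logic testable without network access and independent of any particular LLM vendor.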
Investigating Zero-shot Topic Labelling of Scientific Papers Using LLMs
1Trier University, Germany; 2Schloss Dagstuhl LZI, dblp group, Germany

In this paper, we focus on the problem of adding content labels from a given vocabulary to scientific publications using LLMs. After a short overview of the current state of the work, we present a first implementation of a zero-shot classification pipeline. This implementation is already realized with a focus on extensibility and customizability, so that it can easily be used for different data sets and use cases in the future. We select a subset of the DBLP Discovery Dataset and execute our pipeline on it. In the end, we discuss the results, suggest a comparison with a second data set, the STTCL journal from the humanities, and present its challenges. Both of the mentioned data sets comply with the FAIR data principles. Finally, we consider our plans for the next steps.
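The core of such a zero-shot labelling pipeline could be sketched as below. This is a generic illustration, not the authors' implementation: the scoring model is a pluggable `(text, label) -> float` callable (e.g. an LLM asked whether a paper belongs to a label, or an NLI-style entailment scorer), which keeps the pipeline reusable across data sets as the abstract emphasizes.

```python
def zero_shot_label(text: str,
                    vocabulary: list[str],
                    score,
                    threshold: float = 0.5) -> list[tuple[str, float]]:
    """Assign every vocabulary label whose score clears the threshold.

    `score(text, label) -> float` is a pluggable model; swapping it out
    changes the classifier without touching the pipeline itself.
    Returns (label, score) pairs sorted by descending score.
    """
    scored = [(label, score(text, label)) for label in vocabulary]
    accepted = [(label, s) for label, s in scored if s >= threshold]
    return sorted(accepted, key=lambda pair: pair[1], reverse=True)
```

In a real run, `text` would be the title plus abstract of a DBLP record and `vocabulary` the fixed label set; the threshold trades precision against recall.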
Contact and Legal Notice · Privacy Statement · Conference: BTW 2025, Bamberg
Conference Software: ConfTool Pro 2.6.153+TC © 2001–2025 by Dr. H. Weinreich, Hamburg, Germany