21st Conference on Database Systems for
Business, Technology and Web (BTW 2025)
March 3 - 7, 2025 | Bamberg, Germany
Conference Agenda
Session
R5: Research 5: Data Engineering Pipelines
Presentations
2:40pm - 3:00pm
Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based on gradual information disclosure
Leipzig University, Germany; ScaDS.AI Dresden/Leipzig, Leipzig, Germany
Privacy-preserving record linkage (PPRL) is an essential component of data integration tasks involving sensitive information. The linkage quality determines the usability of combined datasets and of the (machine learning) applications based on them. We present a novel privacy-preserving protocol that integrates clerical review into PPRL using a multi-layer active learning process. Uncertain match candidates are reviewed on several layers by human and non-human oracles to reduce the amount of information disclosed per record and in total. Predictions are propagated back to update previous layers, which also improves linkage performance for non-reviewed candidates. The data owners remain in control of the amount of information they share for each record. Our approach therefore follows need-to-know and data sovereignty principles. The experimental evaluation on real-world datasets shows considerable linkage quality improvements with limited labeling effort and privacy risks.
3:00pm - 3:20pm
Embracing Change: Incremental Updates of Discovered Event Queries
Humboldt Universität zu Berlin, Germany
In complex event processing (CEP), queries are evaluated continuously over streams of events to detect situations of interest, thereby facilitating reactive applications. However, users often lack insights into the precise event pattern that characterizes the situation, which renders the definition of the respective queries challenging. Once a database of finite, historic streams, each containing a materialization of the situation of interest, is available, query discovery supports users in the definition of the desired queries. It constructs the queries that match a certain share of the given streams, as determined by a support threshold. Yet, upon changes in the database or changes of the support threshold, existing algorithms need to construct the resulting queries from scratch, neglecting the queries obtained in previous runs. In this paper, we aim to avoid the resulting inefficiencies by techniques for incremental query discovery. We first provide a theoretical analysis of the problem context, before presenting algorithmic solutions to cope with changes in the stream database or the adopted support threshold. Our experiments using real-world data show that our incremental query discovery reduces the runtimes by up to three orders of magnitude compared to a baseline solution.
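As a rough illustration of the support threshold mentioned in this abstract, the sketch below treats a query as a plain ordered list of event types and counts the share of streams in which those types occur in order. This is a deliberate simplification and not the algorithm presented in the paper: the function names (matches, support, frequent_queries) and the toy data are hypothetical, and real CEP queries may also involve variables, predicates, or time windows.

```python
from typing import Iterable, List, Sequence, Tuple


def matches(query: Sequence[str], stream: Iterable[str]) -> bool:
    """Return True if the query's event types occur in the stream in order.

    Assumption: a query is a simple sequence pattern (subsequence match).
    """
    remaining = iter(stream)
    # `event in remaining` advances the iterator, so order is respected.
    return all(event in remaining for event in query)


def support(query: Sequence[str], streams: List[Sequence[str]]) -> float:
    """Share of streams in the database that the query matches."""
    return sum(matches(query, s) for s in streams) / len(streams)


def frequent_queries(
    candidates: List[Tuple[str, ...]],
    streams: List[Sequence[str]],
    threshold: float,
) -> List[Tuple[str, ...]]:
    """Keep the candidate queries whose support reaches the threshold."""
    return [q for q in candidates if support(q, streams) >= threshold]


# Toy stream database: each inner list is one finite, historic stream.
streams = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
candidates = [("a", "c"), ("a", "b"), ("b", "c")]
result = frequent_queries(candidates, streams, threshold=0.6)
```

In this naive form, adding a stream or changing the threshold means re-evaluating every candidate against the whole database, which is the from-scratch recomputation the abstract describes as the inefficiency to avoid.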
3:20pm - 3:30pm
Identifying Semantic Components for PBE-based Transformation Discovery
BIFOLD & TU Berlin, Germany
Complex data transformations involve a combination of syntactic and semantic operations. Recent LLM-based programming-by-example (PBE) approaches aid in finding sequences of syntactic and semantic operations that satisfy given transformation examples. As testing LLM outputs is expensive, such approaches defer the prompting step until after all syntactic operations have been identified. During this process, sequences of tokens that need semantic look-ups are split and their order is lost, leading to lower accuracy. We address this problem by focusing on challenging transformation tasks and propose a pre-processing step that impedes destructive splits of such sequences.