Conference Agenda


 
 
Session Overview

Session: R5: Research 5: Data Engineering Pipelines
Time: Thursday, 06/Mar/2025, 2:40pm - 3:30pm

Session Chair: Thorsten Papenbrock, Philipps-Universität Marburg
Location: WE5/00.022 (Lecture Hall 1)

Presentations
2:40pm - 3:00pm

Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based on gradual information disclosure

Florens Rohde (1, 2), Victor Christen (1, 2), Martin Franke (1, 2), Erhard Rahm (1, 2)

(1) Leipzig University, Germany; (2) ScaDS.AI Dresden/Leipzig, Leipzig, Germany

Privacy-preserving record linkage (PPRL) is an essential component of data integration tasks involving sensitive information. The linkage quality determines the usability of the combined datasets and of the (machine learning) applications built on them. We present a novel privacy-preserving protocol that integrates clerical review into PPRL using a multi-layer active learning process. Uncertain match candidates are reviewed on several layers by human and non-human oracles to reduce the amount of information disclosed per record and in total. Predictions are propagated back to update previous layers, which also improves linkage performance for non-reviewed candidates. The data owners remain in control of the amount of information they share for each record. Our approach therefore follows the need-to-know and data sovereignty principles.

The experimental evaluation on real-world datasets shows considerable linkage quality improvements with limited labeling effort and privacy risks.
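
As a rough illustration of what "uncertain match candidates" means in clerical review generally, the sketch below sorts candidate pairs into matches, non-matches, and an uncertain band that would be handed to a reviewer. It does not reproduce the multi-layer protocol of the paper. The function name, the two thresholds, the record identifiers, and the similarity scores are all hypothetical.

```python
# Minimal sketch of threshold-based triage for clerical review.
# Assumption: each candidate pair already has a similarity score in [0, 1].
# The thresholds 0.6 and 0.85 are illustrative, not taken from the paper.

def classify(similarity: float, lower: float = 0.6, upper: float = 0.85) -> str:
    """Label a candidate pair by its similarity score."""
    if similarity >= upper:
        return "match"
    if similarity < lower:
        return "non-match"
    return "uncertain"  # candidates in this band go to a reviewer (oracle)


# Hypothetical candidate pairs (record id of owner A, record id of owner B).
candidates = {
    ("a1", "b1"): 0.93,
    ("a2", "b7"): 0.71,
    ("a3", "b4"): 0.42,
}

# Only the uncertain pairs need additional information to be disclosed.
review_queue = [pair for pair, sim in candidates.items()
                if classify(sim) == "uncertain"]
```

In this toy setting, only the pair in the uncertain band is queued for review, which is what keeps the amount of disclosed information small.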

Rohde-Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based-109_b.pdf
Rohde-Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based-109_c.zip


3:00pm - 3:20pm

Embracing Change: Incremental Updates of Discovered Event Queries

Rebecca Sattler, Sarah Kleest-Meißner, Steven Lange, Markus L. Schmid, Nicole Schweikardt, Matthias Weidlich

Humboldt-Universität zu Berlin, Germany

In complex event processing (CEP), queries are evaluated continuously over streams of events to detect situations of interest, thereby facilitating reactive applications. However, users often lack insights into the precise event pattern that characterizes the situation, which renders the definition of the respective queries challenging. Once a database of finite, historic streams, each containing a materialization of the situation of interest, is available, query discovery supports users in the definition of the desired queries. It constructs the queries that match a certain share of the given streams, as determined by a support threshold. Yet, upon changes in the database or changes of the support threshold, existing algorithms need to construct the resulting queries from scratch, neglecting the queries obtained in previous runs. In this paper, we aim to avoid the resulting inefficiencies by techniques for incremental query discovery. We first provide a theoretical analysis of the problem context, before presenting algorithmic solutions to cope with changes in the stream database or the adopted support threshold. Our experiments using real-world data show that our incremental query discovery reduces the runtimes by up to three orders of magnitude compared to a baseline solution.
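
The role of the support threshold can be sketched as follows. This is a simplified illustration, assuming that a query is a plain sequence of event types and that it matches a stream when those types occur in the stream in order; the query model used in the paper may be richer. The names `matches` and `support` and the toy database are hypothetical.

```python
from typing import Sequence


def matches(query: Sequence[str], stream: Sequence[str]) -> bool:
    """True if the event types of `query` occur in `stream` in order
    (not necessarily contiguously)."""
    remaining = iter(stream)
    # `event in remaining` advances the iterator up to the first hit,
    # so later query events are only searched in the rest of the stream.
    return all(event in remaining for event in query)


def support(query: Sequence[str], database: Sequence[Sequence[str]]) -> float:
    """Share of streams in the database that the query matches."""
    return sum(matches(query, stream) for stream in database) / len(database)


# Toy database of four finite, historic streams (illustrative data).
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "d"],
    ["c", "d", "a"],
]

query = ["a", "d"]
threshold = 0.75

# The query is kept only if its support reaches the threshold.
is_frequent = support(query, database) >= threshold
```

Adding or removing a stream, or changing `threshold`, can change which queries qualify; recomputing every query from scratch after such a change is the inefficiency that incremental discovery aims to avoid.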

Sattler-Embracing Change-127_b.pdf
Sattler-Embracing Change-127_c.pdf


3:20pm - 3:30pm

Identifying Semantic Components for PBE-based Transformation Discovery

Dakai Men, Binger Chen, Ziawasch Abedjan

BIFOLD & TU Berlin, Germany

Complex data transformations involve a combination of syntactic and semantic operations. Recent LLM-based programming-by-example (PBE) approaches aid in finding sequences of syntactic and semantic operations that satisfy given transformation examples. As testing LLM outputs is expensive, such approaches defer the prompting step until after all syntactic operations have been identified. During this process, sequences of tokens that need semantic look-ups are split and their order is lost, leading to lower accuracy. We address this problem by focusing on such challenging transformation tasks and propose a pre-processing step that impedes destructive splits of these sequences.

Men-Identifying Semantic Components for PBE-based Transformation Discovery-156_b.pdf


 
Conference: BTW 2025 Bamberg