Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only the sessions held on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

Session Overview
Session
Student Poster: Poster Reception
Time:
Wednesday, 05/Mar/2025:
6:00pm - 8:00pm

Session Chair: Rainer Gemulla, Universität Mannheim
Location: WE5/00.033

Irmler Music Hall

Presentations

Utilising Large Language Models for Adversarial Attacks in Text-to-SQL: A Perpetrator and Victim Approach

Ariana Sahitaj1, Markus Nilles2, Ralf Schenkel2, Vera Schmitt1

1Technical University Berlin, Germany; 2University of Trier, Germany

This paper investigates the use of Large Language Models (LLMs) for the Text-to-SQL task, both as Perpetrator models for generating adversarial attacks and as Victim models for assessing their robustness. In this study, two state-of-the-art LLMs, Llama3 with 70 billion and Mixtral with 47 billion parameters, were employed as Perpetrators to generate adversarial examples at the character, word, and sentence levels. A total of 77,292 adversarial examples were generated from 2,147 data points of the Spider test set and evaluated thoroughly on three additional LLMs serving as Victims. These Victim models are based on Llama3 with 8 billion parameters and differ only in the extent of fine-tuning for related benchmark tasks. The results show that attacks at the word level, particularly through synonym replacements, most significantly impair model performance. Additionally, providing database schemas significantly improves execution accuracy, while fine-tuning does not always enhance robustness against adversarial attacks. This work provides important insights into improving the reliability of Text-to-SQL models in future applications and makes a significant contribution to the further development of these models in research.
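A word-level synonym-replacement attack of the kind the abstract describes can be sketched as follows. The synonym table here is a toy stand-in for LLM-generated substitutions, and `word_level_attack` is a hypothetical name; this is not the paper's actual Perpetrator pipeline.

```python
# Word-level adversarial perturbation for a Text-to-SQL input: replace
# selected words of the natural-language question with synonyms while the
# gold SQL stays unchanged. A robust model should answer all variants alike.

SYNONYMS = {
    "show": ["display", "list"],
    "salary": ["pay", "wage"],
    "employees": ["staff", "workers"],
}

def word_level_attack(question: str) -> list[str]:
    """Generate one adversarial variant per replaceable word and synonym."""
    tokens = question.split()
    variants = []
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok.lower(), []):
            perturbed = tokens[:i] + [syn] + tokens[i + 1:]
            variants.append(" ".join(perturbed))
    return variants

variants = word_level_attack("show the salary of all employees")
```

Each variant would then be sent to a Victim model, and its SQL output compared against the result of the unperturbed question.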

Sahitaj-Utilising Large Language Models for Adversarial Attacks-216_b.pdf


Data Model Creation with MetaConfigurator

Felix Neubauer, Jürgen Pleiss, Benjamin Uekermann

University of Stuttgart, Germany

In both research and industry, significant effort is devoted to the creation of standardized data models that ensure data adheres to a specific structure, enabling the development and use of common tools. These models (also called schemas) enable data validation and facilitate collaboration by making data interoperable across various systems. Tools can assist in the creation and maintenance of data models.

One such tool is MetaConfigurator, a schema editor and form generator for JSON Schema and for JSON/YAML documents. It offers a unified interface that combines a traditional text editor with a graphical user interface (GUI), supporting advanced schema features such as conditions and constraints. Still, schema editing can be complicated for novices, since MetaConfigurator exposes all options of JSON Schema, which is very expressive. The following improvements and functionalities have been designed and implemented to further assist the user: 1) A more user-friendly schema editor, distinguishing between an easy and an advanced mode based on a novel meta-schema builder approach; 2) A CSV import feature for seamless data transition from Excel to JSON with schema inference; 3) Snapshot sharing for effortless collaboration; 4) Ontology integration for auto-completion of URIs; and 5) A novel graphical diagram-like schema view for visual schema manipulation. These new functionalities are then applied to a real-world use case in chemistry, demonstrating the usability and improved accessibility of MetaConfigurator.
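The schema inference behind a CSV import can be illustrated with a minimal sketch. The function name and the type-detection rules are assumptions for illustration; MetaConfigurator's actual inference is richer than this.

```python
# Toy schema inference: read CSV rows and derive a minimal JSON Schema,
# guessing a JSON type ("integer", "number", "string") per column.
import csv
import io

def infer_schema(csv_text: str) -> dict:
    """Infer a minimal JSON Schema object for the columns of a CSV file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def json_type(value: str) -> str:
        for cast, name in ((int, "integer"), (float, "number")):
            try:
                cast(value)
                return name
            except ValueError:
                pass
        return "string"

    properties = {}
    for row in rows:
        for key, value in row.items():
            properties[key] = {"type": json_type(value)}
    return {"type": "object", "properties": properties}

schema = infer_schema("name,age\nAda,36\nGrace,45\n")
```

A real importer would also reconcile conflicting types across rows and mark required columns; this sketch simply lets the last row win.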

Neubauer-Data Model Creation with MetaConfigurator-253_b.pdf


Designing a FAIR Research Data Management System – Insights from a User Needs Study

Sabine Diemt

FernUniversität in Hagen, Germany

Research data management, particularly in heterogeneous environments, presents a significant challenge. This is especially true for universities, which, due to their broad range of disciplines, must meet a wide variety of requirements. While systems already exist for this purpose, they often focus on specific research areas or fail to prioritize the user. In this paper, a catalog of requirements for a university-wide, user-centered research data management system (RDMS) is developed. To this end, a user study was conducted in the form of an online survey with 269 international scientists from 36 different disciplines. It resulted in the identification of useful functions for an RDMS in line with the FAIR principles.

Diemt-Designing a FAIR Research Data Management System – Insights-251_b.pdf


Design and Implementation of a Workflow Editor for Geodata Processing

Lutz Kremer

Philipps-Universität Marburg, Germany

The University of Marburg startup Geo Engine offers a powerful cloud-based platform for integrating, analyzing, and visualizing time-dependent geodata. In particular, the platform enables the incremental construction of exploratory workflows and data pipelines, which can be defined either via a web-based user interface or as JSON objects. The data visualization supports data scientists in dynamically adapting workflows by instantly displaying results on a map. Until now, changes to existing workflows could only be made through complex and error-prone modifications of the JSON object. To improve usability, this work presents a new graphical user interface for defining and modifying workflows. It also provides early error detection during workflow definition as well as integration of user-defined operations, which substantially improve the robustness and efficiency of workflows.

Kremer-Entwurf und Implementierung eines Workflow-Editors für die Geodatenverarbeitung-254_b.pdf


Evaluation of the HTAP Benchmark HyBench

Tom Lange

Technische Universität Dresden, Germany

This paper compares the performance of the database systems PostgreSQL and a research prototype of SAP HANA within the context of Hybrid Transactional/Analytical Processing (HTAP). The study includes a quantitative methodology using the newly proposed benchmark HyBench, which was developed specifically to assess the performance of HTAP databases. Our analysis revealed that the benchmark contains design and implementation flaws that lead to an inaccurate assessment of database performance. Nevertheless, we demonstrate how to systematically overcome these shortcomings and present updated performance results for PostgreSQL and HANA.

Lange-Evaluation of the HTAP Benchmark HyBench-234_b.pdf


OPSC: Catching the “Oops” in JDBC PreparedStatements with Static Code Analysis

Thomas James Kirz

University of Passau, Germany

We introduce the Optional Prepared Statement Checker (OPSC, "OOP-see"), a novel type checker designed to identify errors in the use of JDBC. OPSC bridges the gap between SQL and Java types by statically verifying that the correct Java types are used when sending input parameters or retrieving results from a database. This allows OPSC to detect type mismatches at compile time, preventing issues that might otherwise surface as runtime errors in production. Unlike the Java compiler, which is limited to checking Java code, OPSC can also uncover issues not detectable by existing tools. Our early prototype successfully checks JDBC code across multiple Java projects from textbooks, identifying type mismatches that other state-of-the-art tools cannot detect.
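The kind of mismatch OPSC targets can be made concrete with a small sketch. OPSC itself is a static type checker over Java source; the Python below only models its core idea, and the type table and function names are illustrative assumptions, not the tool's implementation.

```python
# Conceptual check: given the SQL type of each PreparedStatement parameter
# and the JDBC setter actually called for it, flag disagreements that would
# otherwise only surface as runtime errors.

SQL_TO_SETTER = {
    "INTEGER": "setInt",
    "VARCHAR": "setString",
    "DATE": "setDate",
}

def check_setters(param_sql_types: list[str], setter_calls: list[str]) -> list[str]:
    """Return a message for every parameter whose setter disagrees with its SQL type."""
    errors = []
    for pos, (sql_type, setter) in enumerate(zip(param_sql_types, setter_calls), start=1):
        expected = SQL_TO_SETTER[sql_type]
        if setter != expected:
            errors.append(f"parameter {pos}: {setter} used for {sql_type}, expected {expected}")
    return errors

# Calling setString on an INTEGER column is the classic "oops" caught here.
errors = check_setters(["INTEGER", "VARCHAR"], ["setString", "setString"])
```

The Java compiler cannot catch this, because every `setX` call is well-typed Java; only cross-checking against the SQL schema reveals the mismatch.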

Kirz-OPSC Catching the “Oops” in JDBC PreparedStatements with Static Code-249_b.pdf


Robust Plan Selection using Cardinality Distribution Models

Jonas Fan-cheng Meng

University of Konstanz, Germany

Efficient query optimization requires an accurate estimation of cardinalities. Various techniques to improve cardinality estimation all share a common limitation: by classic definition, cardinality estimates are single-point values that drive the generation and selection of candidate execution plans. Relying on a wrong estimate can significantly impact the efficiency of query execution and therefore overall database performance. Rarely has the uncertainty of estimators been assessed or incorporated into the plan selection. Rather than solely relying on a single value estimate, we seek to quantify the risk associated with using each estimate, aiming to avoid those suspected to be wrong, unreliable or generally not trustworthy. To this end, we reformulate the cardinality estimation problem, introduce a novel class of learned cardinality estimators fit for this refined formulation, and explore their application in a robust plan selection strategy by implementing a simple query optimizer.
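The shift from point estimates to distributions can be sketched as follows. The cost models, sample values, and function names are invented for illustration and are not the paper's estimator or optimizer; the sketch only shows why a high-quantile ("pessimistic") cardinality can flip the plan choice.

```python
# Risk-aware plan selection: evaluate each candidate plan's cost over a
# distribution of possible cardinalities instead of a single point estimate,
# and pick the plan that is cheapest at a pessimistic quantile.
import statistics

def plan_cost(plan: str, cardinality: float) -> float:
    # Toy cost models: a hash join pays a fixed build cost; a nested-loop
    # join is cheap for small inputs but grows quadratically.
    if plan == "hash_join":
        return 1000 + 2 * cardinality
    return 0.05 * cardinality ** 2  # nested_loop

def robust_choice(plans, cardinality_samples, quantile=0.9):
    """Pick the plan with the lowest cost at the given cardinality quantile."""
    q_card = statistics.quantiles(cardinality_samples, n=100)[int(quantile * 100) - 1]
    return min(plans, key=lambda p: plan_cost(p, q_card))

# Heavy right tail: the median suggests a small input, but large
# cardinalities are plausible, so the risky nested-loop plan is avoided.
samples = [80, 100, 120, 150, 900, 1100]
best = robust_choice(["hash_join", "nested_loop"], samples)
```

At the median cardinality the nested-loop plan would look cheaper; taking the 90th percentile of the distribution instead selects the hash join, trading a little average-case cost for robustness.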

Meng-Robust Plan Selection using Cardinality Distribution Models-236_b.pdf


SMART: Self-supervised Model aligning APIs and RDF using Transformers

Elena Nathalie Valette, Tobias Zeimetz, Henning Fernau, Ralf Schenkel

Universität Trier, Germany

Missing or unreliable entries are a constant challenge when querying data from databases. Traditionally, this challenge is addressed semi-automatically, with high configuration effort, by statistically estimating or ignoring the unreliable entry. If this is impossible, data curators have to manually search for the correct entry, taking into account the various naming conventions and storage structures of different sources such as data dumps or APIs.

Focusing on the special case of RDF knowledge bases, we aim to avoid the time-consuming task of aligning API responses with the schema of the local knowledge base. Compared to data dumps, APIs are more frequently available and their data is usually up to date.

We propose an automated approach that uses a self-supervised, fine-tuned language transformer model to align API response structures with the schema of an RDF knowledge base.

In our experimental evaluation, the approach succeeded for 1:1 mappings between concepts that are distinguishable by their members from the other concepts in question, and it even detected some 1:N mappings. Our approach does not require familiarity with the API's output format and can be adapted to other types of KBs. This flexibility, together with the self-supervised learning technique, shows potential for further methods that aim to refine a dataset without human involvement.
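The alignment task itself can be made concrete with a naive baseline. SMART uses a self-supervised fine-tuned transformer; the string-similarity matching below is only an assumed stand-in to show what "aligning API fields with RDF predicates" means, and all field and predicate names are illustrative.

```python
# Baseline alignment: map each field of an API response to the most
# string-similar predicate of a local RDF schema.
from difflib import SequenceMatcher

def align(api_fields, rdf_predicates):
    """Greedily map each API field to its most similar RDF predicate."""
    mapping = {}
    for field in api_fields:
        best = max(
            rdf_predicates,
            key=lambda p: SequenceMatcher(None, field.lower(), p.lower()).ratio(),
        )
        mapping[field] = best
    return mapping

mapping = align(["birthDate", "familyName"],
                ["dbo:birthDate", "foaf:familyName", "dbo:deathDate"])
```

Such a baseline fails exactly where naming conventions diverge, which is the case a learned model is meant to handle.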

Valette-SMART Self-supervised Model aligning APIs and RDF using Transformers-225_b.pdf


Transactional YCSB: Benchmarking ACID-Compliant NoSQL Systems with Multi-Operation Transactions

Benedikt Maria Beckermann1,2

1Schloss Dagstuhl – Leibniz-Zentrum für Informatik; 2Universität Trier

NoSQL systems are popular because of their flexible data models and focus on availability, scalability, and fault tolerance. They often have loosened ACID guarantees to achieve these goals. To also support use cases that rely on ACID transactions, some existing NoSQL systems introduced ACID compliance after their release, and new NoSQL systems are created to support ACID compliance from the ground up.

A benchmark that supports transactions is required to compare the performance of these ACID-compliant NoSQL systems. The Yahoo! Cloud Serving Benchmark (YCSB) is the most widely used benchmark for NoSQL systems but does not support transactions. YCSB+T, an extension of YCSB, introduces transaction support into YCSB, but only for transactions consisting of a single operation. A further extension of YCSB is required to support the performance evaluation of workloads containing transactions that consist of multiple operations.

This paper introduces Transactional YCSB, an extension of the benchmarking framework YCSB that enables the evaluation of ACID-compliant NoSQL systems for workloads consisting of multi-operation transactions. Further, the paper evaluates the ACID-compliant NoSQL systems FoundationDB, MongoDB, and OrientDB using the developed YCSB extension.
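What "multi-operation transaction" means for such a workload can be sketched with a toy in-memory store. The class and its all-or-nothing semantics are assumptions for illustration, not the benchmark's actual client bindings.

```python
# A transaction groups several reads and writes and commits or aborts
# atomically: either all of its writes become visible, or none do.

class ToyStore:
    def __init__(self):
        self.data = {}

    def run_transaction(self, operations):
        """Apply all writes only if every operation in the transaction succeeds."""
        staged = dict(self.data)  # work on a snapshot for atomicity
        for op, key, value in operations:
            if op == "write":
                staged[key] = value
            elif op == "read" and key not in staged:
                return False  # abort: reading a missing key fails the txn
        self.data = staged  # commit the snapshot
        return True

store = ToyStore()
committed = store.run_transaction([
    ("write", "user1", "a"),
    ("write", "user2", "b"),
    ("read", "user1", None),
])
# The failed read aborts the whole transaction, so "user3" is never written.
aborted = store.run_transaction([("read", "missing", None), ("write", "user3", "c")])
```

A transactional benchmark measures throughput and abort rates of such grouped operations, which single-operation extensions like YCSB+T cannot express.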

Beckermann-Transactional YCSB-215_b.pdf


Achilles' SPEar: Using Metamorphic Testing to Find Bugs in Stream Processing Engines

Magnus Erk Kroner

Technische Universität Berlin, Germany

Stream Processing Engines (SPEs) are critical for real-time data processing, relying on aggressive optimizations to meet performance demands. Ensuring the reliability of such systems requires robust testing, yet testing remains costly and challenging due to the oracle problem. This paper investigates adapting query partitioning, a metamorphic testing technique, to the domain of SPEs. We present Achilles, an automated testing framework for SPEs. Achilles utilizes query partitioning to automatically generate, execute, and evaluate diverse test cases, reducing manual effort and targeting stream operators like filters and windowed aggregations. Our evaluation highlights Achilles' ability to detect unique bugs, including division-by-zero errors and predicate evaluation flaws. We further analyze framework parameters and demonstrate the effectiveness of predicate-based query partitioning, finding that increasing predicate depth boosts bug detection by 10%.
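The query-partitioning relation behind Achilles can be sketched as a metamorphic test. The plain Python filtering below stands in for a real SPE, and the function names are illustrative; the point is the oracle-free invariant, not the framework's code.

```python
# Predicate-based query partitioning: a query with base predicate B is split
# into B AND P and B AND NOT P. The union of the two partial results must
# equal the original result; any divergence signals a bug in the engine.

def run_query(stream, predicate):
    return [x for x in stream if predicate(x)]

def metamorphic_check(stream, base_predicate, partition_predicate):
    """Compare the full result against the union of the two partitions."""
    full = run_query(stream, base_predicate)
    left = run_query(stream, lambda x: base_predicate(x) and partition_predicate(x))
    right = run_query(stream, lambda x: base_predicate(x) and not partition_predicate(x))
    return sorted(full) == sorted(left + right)

stream = [3, 7, 12, 18, 25, 31]
ok = metamorphic_check(stream, lambda x: x > 5, lambda x: x % 2 == 0)
```

Because the invariant must hold for any partitioning predicate, test cases can be generated automatically without knowing the expected output, which sidesteps the oracle problem.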

Kroner-Achilles SPEar-250_b.pdf


Benchmarking the RDF and Property Graph Model in the Temporal Dimension - A Case Study with Finbench

Theo Hahn1,2, Marvin Hofer1, Erhard Rahm1,2

1ScaDS.AI, Humboldtstraße 25, 04105 Leipzig, Germany; 2University Leipzig, Augustuspl. 10, 04109 Leipzig, Datenbanken, Germany

Temporal graph data is highly relevant for capturing the evolving nature of various domains, including financial transactions, social network interactions, supply chain logistics, disease outbreak modeling, and dynamic transportation systems. In this work, we benchmark the performance of temporal data representations in the property graph model (PGM) and RDF model using the FinBench transaction workload.

Our contributions are twofold. First, we adapt the FinBench data generator to produce RDF datasets in two formats: (1) the RDF-Reification representation and (2) the RDF-Star extension. Second, we provide queries for three new databases: Memgraph (PGM) and two RDF databases, Virtuoso and GraphDB, with only the latter offering RDF-Star support.

Our findings highlight key challenges in representing FinBench data in RDF, such as missing SPARQL language functions for addressing certain query requirements. These insights provide valuable guidance for writing queries and for optimizing RDF-based representations for temporal graph workloads.

Hahn-Benchmarking the RDF and Property Graph Model in the Temporal Dimension-255_b.pdf


CheDDaR: Checking Data -- Data Quality Report

Indra Diestelkämper1, Ralf Diestelkämper2, Valerie Restat1

1University of Hagen; 2Flinkback GmbH

In the data-driven world, high data quality is critical for effective decision making and advanced applications such as machine learning. Data often suffer from low quality, necessitating thorough cleaning. In this work, we introduce CheDDaR, a framework for detecting data quality issues early with minimal effort using various data quality metrics and multiple verification methods. The goal of CheDDaR is to keep manual effort low while maximizing the scope of the test. While CheDDaR is extensible, it already comes with ten metrics and five verification methods, which serve to generate precise reports highlighting data quality problems. Its effectiveness is demonstrated through a test dataset, showing the framework’s capability to identify and assess data quality issues effectively when combining multiple verification methods.
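A data quality metric of the kind such a report aggregates can be sketched briefly. The metric choice (column-wise completeness), the function name, and the sample rows are illustrative assumptions; CheDDaR ships ten metrics and five verification methods beyond this.

```python
# Column-wise completeness: the share of rows with a non-missing value in a
# given column. One of the simplest data quality metrics a report can show.

def completeness(rows, column):
    """Fraction of rows with a non-empty value in the given column."""
    values = [row.get(column) for row in rows]
    filled = [v for v in values if v not in (None, "")]
    return len(filled) / len(values)

rows = [
    {"name": "Ada", "email": "ada@example.org"},
    {"name": "Grace", "email": ""},
    {"name": "Alan", "email": None},
    {"name": "Edsger", "email": "ewd@example.org"},
]
score = completeness(rows, "email")
```

Combining several such metrics with independent verification methods, as the framework does, narrows down where cleaning effort is actually needed.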

Diestelkämper-CheDDaR-232_b.pdf


Contact and Legal Notice · Contact Address:
Privacy Statement · Conference: BTW 2025 Bamberg
Conference Software: ConfTool Pro 2.6.153+TC
© 2001–2025 by Dr. H. Weinreich, Hamburg, Germany