21st Conference on Database Systems for
Business, Technology and Web (BTW 2025)
March 3 - 7, 2025 | Bamberg, Germany
Conference Agenda
Overview and details of the sessions of this conference.
Session Overview
Demo: Demo Reception
Presentations
RAGONITE: Iterative Retrieval on Induced Databases and Verbalized RDF for Conversational QA over KGs with RAG Fraunhofer IIS, Germany Conversational question answering (ConvQA) is a convenient means of searching over RDF knowledge graphs (KGs), where a prevalent approach is to translate natural language questions to SPARQL queries. However, SPARQL has certain shortcomings: (i) it is brittle for complex intents and conversational questions, and (ii) it is not suitable for more abstract needs. Instead, we propose a novel two-pronged system where we fuse: (i) SQL-query results over a database automatically derived from the KG, and (ii) text-search results over verbalizations of KG facts. Our pipeline supports iterative retrieval: when the results of any branch are found to be unsatisfactory, the system can automatically opt for further rounds. We put everything together in a retrieval augmented generation (RAG) setup, where an LLM generates a coherent response from accumulated search results. We demonstrate the superiority of our proposed system over several baselines on a knowledge graph of BMW automobiles.
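The iterative, two-pronged retrieval loop described in this abstract can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the function names, the judging heuristic, and the toy data are all invented for this sketch.

```python
# Hypothetical sketch of the two-pronged retrieval loop: fuse SQL results
# over the induced database with text search over verbalized RDF facts,
# and retry when the accumulated evidence is unsatisfactory.

def retrieve(question, sql_branch, text_branch, judge, max_rounds=3):
    """Collect evidence from both branches, iterating until the judge
    (an LLM in the real system; a heuristic here) is satisfied."""
    evidence = []
    for _ in range(max_rounds):
        evidence += sql_branch(question)   # structured results from the induced DB
        evidence += text_branch(question)  # passages from verbalized KG facts
        if judge(question, evidence):      # deemed sufficient -> stop iterating
            break
    return evidence

# Toy stand-ins for the two branches and the judge:
sql = lambda q: [("BMW X5", "horsepower", 335)]
txt = lambda q: ["The BMW X5 xDrive40i produces 335 hp."]
enough = lambda q, ev: len(ev) >= 2

print(retrieve("How powerful is the X5?", sql, txt, enough))
```

In the demonstrated system, an LLM would then generate a coherent response from the accumulated evidence.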
Monitoring of Heterogeneous Datastores in Poly- and MultiStores University of Hamburg, Germany Due to the growing number of vastly different datastores, there have been several attempts to hide the complexity of selecting and combining these stores behind a common interface. However, these Poly- and MultiStores come with new challenges such as query planning, query optimisation and data placement. Given that polyglot systems typically do not have detailed information about system utilisation and data distributions at their disposal, we developed a monitoring system that is a first attempt to close this gap. Our system, presented in this demonstration, is able to measure system characteristics such as query execution times, memory consumption or the current number of connections for heterogeneous datastores. Furthermore, we added mechanisms to classify the distribution of attributes, discover functional dependencies and calculate selectivities for attribute values. During the on-site demonstration, we visualise the data collected by our monitoring system using a web interface in combination with carefully selected example data and example query workloads.
A Data Quality Dashboard for (Security) Knowledge Graphs 1Software Competence Center Hagenberg; 2Hasso Plattner Institute, Germany; 3LIMES Security GmbH Knowledge graphs play a crucial role in storing and reusing domain knowledge for data analytics. For example, knowledge graphs can be used to model security domain knowledge (such as technical standards) to support software architects in developing secure software systems. Assessing and assuring the quality of the data in these graphs is critical to enable trust in the use of this machine-readable domain knowledge, and to ensure high-quality results for downstream tasks that build on this knowledge. In this paper, we present a visual data quality dashboard, which allows domain experts to verify the quality of their domain knowledge graph along different dimensions. We demonstrate the use of the dashboard by means of a previously built security knowledge graph.
A Demonstration of Skyrise: A Serverless Query Processor Hasso Plattner Institute, University of Potsdam, Germany Data processing systems are increasingly being deployed in the cloud, because of the cost-effectiveness of short-term resource provisioning. In recent years, serverless cloud computing has embodied highly elastic resource pools. This elasticity has the potential to make cloud-based systems more cost-efficient, preventing resource over-provisioning and under-provisioning. In this paper, we demonstrate Skyrise, a serverless query processor for infrequent in-situ analytics on cold data in cloud storage. We highlight Skyrise's capabilities to run entirely on serverless compute resources and to complement them with virtual servers and cloud object storage.
A Showcase of LLMs in Action: SQL Generation from Natural Language (Demo Paper) OTH Regensburg, Germany Today, large language models are a very efficient tool for human-computer interaction using natural language. Chatbots like ChatGPT and their corresponding APIs can be used to solve a large variety of tasks that are provided in human-comprehensible sentences, for example generating SQL queries. Executing spoken SQL queries in a database system poses a challenge, because the various syntactical details of SQL are usually not provided verbally. Here, the LLM can help to augment the recognized raw query with the syntax elements needed for successful execution. Furthermore, the correct spelling of table and column names can be derived from the database schema provided in the LLM prompt. This work showcases four use cases in which LLMs assist in querying database systems: (1) A plugin for phpMyAdmin for voice-query input in natural language, (2) a chart generator, (3) an Alexa skill, and (4) a speech-controlled action game SQL Invaders.
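The schema-aware prompting described in this abstract can be sketched as follows. This is a hypothetical illustration: the actual model call is elided, and the schema and prompt wording are invented for this sketch, not taken from the demonstrated plugins.

```python
# Illustrative prompt construction for schema-aware SQL generation:
# embedding the schema lets the LLM emit correctly spelled table and
# column names, as the abstract describes.

def build_sql_prompt(question, schema):
    """Render the schema as DDL and prepend it to the question."""
    ddl = "\n".join(
        f"CREATE TABLE {t} ({', '.join(cols)});" for t, cols in schema.items()
    )
    return (
        "Translate the question into a single SQL query.\n"
        f"Schema:\n{ddl}\n"
        f"Question: {question}\nSQL:"
    )

# Made-up example schema, loosely inspired by the SQL Invaders use case:
schema = {"invaders": ["id INT", "name TEXT", "hits INT"]}
prompt = build_sql_prompt("Which invader was hit most often?", schema)
print(prompt)
```

The resulting prompt would then be sent to a chat-completion API, and the returned SQL executed against the database.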
Discovering Suitable Anonymization Techniques: A Privacy Toolbox for Data Experts 1Mercedes-Benz AG, Germany; 2University of Stuttgart, Germany Identifying the appropriate anonymization technique is a critical yet challenging task for developers, data scientists, and security practitioners. Our interactive toolbox addresses this challenge by providing a comprehensive overview of available anonymization techniques to assist privacy-conscious developers in selecting the right one for their specific use cases. The toolbox offers a hierarchical and classified overview of techniques, each detailed with meta-model information. It employs a modular approach, allowing techniques to be implemented and deployed independently. Additionally, it enables developers to evaluate these techniques on test datasets. Our toolbox allows for the easy addition of new categories and modules. This paper demonstrates the anonymization toolbox’s capabilities, simplifying the decision-making process in the Anonymization by Design cycle by ensuring overview, modularity, and flexibility.
Segmify: A Deep Learning-Based Interactive Tool for Real-Time Cell Segmentation and Morphological Analysis 1Fraunhofer ITEM, Germany; 2Goethe University Frankfurt, Germany Accurately segmenting cells in microscopy images is vital for biomedical research, yet remains challenging due to issues like overlapping cells, noise, and low contrast. We address these challenges with an interactive dashboard powered by a U-Net++ model. Our system integrates Contrast Limited Adaptive Histogram Equalization (CLAHE) for enhanced segmentation in low-contrast images, along with custom loss functions and data augmentation to boost accuracy. The tool supports both automated and manual segmentation, allowing real-time parameter adjustments and morphological analysis on the cellular level. This versatile platform provides researchers with the flexibility to adapt to diverse datasets, ensuring high-quality results.
Generating Federated REST API Servers RPTU Kaiserslautern-Landau, Germany The application of REST APIs for data transfer on the web is ubiquitous. In many cases, such APIs are described in the popular OpenAPI format that allows a standardized, human- and machine-readable description of provided services. In this paper, we propose the demonstration of a generator that produces a complete federated REST API server from a specification when multiple instances of compatible services are deployed. The resulting software not only provides an integrated view but is also able to apply optimizations to query processing while maintaining compatibility with the original API specification.
ReCLAIM: An Integrated Platform for Data on Nazi-Looted Cultural Assets Hasso Plattner Institute, Germany During the Second World War, the National Socialists looted cultural assets from people of Jewish descent. The looted assets were documented, first by the perpetrators and later by the Allies. The resulting archival artifacts are scattered across sources, complicating search and linking of the entries across those sources. The ReCLAIM platform collects, prepares, standardizes, and links archival data on Nazi-looted cultural assets from various sources. Designed for both provenance researchers and non-experts, it offers user-friendly features, such as intuitive full-text search, advanced search options for refined queries, and easy-to-use comparison options. Because the underlying data is only partially standardized and subject to errors from OCR digitization processes, a critical aspect of data preparation ensures that original values are preserved alongside processed counterparts. By streamlining access to this information, ReCLAIM aims to support provenance research and facilitate the study of cultural heritage affected by historical injustice.
Upcycling UnivIS: Discovering Study Planning 2.0 University of Bamberg, Germany Technical innovation in university settings often progresses slowly, with legacy systems frequently remaining in use for years. This can lead to outdated technologies that no longer meet user needs. In this demonstration, we explore the revival of such legacy systems through the example of UnivIS, a course administration system used at the University of Bamberg for over two decades. We introduce Baula, a digital study assistant that builds on UnivIS data and functionality, integrating them into a modern, user-friendly interface. By upcycling existing infrastructure, Baula addresses significant student concerns regarding usability and functionality. This demonstration illustrates how such legacy systems can be modernized to better align with contemporary user requirements, showcasing a model for innovation within a traditionally slow-moving environment.
Interactive specification and visualization of group movement patterns in urban traffic data streams 1University of Marburg, Germany; 2Technical University of Marburg, Germany The rapid growth of continuously generated spatio-temporal data streams has become the source for many innovative applications, e.g., in the context of smart cities. This demo addresses the challenge of detecting group patterns in spatio-temporal data streams derived from urban movement data, a problem that has received limited attention in the streaming context so far. Based on a group pattern operator known from spatio-temporal databases, it presents the first extension of the operator to data streams fully integrated into our JEPC event processing system. In order to support users of JEPC in specifying and optimizing queries, an interactive workflow is provided with a visual component for immediate feedback of group pattern queries. Our contributions are the workflow for online specification and visualization of spatio-temporal group pattern queries and a discussion of their efficient implementations. Moreover, the demo is based on a large-scale dataset of urban movement data revealing the benefits of group patterns for urban traffic analysis.
Disaggregated Pipeline Grouping LIVE Technische Universität Dresden, Germany The need for highly scalable systems grows with the amount of data that has to be processed. Traditional approaches such as scale-up or scale-out increasingly reach their limits. Disaggregated systems offer a solution through their very high scalability. However, they come at the cost of substantial data transfer over the network. We demonstrate how redundant data accesses can be reduced in disaggregated systems through our pipeline grouping approach.
Plaquette: Visualizing Redundancies in Relational Data University of Passau, Germany Functional dependencies are a fundamental topic in database education, yet students often find the concept abstract. To foster understanding of functional dependencies that capture redundancies in relational data, we propose an educational software tool called Plaquette. This tool offers a novel visualization of redundancies by coloring cells that contain redundant data, where deeper hues indicate stronger redundancies. In analogy to plaque tests at the dentist's office, we refer to these redundancies as "plaque". In our demo, we begin by recapping the definition of functional dependencies, followed by introducing our concept of plaque. Demo participants are then invited to interact with Plaquette, and assess sample scenarios that showcase small examples of relations and functional dependencies. By swiping through scenarios with correct but also fake plaque on a tablet in Tinder-style interaction, participants can playfully improve their intuition for the concept of functional dependencies. Beyond contributing a new teaching tool, our ultimate goal is to assist data analysts explore redundancies in real-world data, using tools like Plaquette.
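The "plaque" idea from this abstract can be sketched in a few lines: given a functional dependency X → Y, a Y-cell is redundant whenever the same X-value (and hence, by the FD, the same Y-value) occurs in more than one tuple, and the intensity of the coloring grows with the repetition count. The function name, column names, and data below are illustrative, not Plaquette's actual code.

```python
# Minimal sketch of redundancy detection under an FD lhs -> rhs:
# a rhs-cell is "plaque" when its lhs-value repeats across tuples.

from collections import Counter

def plaque(rows, lhs, rhs):
    """Return {row_index: intensity} for redundant rhs-cells, where the
    intensity is the number of tuples sharing that lhs-value."""
    counts = Counter(row[lhs] for row in rows)
    return {i: counts[row[lhs]] for i, row in enumerate(rows)
            if counts[row[lhs]] > 1}

rows = [
    {"zip": "96047", "city": "Bamberg"},
    {"zip": "96047", "city": "Bamberg"},
    {"zip": "10115", "city": "Berlin"},
]
print(plaque(rows, "zip", "city"))  # rows 0 and 1 carry redundant city values
```

Deeper hues in the tool would correspond to higher intensity values in this sketch.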
History-Based Active Learning Universität Augsburg, Germany In this demo, we will show how Active Learning (AL) can be used to establish and transfer classification information over partially/loosely related datasets, in particular fine-grained user roles on social media, with many, unbalanced classes, a large number of data points, and different internal structures or drift over time. The key idea is to incorporate the history of learning steps into the tool, allowing us to analyze, restart, and modify the transfer. We also provide a rich visualization that allows the human oracle to interpret the most critical cases.
ReProVide: Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany The goal of ReProVide is to provide novel hardware and optimisation techniques for scalable, high-performance processing of Big Data. The Programmable System-on-Chip (PSoC) architecture of ReProVide includes a reconfigurable FPGA for the support of hardware accelerators for various operators on relational and streaming data. Such PSoCs can be used to process data directly at the source, such as data from attached NVMes, using application-specific accelerators. For example, compute-intensive tasks such as JSON parsing can be offloaded to the hardware accelerators, reducing CPU load. In addition, reducing the volume of data at an early stage avoids unnecessary data movements, resulting in lower energy consumption. This demo illustrates the opportunities and benefits of hardware-reconfigurable, FPGA-based PSoCs for near-data processing. The demo allows users to run two queries and select which operations should be pushed onto the SoC for near-data hardware acceleration. From no acceleration to maximum acceleration, a 52x improvement in throughput and 67x lower energy consumption can be observed.
Incremental Stream Query Merging In Action 1BIFOLD; 2TU Berlin, Germany Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. However, they primarily focus on achieving high throughput and low latency for a single query. To deploy multiple queries, the users instead scale the infrastructure, executing each query in isolation. As a result, SPEs overlook potential data and computation-sharing opportunities among several long-running queries. As streaming queries are continuous and long-running, identifying sharing opportunities among newly arriving and existing queries can reduce resource utilization. This allows for deploying more queries without the need to scale the infrastructure. In this demonstration, we present Incremental Stream Query Merging (ISQM), an end-to-end solution to identify and maintain sharing among stream queries. We showcase six different types of sharing identification techniques and their impact on query optimization and execution time.
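One simple form of the sharing identification this abstract mentions can be illustrated as follows: two streaming queries can share work when their operator chains have a common prefix. The operator representation and query contents below are invented for this sketch and do not reflect ISQM's actual techniques.

```python
# Toy illustration of sharing identification between streaming queries:
# operators up to the first divergence can be executed once and shared.

def shared_prefix(q1, q2):
    """Return the longest common operator prefix of two query plans."""
    shared = []
    for a, b in zip(q1, q2):
        if a != b:
            break
        shared.append(a)
    return shared

q_old = [("source", "cars"), ("filter", "speed>50"), ("window", "5s")]
q_new = [("source", "cars"), ("filter", "speed>50"), ("map", "speed*2")]
print(shared_prefix(q_old, q_new))  # the source and filter can be shared
```

ISQM goes well beyond prefix matching, showcasing six different sharing identification techniques, but the principle of executing common work once is the same.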
Compression in Main Memory Database Systems: Cost and Performance Trade-Offs of Workload-Driven Data Encoding Hasso Plattner Institute, Germany Automating physical design optimizations of database systems is challenging. Recent work on index selection or data compression has shown significant advantages of automated approaches. However, the impact on running systems is often hard to predict. Moreover, automated systems often lack the capabilities to help users understand the decisions taken. In this demonstration, we study the impact of optimal encoding configurations for in-memory database systems. We allow the user to set varying main memory budgets for which optimal encoding configurations are applied, as well as allow the user to manually configure the system. Effects on runtime performance and memory consumption can be directly observed. The user can further analyze the impact compression has on overall memory consumption and how compression ratios affect performance when the memory bandwidth is saturated.
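Selecting an encoding configuration under a memory budget, as described in this abstract, can be framed as a small constrained optimization. The sketch below is a hypothetical illustration with invented size and cost numbers, not the demonstrated system's selection algorithm, which would use workload-driven estimates.

```python
# Hypothetical illustration of budget-constrained encoding selection:
# exhaustively pick one encoding per column so that total size fits the
# memory budget while the estimated scan cost is minimized.

from itertools import product

def best_config(columns, budget):
    """columns: {name: [(encoding, size, scan_cost), ...]}.
    Returns (total_cost, {column: encoding}) or None if infeasible."""
    names = list(columns)
    best = None
    for choice in product(*(columns[n] for n in names)):
        size = sum(c[1] for c in choice)
        cost = sum(c[2] for c in choice)
        if size <= budget and (best is None or cost < best[0]):
            best = (cost, dict(zip(names, (c[0] for c in choice))))
    return best

# Invented per-column candidates (encoding, size in MB, relative scan cost):
cols = {
    "id":   [("plain", 8, 1.0), ("delta", 3, 1.2)],
    "name": [("plain", 20, 1.0), ("dict", 6, 1.1)],
}
print(best_config(cols, budget=10))  # a tight budget forces compressed encodings
```

Lowering the budget in this sketch mirrors the demo's interaction: tighter memory budgets push the system toward stronger compression at some runtime cost.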
Contact and Legal Notice · Privacy Statement · Conference: BTW 2025 Bamberg
Conference Software: ConfTool Pro 2.6.153+TC © 2001–2025 by Dr. H. Weinreich, Hamburg, Germany