Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session
Diss: Dissertation Awards
Time: Friday, 07/Mar/2025, 9:40am - 10:10am

Session Chair: Klaus Meyer-Wegener, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Location: WE5/00.022 (Lecture Hall 1)

Presentations
9:40am - 9:50am

Modern Data Analytics in the Cloud Era

Steffen Kläbe

TU Ilmenau, Germany

Cloud computing has been the groundbreaking technology of the last decade. The ease of use of a managed environment, combined with a nearly infinite amount of resources and a pay-per-use pricing model, enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed, and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment whose requirements differ from those of traditional on-premise data warehouse solutions.

First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments during elastic scaling. Furthermore, we introduce a buffer pre-heating strategy that mitigates the cold start after scaling and yields an immediate performance benefit from scaling.
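
To make the load-balancing idea concrete, here is a minimal Python sketch of a partition manager that keeps partitions on their current node wherever possible and reassigns only the overflow when the node count changes. The function, its data structures, and the balancing policy are illustrative assumptions, not the mechanism from the thesis.

    # Hypothetical partition manager sketch: rebalance partitions after
    # elastic scaling while moving as few partitions as possible.
    def rebalance(assignment: dict[int, int], num_nodes: int) -> dict[int, int]:
        """assignment maps partition id -> node id; num_nodes is the new node count."""
        partitions = list(assignment)
        target = len(partitions) // num_nodes        # ideal partitions per node
        extra = len(partitions) % num_nodes          # first `extra` nodes hold one more
        capacity = {n: target + (1 if n < extra else 0) for n in range(num_nodes)}
        new_assignment, overflow = {}, []

        # Keep each partition on its current node while that node has capacity.
        for p, n in assignment.items():
            if n < num_nodes and capacity[n] > 0:
                new_assignment[p] = n
                capacity[n] -= 1
            else:
                overflow.append(p)                   # node removed or already full

        # Only the overflow partitions move -- the minimal set of reassignments.
        for p in overflow:
            n = max(capacity, key=capacity.get)
            new_assignment[p] = n
            capacity[n] -= 1
        return new_assignment

    # Scaling out from 2 to 3 nodes moves only the two partitions
    # needed to even out the load.
    print(rebalance({0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}, 3))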

Second, cloud-based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from the bulk loads or ETL pipelines of a traditional data warehouse solution. Many users do not define database constraints, in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue, we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution, and offer efficient update support. The concept can be applied to arbitrary constraints, and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements.
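
As a toy illustration of the exception-based approach, the Python sketch below maintains a "patch" of row IDs that violate an approximate uniqueness constraint, so that scans can treat all remaining rows as unique. The class and method names are invented for this example and do not reflect the thesis's actual implementation.

    # Sketch of the PatchIndex idea for approximate uniqueness: only the
    # violating rows are recorded, so the optimizer may treat the rest as unique.
    class PatchIndex:
        def __init__(self, column_values):
            self.seen = set()
            self.exceptions = set()        # row ids violating uniqueness
            for row_id, value in enumerate(column_values):
                self.insert(row_id, value)

        def insert(self, row_id, value):
            # Efficient update support: a duplicate extends the patch
            # instead of aborting the inserting transaction.
            if value in self.seen:
                self.exceptions.add(row_id)
            self.seen.add(value)

        def split_scan(self, row_ids):
            # Execution splits a scan into a "clean" part, eligible for
            # uniqueness-based optimizations, and a small exception part.
            clean = [r for r in row_ids if r not in self.exceptions]
            patched = [r for r in row_ids if r in self.exceptions]
            return clean, patched

    idx = PatchIndex(["a", "b", "a", "c"])    # row 2 duplicates row 0
    print(idx.split_scan(range(4)))           # ([0, 1, 3], [2])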

Third, data-centric workloads have changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. In these cases the database system may act only as a data source, while the computational effort takes place in data science or machine learning (ML) environments. As this workflow has several drawbacks, we pursue the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this, we identify user-defined functions (UDFs) and machine learning inference as important tasks that would benefit from deeper engine integration, and we investigate approaches to push these operations towards the database engine.
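
The general DataFrame-to-SQL idea can be sketched in a few lines of Python: dataframe-style operations build a query description lazily, and SQL is generated only when results are requested, so the actual work runs inside the database engine. The tiny API below is a simplified stand-in, not Grizzly's actual interface.

    # Minimal DataFrame-to-SQL transpilation sketch (illustrative API only).
    class Frame:
        def __init__(self, table, filters=None, columns=None):
            self.table, self.filters, self.columns = table, filters or [], columns

        def __getitem__(self, cols):
            return Frame(self.table, self.filters, list(cols))            # projection

        def filter(self, condition):
            return Frame(self.table, self.filters + [condition], self.columns)  # selection

        def to_sql(self):
            cols = ", ".join(self.columns) if self.columns else "*"
            sql = f"SELECT {cols} FROM {self.table}"
            if self.filters:
                sql += " WHERE " + " AND ".join(self.filters)
            return sql

    df = Frame("sales").filter("year = 2024")[["region", "revenue"]]
    print(df.to_sql())   # SELECT region, revenue FROM sales WHERE year = 2024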

Kläbe-Modern Data Analytics in the Cloud Era-179_a.pdf


9:50am - 10:00am

Scalable Data Management using GPUs with Fast Interconnects

Clemens Lutz

TU Berlin / BIFOLD, Germany

Modern database management systems (DBMSs) are tasked with analyzing terabytes of data, employing a rich set of relational and machine learning operators. To process data at large scales, research efforts have strived to leverage the high computational throughput and memory bandwidth of specialized co-processors such as graphics processing units (GPUs). However, scaling data management on GPUs is challenging because (1) the on-board memory of GPUs has too little capacity for storing large data volumes, while (2) the interconnect bandwidth is not sufficient for ad hoc transfers from main memory. Thus, data management on GPUs is limited by a data transfer bottleneck. In practice, CPUs process large-scale data faster than GPUs, reducing the utility of GPUs for DBMSs.

In this thesis, we investigate how a new class of fast interconnects can address the data transfer bottleneck and scale GPU-enabled data management. Fast interconnects link GPU co-processors to a CPU with high bandwidth and cache coherence. We apply our insights to process stateful and iterative algorithms out-of-core, using a hash join and k-means clustering as examples.

We first analyze the hardware properties. Our experiments show that the high interconnect bandwidth enables the GPU to efficiently process large data sets stored in main memory. Furthermore, cache coherence facilitates new DBMS designs that tightly integrate CPU and GPU via shared data structures and pageable memory allocations. However, irregular accesses from the GPU to main memory are not efficient. Thus, the operator state of, e.g., joins does not scale beyond the GPU memory capacity.

We scale joins to a large state by contributing our new Triton join algorithm. Our main insight is that fast interconnects enable GPUs to efficiently spill the join state by partitioning data out-of-core. Thus, our Triton join breaks through the GPU memory capacity limit and increases throughput by up to 2.5× compared to a radix-partitioned join on the CPU.

We scale k-means to large data sets by eliminating two key sources of overhead. In existing strategies, execution crosses from the GPU to the CPU on each iteration, which results in the cross-processing and multi-pass problems. In contrast, our solution requires only a single data pass per iteration and speeds up throughput by up to 20×.

Overall, GPU-enabled DBMSs are able to overcome the data transfer bottleneck by employing new out-of-core algorithms that take advantage of fast interconnects.
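
As a rough illustration of the out-of-core partitioning idea behind such a join (not the Triton join itself, which runs as GPU kernels over a fast interconnect), the Python sketch below fans both inputs out by hash so that each partition's join state fits within a memory budget, then joins the partitions one at a time. All names and the partition count are illustrative assumptions.

    # Sketch of an out-of-core, partition-wise hash join (illustrative only).
    def partitioned_join(build, probe, key, num_partitions=8):
        # Partition pass: fan both inputs out by hash so each partition's
        # hash table stays within the (device) memory budget.
        build_parts = [[] for _ in range(num_partitions)]
        probe_parts = [[] for _ in range(num_partitions)]
        for row in build:
            build_parts[hash(row[key]) % num_partitions].append(row)
        for row in probe:
            probe_parts[hash(row[key]) % num_partitions].append(row)

        # Join pass: partitions are processed independently, so the
        # in-memory state never exceeds one partition's hash table.
        results = []
        for bp, pp in zip(build_parts, probe_parts):
            table = {}
            for row in bp:
                table.setdefault(row[key], []).append(row)
            for row in pp:
                for match in table.get(row[key], []):
                    results.append({**match, **row})
        return results

    build = [{"id": i, "name": f"item{i}"} for i in range(4)]
    probe = [{"id": i % 4, "qty": i} for i in range(8)]
    print(len(partitioned_join(build, probe, "id")))   # 8 join results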

Lutz-Scalable Data Management using GPUs with Fast Interconnects-152_a.pdf


 