Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions held on that day or at that location. Select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session
R3: Research 3: Query Processing 2
Time:
Wednesday, 05/Mar/2025:
4:00pm - 5:10pm

Session Chair: Nicole Schweikardt, Humboldt-Universität zu Berlin
Location: WE5/00.022

Lecture Hall 1

Presentations
4:00pm - 4:20pm

Optimizing Linearized Join Enumeration by Adapting to the Query Structure

Altan Birler1, Mihail Stoian2, Thomas Neumann1

1Technische Universität München; 2UTN

In database systems, join enumeration is a critical but NP-hard problem, especially challenging for queries with a large number of joins, as often found in graph workloads. To manage this complexity, some state-of-the-art methods reduce the search space by employing query graph linearization. However, even after linearization, join enumeration still has cubic runtime, which can be inefficient for very large queries. We propose an improved enumeration algorithm that dynamically adapts to the query graph structure, avoiding the generation of invalid or redundant plans. This reduces the time complexity for star queries from O(n^3) to O(n log n) and enables the optimization of general tree queries with thousands of joins within seconds.
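The star-query speed-up echoes a classical observation, sketched below as a rough, hypothetical illustration rather than the authors' algorithm: for a star query, a good left-deep join order of the satellite tables can be obtained by sorting them by an IKKBZ-style rank, which takes O(n log n) instead of a cubic enumeration. The cost model and all names here are illustrative assumptions.

```python
# Hedged sketch: for a star query (one hub table joined with n satellite
# tables), a left-deep join order is found by sorting the satellites by
# an IKKBZ-style rank instead of running a cubic DP enumeration.
# The C_out-like cost model below is an illustrative assumption.

def rank(cardinality: float, selectivity: float) -> float:
    """Rank = (size growth factor - 1) / cost of the join step."""
    t = selectivity * cardinality   # intermediate-size growth factor
    return (t - 1) / t              # assuming the step's cost is ~t

def star_join_order(satellites):
    """satellites: (name, cardinality, selectivity vs. the hub) tuples.
    Sorting by rank is O(n log n) in the number of joined tables."""
    return sorted(satellites, key=lambda s: rank(s[1], s[2]))
```

Satellites that shrink the intermediate result (rank below zero) are joined first; the adaptivity in the paper goes beyond this by handling general tree queries.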

Birler-Optimizing Linearized Join Enumeration by Adapting-147_b.pdf


4:20pm - 4:40pm

Adaptive sorting for large keys, strings, and database rows

Marius Kuhrt1, Bernhard Seeger1, Sebastian Wild1, Goetz Graefe2

1University of Marburg, Germany; 2Google, Madison, Wisconsin, USA

Sorting a database table may require expensive comparisons, e.g., due to column count or column types such as long or international strings. Therefore, optimizing the count and cost of comparisons is important.

Adaptive sort algorithms avoid comparisons by exploiting pre-existing order in the input. If $N$ keys happen to be sorted or reverse-sorted, $N-1$ comparisons suffice, in contrast to $\log_2(N!)$ comparisons in the expected and worst cases. Ideally, adaptive sorting ensures graceful degradation from the best case to the expected case, e.g., by merging sorted runs pre-existing in the input.
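The run-based adaptivity described above can be sketched with a natural mergesort (an illustrative sketch, not the paper's algorithm): detecting maximal pre-existing runs costs N-1 comparisons, so fully sorted or reverse-sorted input is recognized in a single pass before any merging happens.

```python
# Minimal sketch of adaptivity via pre-existing runs (natural mergesort).
from heapq import merge

def find_runs(a):
    """Split a into maximal non-decreasing runs; descending runs are
    reversed in place so every run comes out sorted ascending."""
    runs, i, n = [], 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:            # strictly descending run
            while j < n and a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j][::-1])
        else:                                # non-decreasing run
            while j < n and a[j] >= a[j - 1]:
                j += 1
            runs.append(a[i:j])
        i = j
    return runs

def natural_mergesort(a):
    # Sorted input yields a single run: N-1 comparisons, nothing to merge.
    return list(merge(*find_runs(a)))
```

A sorted input produces exactly one run, the best case; random input degrades gracefully toward the usual merge-sort comparison count.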

Adaptive sorting has proven successful for integer keys but not for large keys, e.g., when comparing symbols or characters in text strings, bytes or words in binary strings, or column values in database rows. On the other hand, using longest common prefixes or offset-value codes, sorting $N$ strings with $K$ characters, bytes, or columns can be limited to $N \times K$ comparisons. If those comparisons are the dominant cost of sorting, e.g., due to interpreted execution of predicates etc., the cost of sorting is linear in the input size and, in fact, equal to the cost of verifying a claimed and correct sort order.
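As a hedged illustration of prefix-based comparison savings (a simplified longest-common-prefix scheme, not the offset-value codes of the paper): during a merge, caching each candidate's LCP with the last key emitted decides most comparisons without touching key bytes, and the remaining comparisons start at the cached offset, so each character position is inspected only a bounded number of times. The sketch ignores the edge case where one key is a prefix of another.

```python
# Simplified LCP-aware merge comparison (illustrative only).

def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k

def pick_smaller(last: str, a: str, b: str) -> str:
    """Merge invariant: both a and b compare >= last.
    The candidate sharing the longer prefix with last is the smaller key."""
    la, lb = lcp(last, a), lcp(last, b)
    if la != lb:
        return a if la > lb else b
    # Equal LCPs: compare characters starting at position la only.
    return a if a[la:] <= b[la:] else b
```

With `last = "abc"`, the key `"abcd"` (LCP 3) is chosen over `"abz"` (LCP 2) without any character comparison between the two candidates.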

By leveraging and combining a variety of techniques, some old and proven in practice, some new yet equally sound, we introduce sorting techniques that are efficient, scalable, and adaptive; that degrade gracefully from $N-1$ to $\log_2(N!)$ comparisons of strings or of database rows; and that guarantee at most $1.042 N \times K$ comparisons of bytes or of column values in the worst case. We offer algorithm analyses and experimental results to test our hypotheses and support our claims.

Kuhrt-Adaptive sorting for large keys, strings, and database rows-134_b.pdf


4:40pm - 5:00pm

FASTlabel: Making Supervised Query Optimizer Hinting Practical

Kira Thiessat, Dirk Habich, Wolfgang Lehner

TU Dresden, Germany

Many traditional query optimizers support query configuration hints that steer the optimization process. Learned models have successfully used these hints to decrease analytical query runtimes. However, the most successful approaches require training queries annotated with their best optimizer hint set beforehand. Even though the learned models can significantly decrease query runtimes, the query annotation overhead outweighs these gains. Since most use cases cannot afford a significant time investment before deploying a solution, such approaches become impractical. In this paper, we delineate the shortcomings of current approaches to hint set prediction and identify that their labeling techniques, combined with a growing number of supported hints, render them infeasible. To overcome this, we propose FASTlabel, a novel labeling algorithm tailored to the supervised hint set prediction approach FASTgres, which efficiently traverses the hint set search space for each query while supporting multiple hints. We show that our labeling strategy substantially reduces the number of query-hint set combinations that must be evaluated, solving the current exponential scaling issue. Our experimental evaluation further shows that FASTlabel makes supervised hint set prediction feasible by substantially reducing the inherent labeling overhead of supervised learning.
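To illustrate the labeling-cost problem (a generic sketch, not FASTlabel itself): exhaustively labeling one query over h boolean hints costs 2^h query executions, while a greedy pass that flips one hint at a time and keeps improvements costs only O(h) executions. `run_query` is a hypothetical stand-in for executing the query under a given hint set and returning its runtime.

```python
# Exhaustive vs. greedy hint-set labeling for a single query (sketch).
from itertools import product

def exhaustive_label(run_query, h):
    """Try all 2^h hint combinations; returns (best hint set, executions)."""
    best = min(product([False, True], repeat=h), key=run_query)
    return best, 2 ** h

def greedy_label(run_query, h):
    """Flip one hint at a time, keeping flips that reduce the runtime;
    returns (hint set, executions). Only h+1 executions per sweep."""
    hints = [False] * h
    cost = run_query(tuple(hints))
    execs = 1
    for i in range(h):
        hints[i] = True
        c = run_query(tuple(hints))
        execs += 1
        if c < cost:
            cost = c          # keep the flip
        else:
            hints[i] = False  # revert it
    return tuple(hints), execs
```

When the hints' effects are independent, the greedy pass finds the same label as exhaustive search at a fraction of the executions; interacting hints are exactly what makes practical labeling algorithms like the one proposed here non-trivial.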

Thiessat-FASTlabel-107_b.pdf


5:00pm - 5:10pm

Lightweight Memory Access Monitoring for Dynamic Data Placement in Tiered Memory Systems

Alexander Baumstark1, Marcus Paradies2, Kai-Uwe Sattler1

1TU Ilmenau, Germany; 2LMU München, Germany

With a growing number of memory technologies (e.g., PMEM, HBM, CXL memory), hardware characteristics and data access patterns on all available memory tiers must be monitored to make effective data placement decisions. Thus, memory management for optimal data placement across memory tiers becomes more CPU- and resource-demanding.

Data placement can be either OS-controlled, e.g., through transparent virtual memory page promotion & demotion across memory tiers, or application-controlled, e.g., through explicit data allocation to specific memory tiers. However, OS-controlled data placement lacks awareness of workload resources, and application-controlled data placement results in additional overhead from managing access metrics and cost models. To facilitate dynamic data placement in tiered memory systems, modern OSs already provide built-in support for lightweight access monitoring of specific memory regions (e.g., through the Data Access MONitoring (DAMON) framework of the Linux kernel).

In this work, we investigate the applicability of the memory access monitoring framework DAMON to database systems running on tiered memory. To assess DAMON as a core building block for data placement, we integrate its monitoring mechanism and memory management into Poseidon, a high-performance graph database system. In our experimental evaluation, we analyze the accuracy and overhead of DAMON through a set of microbenchmarks and by running end-to-end queries from a graph database benchmark against our DAMON-enabled Poseidon DBMS. Our initial results demonstrate the potential of DAMON as a core building block for workload-driven, dynamic data placement in tiered memory systems: our experiments show a 3% overhead in execution time while achieving an accuracy of over 90% compared to actual access numbers.
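A minimal sketch of how access-monitoring output could drive placement (hypothetical, and not Poseidon's or DAMON's actual mechanism): given per-region access counts such as a monitor reports, greedily fill the fast tier with the regions of highest access density until its capacity is exhausted.

```python
# Greedy hot-region placement from monitored access counts (sketch).

def place_regions(regions, fast_capacity):
    """regions: (region_id, size_bytes, access_count) tuples.
    Returns (fast_tier_ids, slow_tier_ids)."""
    fast, slow, used = [], [], 0
    # Hottest first: rank regions by accesses per byte.
    for rid, size, acc in sorted(regions,
                                 key=lambda r: r[2] / r[1],
                                 reverse=True):
        if used + size <= fast_capacity:
            fast.append(rid)
            used += size
        else:
            slow.append(rid)
    return fast, slow
```

Re-running such a decision loop as the monitored counts change is what makes the placement dynamic; the accuracy and overhead figures above are what determine whether the loop's decisions can be trusted.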

Baumstark-Lightweight Memory Access Monitoring for Dynamic Data Placement-129_b.pdf


 
Conference: BTW 2025 Bamberg
Conference Software: ConfTool Pro 2.6.153+TC
© 2001–2025 by Dr. H. Weinreich, Hamburg, Germany