21st Conference on Database Systems for
Business, Technology and Web (BTW 2025)
March 3 - 7, 2025 | Bamberg, Germany
Conference Agenda
Session
R6: Research 6: Time Series and Use Cases
Presentations
4:00pm - 4:20pm
Fast, Parameter-free Time Series Anomaly Detection
Technische Universität Berlin, Germany
Time series anomaly detection is a common problem across many domains. Despite the introduction of numerous algorithms leveraging deep learning, classical machine learning, and data mining techniques, no dominating approach has emerged. Common challenges are the extensive parameter tuning and the high computational cost associated with many of these existing methods. To address these problems, this paper proposes a parameter-free anomaly detection algorithm, STAN (summary statistics ensemble). STAN applies a set of summary statistics over sliding windows and compares the results to the normal behavior learned during training. STAN's flexibility allows for integrating different statistical aggregates, which effectively handle diverse types of anomalies. Our evaluation shows that STAN achieves a detection accuracy of 60.4%, close to the widely used MERLIN algorithm (63.6%), while reducing execution time by more than an order of magnitude compared to all baselines.
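The idea of comparing sliding-window summary statistics against behavior learned from training data can be sketched roughly as follows. This is an illustrative sketch only, not the authors' STAN implementation: the choice of statistics (mean, standard deviation, minimum, maximum), the min/max "normal range" model, the explicit window length `w`, and all function names are assumptions made for the example.

```python
import numpy as np

def window_stats(x, w):
    """Mean, std, min and max of every sliding window of length w (one row per window)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(x, dtype=float), w)
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        windows.min(axis=1),
        windows.max(axis=1),
    ])

def fit_normal_ranges(train, w):
    """Learn, per statistic, the range of values observed on anomaly-free training data."""
    stats = window_stats(train, w)
    return stats.min(axis=0), stats.max(axis=0)

def anomaly_scores(test, w, ranges):
    """Score each test window by how far any statistic leaves its training range."""
    lo, hi = ranges
    stats = window_stats(test, w)
    span = np.where(hi > lo, hi - lo, 1.0)        # avoid division by zero
    below = np.clip(lo - stats, 0.0, None) / span
    above = np.clip(stats - hi, 0.0, None) / span
    # Ensemble step (assumed): keep the largest deviation over all statistics.
    return (below + above).max(axis=1)

# Hypothetical usage: a clean sine wave for training, the same wave with one spike for testing.
train = np.sin(np.linspace(0, 20 * np.pi, 1000))
test = train.copy()
test[500] += 5.0
scores = anomaly_scores(test, 20, fit_normal_ranges(train, 20))
```

In this toy setup only the windows that cover index 500 receive a non-zero score. Note that the sketch takes the window length as an argument, whereas the paper describes its method as parameter-free.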
4:20pm - 4:40pm
Relationship Discovery for Heterogeneous Time Series Integration: A Comparative Analysis for Industrial and Building Data
Friedrich-Alexander-Universität Erlangen-Nürnberg, Computer Science 6
Cyber-physical systems like buildings and power plants are monitored with ever-increasing numbers of sensors, producing massive and heterogeneous time-series datasets that are collected in data lakes. Appropriate meta-data, describing both the function and location of each sensor, is essential for any profitable use of the data but is often unavailable or incomplete. In particular, information about related sensors, meaning sensors belonging to the same functional subsystem, can be hard to derive if appropriate meta-data is unavailable. While various approaches exist for automatic meta-data extraction from relational databases, the unique characteristics of heterogeneous time-series data necessitate specialized algorithms. Among the general algorithms developed for time-series meta-data inference, only a few are concerned with relationship discovery, despite the critical importance of this information in many meta-data formats. Nevertheless, other domains offer a variety of measures for pairwise relationship discovery in homogeneous time-series collections. This paper consolidates these measures and evaluates their performance for identifying related but heterogeneous time series from the same functional subsystem within industrial facilities. We evaluate the methods on a collection of different datasets to extract promising relationship measures from the literature and show that there are better-performing candidates than the common Pearson Correlation Coefficient.
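For orientation, the Pearson baseline mentioned in the abstract can be used for pairwise relationship discovery roughly as sketched below. The sensor names, the use of the absolute correlation, and the helper `most_related` are assumptions for illustration; the abstract does not say how the paper applies the measure, nor which alternative measures perform better.

```python
import numpy as np

def pearson_matrix(series):
    """Pairwise Pearson correlation; `series` holds one equally long row per sensor."""
    return np.corrcoef(np.asarray(series, dtype=float))

def most_related(series, names, k=1):
    """For each sensor, list the k other sensors with the highest absolute correlation."""
    corr = np.abs(pearson_matrix(series))
    np.fill_diagonal(corr, -np.inf)               # a sensor is not its own candidate
    return {
        name: [names[j] for j in np.argsort(corr[i])[::-1][:k]]
        for i, name in enumerate(names)
    }

# Hypothetical sensors: two follow the same signal in different units, one is unrelated.
t = np.linspace(0, 4 * np.pi, 400)
names = ["supply_temp", "return_temp", "fan_speed"]
series = [np.sin(t), 3.0 + 2.0 * np.sin(t), np.cos(7 * t)]
related = most_related(series, names)
```

Because Pearson correlation is invariant to offset and scale, the two temperature-like signals are matched to each other here despite their different value ranges. It only captures linear dependence, which is one plausible reason other measures can do better on heterogeneous data.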
4:40pm - 5:00pm
Caching Partition Identifiers for Fast Geometric Pattern Matching
Universität Hamburg, Germany
Pattern searching in large datasets is fundamental across scientific domains, such as bioinformatics or geographic information systems. As datasets grow and search complexity increases, these operations often result in long-running queries. While various approaches exist to improve query execution times, they face limitations when dealing with complex geometric patterns in comprehensive datasets. This paper presents a novel caching-based architecture combined with query decomposition to significantly reduce computation time for geometric pattern searches in large partitioned datasets. We focus on GeoMine, a software tool for searching protein-ligand complexes in life sciences, but demonstrate the approach's broader applicability. Our key innovation lies in decomposing queries into smaller subvariants, which are separately executed against the database. The results are stored as an index-like structure, allowing subsequent new queries to utilize this cache to reduce the search space to relevant partitions. We evaluated our approach on a lightweight variant of GeoMine and tested various storage backends, including RocksDB, Redis, and PostgreSQL. With this approach, we could substantially reduce the computation time for our use case. We further applied our approach to geospatial data using an OpenStreetMap dataset to demonstrate transferability and show its potential beyond its original bioinformatics use case. Our work contributes to more efficient data exploration in scientific research, accelerating queries on datasets that often occur in the scientific domain.
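The caching idea described above can be illustrated with a small in-memory sketch. It is not GeoMine's architecture: the class `PartitionCache`, the representation of a query as named row predicates combined with AND, and the toy data are all assumptions. Each subquery is run once, the identifiers of the partitions that contain a match are cached, and later queries intersect these cached sets to prune the partitions they need to scan.

```python
from typing import Callable, Dict, FrozenSet, List, Set

Row = dict
Predicate = Callable[[Row], bool]

class PartitionCache:
    """Caches, per subquery key, the IDs of partitions containing at least one match."""

    def __init__(self, partitions: Dict[int, List[Row]]):
        self.partitions = partitions
        self.cache: Dict[str, FrozenSet[int]] = {}
        self.scans = 0                             # partition scans done to fill the cache

    def _partitions_for(self, key: str, pred: Predicate) -> FrozenSet[int]:
        if key not in self.cache:                  # cache miss: scan every partition once
            hits = set()
            for pid, rows in self.partitions.items():
                self.scans += 1
                if any(pred(row) for row in rows):
                    hits.add(pid)
            self.cache[key] = frozenset(hits)
        return self.cache[key]

    def candidate_partitions(self, subqueries: Dict[str, Predicate]) -> Set[int]:
        ids = set(self.partitions)
        for key, pred in subqueries.items():
            ids &= self._partitions_for(key, pred) # a match needs every subquery to hit
        return ids

    def search(self, subqueries: Dict[str, Predicate]) -> List[Row]:
        preds = list(subqueries.values())
        return [
            row
            for pid in sorted(self.candidate_partitions(subqueries))
            for row in self.partitions[pid]
            if all(p(row) for p in preds)
        ]

# Hypothetical data: three partitions with one row each.
partitions = {
    0: [{"kind": "donor", "dist": 2.0}],
    1: [{"kind": "acceptor", "dist": 3.5}],
    2: [{"kind": "donor", "dist": 3.6}],
}
cache = PartitionCache(partitions)
is_donor = lambda r: r["kind"] == "donor"
first = cache.search({"kind=donor": is_donor, "dist>3": lambda r: r["dist"] > 3})
scans_after_first = cache.scans
second = cache.search({"kind=donor": is_donor, "dist<3": lambda r: r["dist"] < 3})
scans_after_second = cache.scans
```

The second query reuses the cached partition set for `kind=donor`, so only its new subquery triggers partition scans. In a real deployment the cache would live in a storage backend such as those named in the abstract rather than in a Python dictionary.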
5:00pm - 5:20pm
Practical Problems in Customer Data - A Use-Case-Driven Classification
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany; Siemens Healthineers AG
This article presents a comprehensive analysis of data quality issues encountered in customer data at large enterprises. This analysis is based on data collected at a large medical technology manufacturer, and the problems observed there are clustered into distinct classes. Through this classification, nine key prevention requirements can be identified which are essential for improving data fitness. These include changes to data governance and to data architecture, among others. An evaluation of existing tools against these requirements furthermore highlights notable solutions. Despite the availability of numerous tools, gaps remain, especially regarding integration of all functionalities. Our findings suggest that while industry-standard solutions are accessible, integrating them into a cohesive framework posed significant challenges in our use case, necessitating continual adjustments to data architecture and processes to enable and maintain high quality of data.
5:20pm - 5:30pm
Benchmark of n-Dimensional Array File Formats in Data Analytics Environments
German Aerospace Center (DLR) - Institute of Data Science, Jena, Germany
For effective data exchange and transfer, choosing the right file format is crucial. Different domains have specific standards for file formats. While CSV files are commonly used, they lack reusability. Data files are well-suited for computing clusters. Data analytics pipelines can be time-consuming due to handling large volumes of data. Timely data access is crucial for efficient processing and analysis. Earth system science (ESS) data commonly manifests as dense or sparse n-dimensional data. Dense n-dimensional data is conventionally stored in arrays, while sparse n-dimensional data is typically housed in data frames. In the realm of ESS, an array of file formats is leveraged for the storage of dense n-dimensional data, including NetCDF4, TileDB, and Zarr. The paper at hand aims to evaluate data file formats for retrieving multidimensional data, specifically focusing on tools within the ESS domain. The insights from this exploration will be applicable to other data analytics projects.