21st Conference on Database Systems for
Business, Technology and Web (BTW 2025)
March 3 - 7, 2025 | Bamberg, Germany
Conference Agenda
DE4DS 2: Workshop on Data Engineering for Data Science 2
Presentations
11:00am - 11:25am
SeeME: A General, Reusable Graph Schema for Data Preprocessing of Eye-Tracking Data
University of Regensburg, Germany
Eye-tracking is used in a wide range of fields to track eye movement over time and to gain information about points of interest through fixation data. In this paper, we present a general, reusable approach to store eye-tracking data and to perform data preprocessing tasks in-database. To achieve this, we develop a graph database schema for any eye-tracking data, consisting of 1) a time series data level and 2) a meta level. Follow-up experiments or additional data, such as demographic data, can easily be integrated into the meta level of the general schema. We use Neo4j to implement this general graph schema. To prepare the time series data for machine learning tasks, we additionally present a modular in-graph-database preprocessing pipeline that lets researchers either compare different operators or select the best-fitting one. For each preprocessing step, Cypher code for at least two time series preprocessing algorithms is provided.
11:25am - 11:50am
Impact of Preprocessing on Classification Results of Eye-Tracking-Data
1 Universität Regensburg, Germany; 2 Hochschule Neu-Ulm
Eye-tracking data provides valuable insights into human behavior, but its noisy and unstable nature necessitates robust preprocessing for accurate analysis. This study evaluates a tailored preprocessing pipeline designed to enhance machine learning classifier performance. Unlike prior research focusing on isolated preprocessing steps, this work systematically combines and compares techniques, including missing value imputation, outlier handling, and normalization, specifically optimized for eye-tracking data. The pipeline's impact on classification accuracy is tested, particularly for detecting academic dishonesty. By experimenting with diverse methods for handling missing data, outliers, and feature scaling, we assess their combined effects on classifier performance. A Random Forest classifier is used because of its proven effectiveness in prior studies \cite{nurwulan_random_2020}. This research builds on earlier findings and extends them by optimizing each preprocessing step. Results show that a well-designed pipeline significantly enhances classification accuracy, offering insights into optimal preprocessing techniques for behavioral prediction tasks.
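The idea of systematically combining preprocessing steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the operator names, the toy signal, and the choice of two alternatives per step are assumptions made purely for illustration, and the classification stage (a Random Forest in the abstract) is left out.

```python
from itertools import product
from statistics import mean, pstdev, quantiles

# Step 1: missing value imputation (None marks a missing sample).
def impute_mean(xs):
    fill = mean(x for x in xs if x is not None)
    return [fill if x is None else x for x in xs]

def impute_forward_fill(xs):
    # Leading gaps are filled with the first known value.
    last = next(x for x in xs if x is not None)
    out = []
    for x in xs:
        last = last if x is None else x
        out.append(last)
    return out

# Step 2: outlier handling.
def outliers_keep(xs):
    return list(xs)

def outliers_clip_iqr(xs):
    q1, _, q3 = quantiles(xs, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [min(max(x, low), high) for x in xs]

# Step 3: feature scaling.
def scale_min_max(xs):
    low, high = min(xs), max(xs)
    span = (high - low) or 1.0  # avoid division by zero on constant input
    return [(x - low) / span for x in xs]

def scale_z_score(xs):
    mu, sd = mean(xs), pstdev(xs) or 1.0
    return [(x - mu) / sd for x in xs]

# Hypothetical registry: one list of alternative operators per step.
STEPS = {
    "impute": [impute_mean, impute_forward_fill],
    "outliers": [outliers_keep, outliers_clip_iqr],
    "scale": [scale_min_max, scale_z_score],
}

def run_all_pipelines(signal):
    """Apply every combination of operators and return the results by name."""
    results = {}
    for combo in product(*STEPS.values()):
        data = list(signal)
        for op in combo:
            data = op(data)
        results[tuple(op.__name__ for op in combo)] = data
    return results

# Toy signal with one gap and one extreme value (made-up numbers).
variants = run_all_pipelines([1.0, None, 3.0, 100.0, 2.0])
```

Each entry in `variants` would then be fed to the same classifier so that the combined effect of the three choices can be compared on equal terms.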
11:50am - 12:10pm
SQLinked - A Hybrid Approach for Local and Database-Remote Program Execution
OTH Regensburg, Germany
When working with today's relational databases, there is usually a clear boundary between the database server and the application, which interfaces with the database system using the query language SQL. Stored procedures make it possible to move complex parts of the business logic into the database server, for instance to reduce the latency of ELT processes that involve several database queries building on each other, such as distributing records into tables according to their attribute values. Creating and maintaining such stored procedures can be a challenging task, however. The idea pursued in this paper is to create a programming language, together with a compilation and execution environment, that allows the user to mark parts of the application code to be automatically compiled to a stored procedure and later executed in the database instead of in the execution environment of the actual application. This blurs the border between database and application and provides a natural and maintenance-friendly way of offloading latency-sensitive parts of the code to the database system.
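As a rough analogy only, the notion of marking parts of the application code for database-side execution can be sketched with a Python decorator. SQLinked is described as its own language and toolchain, so the names below (`remote`, `REMOTE_CANDIDATES`, `execution_site`) are hypothetical, and no compilation to a stored procedure takes place here; the sketch only shows how a marker could separate offloadable code from ordinary application code.

```python
from typing import Callable

# Hypothetical registry of functions the user marked for offloading.
REMOTE_CANDIDATES: dict[str, Callable] = {}

def remote(func: Callable) -> Callable:
    """Mark a function as a candidate for execution inside the database."""
    REMOTE_CANDIDATES[func.__name__] = func
    func.run_in_database = True
    return func

@remote
def distribute_records(records):
    # Latency-sensitive logic from the abstract's example: route each
    # record to a target table according to an attribute value.
    tables = {}
    for record in records:
        tables.setdefault(record["kind"], []).append(record)
    return tables

def render_report(tables):
    # Unmarked code stays in the application.
    return {name: len(rows) for name, rows in tables.items()}

def execution_site(func: Callable) -> str:
    """Report where a (hypothetical) toolchain would run the function."""
    marked = getattr(func, "run_in_database", False)
    return "database" if marked else "application"

# In this sketch both functions still run locally.
tables = distribute_records([{"kind": "a"}, {"kind": "b"}, {"kind": "a"}])
report = render_report(tables)
```

The point of the marker is that the source code stays in one place and one language, while the decision about where it runs is made per function.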
12:10pm - 12:25pm
Higher-Order SQL Lambda Functions
University of Bamberg, Germany
Model databases track the accuracy of models on pre-trained weights. The models are stored as executable code and extracted on deployment. Instead of extracting runnable code and data out of a database system, we propose higher-order SQL lambda functions for in-database execution. SQL lambda expressions have been introduced to let the user customise otherwise hard-coded data mining operators, such as the distance function for k-means clustering. However, database systems parse lambda expressions during semantic analysis, which does not allow functions to be passed as arguments. This paper proposes higher-order lambda functions that support the execution of functions taken from a table as input. Higher-order lambda functions that express machine learning models allow data scientists to monitor model quality over time and thus eliminate the need for any extraction step. This paper presents the concept of higher-order lambda functions and their embedding into relational algebra using a derived map operator. We further describe the current prototype implementation on top of relational database systems and report preliminary results for data mining within SQL.
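A plain Python analogue may help to picture functions stored in a table and passed to a higher-order operator. This is only a sketch under assumptions: the "table" is a list of dictionaries, and the names `model_table`, `map_models`, and `accuracy` as well as the toy data are invented for illustration. It does not reflect the paper's SQL syntax or its derived map operator.

```python
# A "table" whose rows carry a model as an executable function.
model_table = [
    {"name": "threshold_0", "predict": lambda x: x > 0},
    {"name": "threshold_2", "predict": lambda x: x > 2},
]

# Labelled observations as (feature, expected label); made-up values.
observations = [(-1.0, False), (1.0, True), (3.0, True), (0.5, True)]

def accuracy(predict, rows):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(predict(x) == label for x, label in rows)
    return hits / len(rows)

def map_models(models, rows, metric):
    """Higher-order step: apply a metric to every function stored in the table."""
    return [
        {"name": row["name"], "score": metric(row["predict"], rows)}
        for row in models
    ]

scores = map_models(model_table, observations, accuracy)
# scores == [{"name": "threshold_0", "score": 1.0},
#            {"name": "threshold_2", "score": 0.5}]
```

Because both the models and the metric are passed in as values, monitoring quality over time amounts to re-running the same call on new observations, without extracting the models from where they are stored.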