Conference Agenda

Session Overview
Session: DE4DS 2: Workshop on Data Engineering for Data Science 2
Time: Tuesday, 04/Mar/2025, 11:00am - 12:30pm

Session Chair: Marina Tropmann-Frick, HAW Hamburg
Location: WE5/00.022, Lecture Hall 1

Presentations
11:00am - 11:25am

SeeME: A General, Reusable Graph Schema for Data Preprocessing of Eye-Tracking Data

Dominique Hausler, Jennifer Landes, Meike Klettke

University of Regensburg, Germany

Eye-tracking is used in a wide range of fields to track eye movement over time and to gain information about points of interest through fixation data. In this paper, we present a general, reusable approach to store eye-tracking data and to perform data preprocessing tasks in-database. To achieve this, we develop a graph schema for graph databases that can hold any eye-tracking data and consists of 1) a time series data level and 2) a meta level. Follow-up experiments or additional data, such as demographic data, can easily be integrated into the meta level of the general schema. We use Neo4j to implement this general graph schema. To prepare the time series data for machine learning tasks, we additionally present a modular in-graph-database preprocessing pipeline that enables researchers either to compare different operators or to select the best-fitting one. For each preprocessing step, Cypher code for at least two time series preprocessing algorithms is provided.
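The two-level idea and the interchangeable preprocessing operators can be illustrated with a small in-memory sketch. This is not the SeeME schema or its Cypher code: the class names `Recording` and `GazeSample`, the `age_group` property, and the two imputation operators are assumptions chosen only to show how a meta level can wrap a time series level and how one preprocessing step can offer several operators.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GazeSample:
    """Time series level: one sample; x is None when the tracker lost the eye."""
    t_ms: int
    x: Optional[float]


@dataclass
class Recording:
    """Meta level: describes one recording and links to its samples."""
    participant: str
    properties: Dict[str, str] = field(default_factory=dict)  # e.g. demographic data
    samples: List[GazeSample] = field(default_factory=list)


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Operator A: replace a missing value with the last observed one."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled


def linear_interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Operator B: fill interior gaps on a straight line between known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical operator registry: one preprocessing step, two selectable algorithms.
OPERATORS: Dict[str, Callable] = {
    "forward_fill": forward_fill,
    "linear": linear_interpolate,
}


def impute(recording: Recording, operator: str) -> List[Optional[float]]:
    """Run the chosen operator on the x coordinates of a recording."""
    return OPERATORS[operator]([s.x for s in recording.samples])


rec = Recording(
    "p01",
    {"age_group": "20-29"},
    [GazeSample(0, 10.0), GazeSample(10, None), GazeSample(20, None), GazeSample(30, 16.0)],
)
print(impute(rec, "forward_fill"))  # [10.0, 10.0, 10.0, 16.0]
print(impute(rec, "linear"))        # [10.0, 12.0, 14.0, 16.0]
```

Keeping the operators behind a common signature is what makes them comparable: the same recording can be run through each one and the results inspected side by side.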

Hausler-SeeME A General, Reusable Graph Schema for Data Preprocessing-218_a.pdf


11:25am - 11:50am

Impact of Preprocessing on Classification Results of Eye-Tracking-Data

Jennifer Landes (1), Meike Klettke (1), Sonja Köppl (2)

(1) Universität Regensburg, Germany; (2) Hochschule Neu-Ulm

Eye-tracking data provides valuable insights into human behavior, but its noisy and unstable nature necessitates robust preprocessing for accurate analysis. This study evaluates a tailored preprocessing pipeline designed to enhance machine learning classifier performance. Unlike prior research focusing on isolated preprocessing steps, this work systematically combines and compares techniques, including missing value imputation, outlier handling, and normalization, specifically optimized for eye-tracking data. The pipeline's impact on classification accuracy is tested, particularly for detecting academic dishonesty. By experimenting with diverse methods for handling missing data, outliers, and feature scaling, we assess their combined effects on classifier performance. A Random Forest classifier is used due to its proven effectiveness in prior studies \cite{nurwulan_random_2020}. This research not only builds on earlier findings but also extends them by optimizing each preprocessing step. Results show that a well-designed pipeline significantly enhances classification accuracy, offering insights into optimal preprocessing techniques for behavioral prediction tasks.
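Combining and comparing preprocessing techniques can be pictured as enumerating every combination of one imputer, one outlier handler, and one scaler. The sketch below uses only the Python standard library; the specific operators (mean and median imputation, clipping by median absolute deviation, min-max and z-score scaling) and all function names are assumptions for illustration, not the methods evaluated in the paper. The classifier stage is left out.

```python
import statistics
from itertools import product
from typing import List, Optional, Tuple


def impute_mean(col: List[Optional[float]]) -> List[float]:
    mean = statistics.fmean([v for v in col if v is not None])
    return [mean if v is None else v for v in col]


def impute_median(col: List[Optional[float]]) -> List[float]:
    med = statistics.median([v for v in col if v is not None])
    return [med if v is None else v for v in col]


def keep_outliers(col: List[float]) -> List[float]:
    return list(col)


def clip_mad(col: List[float], k: float = 3.0) -> List[float]:
    """Clip values to median +/- k * median absolute deviation."""
    med = statistics.median(col)
    mad = statistics.median([abs(v - med) for v in col])
    lo, hi = med - k * mad, med + k * mad
    return [min(max(v, lo), hi) for v in col]


def scale_min_max(col: List[float]) -> List[float]:
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]  # assumes hi > lo


def scale_z_score(col: List[float]) -> List[float]:
    mean, sd = statistics.fmean(col), statistics.pstdev(col)
    return [(v - mean) / sd for v in col]  # assumes sd > 0


# One dictionary per pipeline stage; each entry is an interchangeable operator.
STAGES = [
    {"mean": impute_mean, "median": impute_median},
    {"keep": keep_outliers, "clip_mad": clip_mad},
    {"min_max": scale_min_max, "z_score": scale_z_score},
]


def run_pipeline(col: List[Optional[float]], names: Tuple[str, ...]) -> List[float]:
    for stage, name in zip(STAGES, names):
        col = stage[name](col)
    return col


def all_variants() -> List[Tuple[str, ...]]:
    return list(product(*(stage.keys() for stage in STAGES)))


raw = [1.0, 2.0, None, 3.0, 100.0]  # one feature with a gap and an outlier
results = {names: run_pipeline(raw, names) for names in all_variants()}
print(len(results))  # 8 pipeline variants
```

In a full comparison, each of the eight preprocessed variants would be passed to the same classifier and scored, so that differences in accuracy can be attributed to the preprocessing choices.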

Landes-Impact of Preprocessing on Classification Results-212_a.pdf


11:50am - 12:10pm

SQLinked - A Hybrid Approach for Local and Database-Remote Program Execution

Florian Heinz, Johannes Schildgen

OTH Regensburg, Germany

When working with today's relational databases, there is usually a clear boundary between the database server and the application, which interfaces with the database system using the query language SQL.

Stored procedures allow complex parts of the business logic to be moved into the database server for various reasons, for instance to reduce the latency of ELT processes that involve several database queries building on each other, such as distributing records into tables according to their attribute values.

Creating and maintaining such stored procedures can be a challenging task, however. The idea pursued in this paper is to create a programming language, together with a compilation and execution environment, that allows the user to mark parts of the application code to be automatically compiled to, and later executed as, a stored procedure in the database instead of in the execution environment of the actual application. This blurs the border between database and application and provides a natural and maintenance-friendly way of offloading latency-sensitive parts of the code to the database system.
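The latency argument can be made concrete with the record-distribution example mentioned above. The sketch below does not use SQLinked or its language; it relies on Python's built-in `sqlite3` module, and since SQLite has no stored procedures, a single script sent in one call stands in for one. The table names and the `region` attribute are hypothetical.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders(id INTEGER PRIMARY KEY, region TEXT, amount REAL);
        CREATE TABLE orders_eu(id INTEGER, amount REAL);
        CREATE TABLE orders_us(id INTEGER, amount REAL);
        INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'US', 20.0), (3, 'EU', 30.0);
    """)
    return con


def distribute_locally(con: sqlite3.Connection) -> int:
    """Application-side logic: one SELECT plus one INSERT per record."""
    calls = 1
    rows = con.execute("SELECT id, region, amount FROM orders").fetchall()
    for rid, region, amount in rows:
        target = "orders_eu" if region == "EU" else "orders_us"
        con.execute(f"INSERT INTO {target} VALUES (?, ?)", (rid, amount))
        calls += 1
    return calls


def distribute_in_database(con: sqlite3.Connection) -> int:
    """Stand-in for a stored procedure: the whole routine is sent in one call."""
    con.executescript("""
        INSERT INTO orders_eu SELECT id, amount FROM orders WHERE region = 'EU';
        INSERT INTO orders_us SELECT id, amount FROM orders WHERE region <> 'EU';
    """)
    return 1


def snapshot(con: sqlite3.Connection) -> dict:
    return {
        table: con.execute(f"SELECT id, amount FROM {table} ORDER BY id").fetchall()
        for table in ("orders_eu", "orders_us")
    }


local_con, db_con = setup(), setup()
local_calls = distribute_locally(local_con)
db_calls = distribute_in_database(db_con)
print(local_calls, db_calls)                   # 4 1
print(snapshot(local_con) == snapshot(db_con))  # True
```

With an in-memory database the difference is invisible, but against a remote server each call is a network round trip, so the per-record variant scales with the number of records while the database-side variant does not.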

Heinz-SQLinked - A Hybrid Approach for Local and Database-Remote Program Execution-230_a.pdf


12:10pm - 12:25pm

Higher-Order SQL Lambda Functions

Maximilian Emanuel Schüle

University of Bamberg, Germany

Model databases track the accuracy of models on pre-trained weights. The models are stored as executable code and extracted on deployment. Instead of extracting runnable code and data from a database system, we propose higher-order SQL lambda functions for in-database execution. SQL lambda expressions were introduced to let the user customise otherwise hard-coded data mining operators, such as the distance function for k-means clustering. However, database systems parse lambda expressions during semantic analysis, which does not allow functions to be passed as arguments. This paper proposes higher-order lambda functions that support executing functions taken from a table as input. Higher-order lambda functions that express machine learning models allow data scientists to monitor model quality over time and thus eliminate the need for any extraction step. This paper presents the concept of higher-order lambda functions and their embedding into relational algebra using a derived map operator. We further present the current prototype implementation on top of relational database systems, along with preliminary results for data mining within SQL.
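The difference between a lambda that customises an operator and a higher-order function that takes functions from a table can be sketched outside SQL. The Python analogy below is an assumption-laden illustration, not the proposed SQL syntax or the prototype: `nearest_centroid`, `ModelRow`, and `accuracy_per_model` are hypothetical names, and the "table" is a plain list of rows.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

Point = Tuple[float, float]


def nearest_centroid(point: Point, centroids: Sequence[Point],
                     distance: Callable[[Point, Point], float]) -> int:
    """First-order case: a lambda customises the distance used by the operator."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


class ModelRow(NamedTuple):
    """One row of a hypothetical model table: a name and an executable function."""
    name: str
    predict: Callable[[float], int]


def accuracy_per_model(models: Sequence[ModelRow],
                       samples: Sequence[Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Higher-order case: a map that applies every function stored in `models`."""
    report = []
    for row in models:
        hits = sum(1 for x, label in samples if row.predict(x) == label)
        report.append((row.name, hits / len(samples)))
    return report


manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
print(nearest_centroid((1.0, 1.0), [(0.0, 0.0), (5.0, 5.0)], manhattan))  # 0

models = [
    ModelRow("threshold_0.5", lambda x: int(x > 0.5)),
    ModelRow("threshold_0.8", lambda x: int(x > 0.8)),
]
samples = [(0.2, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
report = accuracy_per_model(models, samples)
print(report)  # [('threshold_0.5', 1.0), ('threshold_0.8', 0.5)]
```

Because the models stay in the table and are evaluated where the data lives, re-running the map on new samples yields a quality report without extracting any model code.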

Schüle-Higher-Order SQL Lambda Functions-267_a.pdf


 
Conference: BTW 2025 Bamberg