21st Conference on Database Systems for Business, Technology and Web (BTW 2025)
March 3 - 7, 2025 | Bamberg, Germany
Conference Agenda
Session
R7: Research 7: Machine Learning 2
Presentations
9:00am - 9:20am
Incremental SliceLine for Iterative ML Model Debugging under Updates
TU Berlin + BIFOLD, Germany
SliceLine is a model debugging technique for finding the top-K worst slices (defined as conjunctions of attributes) on which a trained machine learning (ML) model performs significantly worse than average. In contrast to other slice finding techniques, SliceLine introduced an intuitive scoring function, effective pruning strategies, and fast linear-algebra-based evaluation strategies. Together, these allow SliceLine to find the exact top-K worst slices in the full lattice of possible conjunctions in reasonable time. Recently, we have observed an increasing trend towards iterative algorithms that incrementally update the dataset (e.g., by selecting samples or augmenting it with new instances). Recomputing SliceLine from scratch for every update is wasteful. In this paper, we introduce an incremental problem formulation of SliceLine, new pruning strategies that leverage intermediates of previous slice finding runs on a modified dataset, and an extended linear-algebra-based enumeration algorithm. Our experiments show that incremental SliceLine yields robust performance improvements of up to an order of magnitude over full SliceLine, while still allowing effective parallelization in local, distributed, and federated environments.
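To make the notion of a "slice" concrete, the following is a minimal brute-force sketch of slice finding, not the SliceLine algorithm itself. It assumes categorical attributes and a per-row error value, and it ranks slices by a simple illustrative score (mean slice error minus overall mean error) rather than the scoring function from the paper. The function name `top_k_slices` and the parameters `max_level` and `min_support` are hypothetical, and no pruning or linear-algebra evaluation is attempted.

```python
from itertools import combinations


def top_k_slices(rows, errors, k=3, max_level=2, min_support=2):
    """Return the k slices with the highest mean error (brute force).

    rows:   list of dicts mapping attribute name -> categorical value
    errors: per-row model error (e.g. 0/1 misclassification)
    A slice is a conjunction of attribute=value predicates.
    Each result is a tuple (score, slice_size, predicates).
    """
    overall = sum(errors) / len(errors)
    attrs = sorted(rows[0])

    # Group row indices by every conjunction of up to max_level predicates.
    members = {}
    for level in range(1, max_level + 1):
        for combo in combinations(attrs, level):
            for i, row in enumerate(rows):
                key = tuple((a, row[a]) for a in combo)
                members.setdefault(key, []).append(i)

    scored = []
    for key, idx in members.items():
        if len(idx) < min_support:
            continue  # ignore slices that are too small to be meaningful
        mean_err = sum(errors[i] for i in idx) / len(idx)
        # Illustrative score only: how much worse the slice is than average.
        scored.append((mean_err - overall, len(idx), key))

    # Worst slices first; ties broken by larger size, then by predicate names.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return scored[:k]
```

Because this sketch re-enumerates every conjunction on each call, it would have to be rerun in full whenever rows are added or removed, which is the cost the incremental formulation described above aims to avoid.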
9:20am - 9:40am
Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning
Leipzig University & ScaDS.AI Dresden/Leipzig, Germany
Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. By analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while datasets with high entropy or a low Fisher Discriminant Ratio (FDR) worsen the utility-privacy trade-off. These insights offer guidance for practitioners and researchers in estimating and optimizing the utility-privacy trade-off in image datasets, and help inform data and privacy modifications based on dataset characteristics.
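As a rough illustration of one of the dataset characteristics mentioned above, the sketch below computes the textbook two-class Fisher Discriminant Ratio for a single scalar feature: the squared difference of the class means divided by the sum of the class variances. This is an assumption about the general form of the measure; the abstract does not state how the study defines or aggregates FDR over image features or multiple classes. The function name `fisher_discriminant_ratio` is hypothetical.

```python
from statistics import fmean, pvariance


def fisher_discriminant_ratio(class_a, class_b):
    """Two-class Fisher Discriminant Ratio for one scalar feature.

    FDR = (mean_a - mean_b)^2 / (var_a + var_b)
    Higher values indicate classes that are easier to separate on this feature.
    Population variance is used here; this choice is an assumption.
    """
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        raise ValueError("both classes have zero variance; FDR is undefined")
    return (fmean(class_a) - fmean(class_b)) ** 2 / spread
```

For example, feature values `[1, 2, 3]` versus `[7, 8, 9]` give a much higher ratio than `[1, 2, 3]` versus `[2, 3, 4]`, reflecting that the first pair of classes overlaps far less.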