Project: Solution Method for Distributed Data Quality Management

Fabian Dzierzawa, M.Sc.
Email | LinkedIn
Project status: running

Problem Statement

Enterprises are handling ever-larger volumes of data, which fuel transformative technologies such as machine learning and data-driven products. As these volumes grow, ensuring data quality has become critical: data arrives from many sources, in varying structures, and at high speed. Maintaining data quality is essential for unlocking the potential of data-driven technologies and data-intensive business models, particularly in distributed environments, where data suppliers and data consumers operate independently of the data provider.

Our research addresses this challenge by developing methods for robust data quality measures in distributed settings. Traditional approaches focus on the data itself or on ensuring data quality from a technical perspective. Instead, we advocate for a distributed model that empowers data providers to address the challenges resulting from their independence from both the data source and the customer. Ensuring data quality in this setting means considering the organizational characteristics that determine whether existing methods are applicable and where new approaches could be useful, while also making use of new technical possibilities.

We are committed to navigating these complexities, paving the way for a future where data integrity and excellence are synonymous.

Key Publications

F. Dzierzawa, D. Petrik, K. Stuber, S. Merz, L. Jaensch, G. Herzwurm: Bad Data Quality Eats Ecosystem for Breakfast. Hawaii International Conference on System Sciences, 2025 (to be published).