Project: Solution Method for Distributed Data Quality Management

Carolin Mombrey, M.Sc.
Email | LinkedIn
Project status: running

Problem Statement

Enterprises such as Bosch operate complex, distributed data platforms in which data is generated, processed and consumed across organisational units, systems and technical boundaries. In such environments, data quality issues often remain unnoticed for long periods or are detected only after they have already caused downstream problems, manual rework or incorrect analyses. As a result, data quality management in practice is frequently reactive, resource-intensive and difficult to scale.

A key challenge in this context is the early identification of emerging data quality risks in large and heterogeneous data landscapes. While many data quality problems follow recurring patterns, weak signals and early indicators are often overlooked by existing rule-based checks. This is particularly critical in distributed settings, where data producers and consumers operate independently and data is reused across multiple use cases with differing quality requirements.

Building on these challenges, the project aims to support a shift from predominantly reactive data quality management towards a more proactive and anticipatory approach. Instead of deriving data quality rules only after concrete errors or incidents have occurred, the goal is to identify potential data quality issues at an early stage and to systematically derive or adapt data quality rules before problems manifest themselves in operational systems or business processes.

Current approaches typically translate observed data quality problems into static rules to prevent the same errors in the future. In contrast, this project focuses on detecting patterns, anomalies and weak signals within data, metadata and contextual information that indicate emerging data quality risks. By doing so, the project addresses the question of how potential data quality problems can be anticipated rather than merely corrected after the fact.

To enable this, the project explores the use of Large Language Models (LLMs) and LLM-based agent concepts as a central analytical component. These approaches offer the potential to jointly analyse heterogeneous data sources, existing data quality rules, historical incidents and contextual information from the data platform. LLM-based agents can support the identification of recurring problem patterns, the interpretation of anomalies and the formulation of actionable recommendations for data quality improvement.

Methodologically, the project follows a design-oriented research approach with a strong focus on practical applicability. Based on concrete Bosch use cases, prototype solutions will be developed and iteratively refined in close collaboration with practitioners. The resulting artefacts are expected to include conceptual models, prototype implementations and guidelines that demonstrate how AI-based approaches can be integrated into existing data quality management processes at Bosch and in similar industrial environments.

Key Publications

F. Dzierzawa, D. Petrik, K. Stuber, S. Merz, L. Jaensch and G. Herzwurm, “Bad Data Quality Eats Ecosystem for Breakfast”, in Proceedings of the 58th Hawaii International Conference on System Sciences, 2025, https://hdl.handle.net/10125/109327

D. Petrik, F. Dzierzawa and K. Warthmann, “A Maturity Model for Digital Product Passports: A Design Science Study”, in IEEE Access, vol. 13, pp. 114575-114594, 2025, https://doi.org/10.1109/ACCESS.2025.3584842