Project: Data Lake Architecture

Corinna Giebler
Project status: finished

Problem Statement

As enterprises shift their business to be data-driven and adopt initiatives such as Industry 4.0, data lakes are becoming increasingly popular as management platforms for heterogeneous data. However, at the time of this project, data lakes were a new and therefore immature concept, with various conflicting definitions and only high-level considerations regarding their realization. The guiding question of this research project is therefore: How can a data lake be set up and realized to support the needs of an enterprise?

Solutions

To answer this question, we conducted a thorough review of data lake definitions and concepts. In parallel, we held discussions and interviews with representatives of various business units and projects to identify their needs, requirements, and current pain points. Based on the literature review, we developed the Data Lake Architecture Framework, which structures data lake implementation and guides the development of a data lake architecture. Furthermore, we developed the Zone Reference Model, the organizational architecture at the core of the data lake. It describes how data are managed, how data flow through the data lake, and how users can access them. We also evaluated the use of Data Vault for data lakes and created implementation patterns for realizing zone architectures. Both the Zone Reference Model and the Data Lake Architecture Framework serve as the basis for data lake development at Bosch.

Key Publications

  • Giebler C, Gröger C, Hoos E, Schwarz H, Mitschang B. Leveraging the Data Lake – Current State and Challenges. Proceedings of the 21st International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2019), 2019. https://doi.org/10.1007/978-3-030-27520-4_13.
  • Giebler C, Gröger C, Hoos E, Eichler R, Schwarz H, Mitschang B. Data Lakes auf den Grund gegangen. Datenbank-Spektrum 2020; 20:57–69. https://doi.org/10.1007/s13222-020-00332-0. (German)
  • Giebler C, Gröger C, Hoos E, Schwarz H, Mitschang B. A Zone Reference Model for Enterprise-Grade Data Lake Management. 2020.
  • Giebler C, Gröger C, Hoos E, Eichler R, Schwarz H, Mitschang B. The Data Lake Architecture Framework: A Foundation for Building a Comprehensive Data Lake Architecture. Proceedings der 19. Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW 2021), 2021.