Project: Data Platform Architectures & Technologies

Jan Schneider, M.Sc.
Project status: running

Problem Statement

In recent years, enterprises in almost all industrial sectors have become subject to fundamental paradigm shifts: large-scale initiatives, for example in the area of Industry 4.0, are driving the digital transformation and aim to take a holistic view of value chains in order to enable cross-phase optimizations.

In order to keep up with this development and benefit from it, enterprises need to collect huge amounts of heterogeneous data across the entire value chain, organize it in a structured and re-usable manner and exploit it by applying data-driven analysis techniques to gain insights and knowledge.

To store and manage the collected data and to enable data preparation, processing and analytics applications, different types of data platforms have emerged over the past decades. They range from traditional data warehouses and the more recent data lakes to metadata management platforms such as data catalogs and enterprise data marketplaces, each serving a different purpose. For many enterprises, this results in a large, diverse landscape of data platforms with complex architectures and further shortcomings, such as redundant storage of data and slow analytical processes. Hence, efforts to simplify architectures and technology stacks have recently become apparent, driven by novel approaches such as the delta architecture and lakehouse frameworks.

As the range of data platforms is rapidly evolving, the goal of this research project is to investigate and prototype upcoming architectures and technologies and to assess their applicability and potential for industrial enterprises.

Solutions

The project is currently in its starting phase. So far, a literature review of existing and upcoming types of data platforms and their architectures has been conducted in order to obtain an overview of the current data platform landscape. Currently, novel frameworks and technologies in the field of analytical data platforms are being reviewed, categorized and evaluated against previously defined criteria. This assessment provides insights into current trends, the use cases they address and how industrial companies can benefit from them. Next, potential research gaps and future steps will be identified.

Key Publications

  • Schneider, Jan; Lutsch, Arnold; Gröger, Christoph; Schwarz, Holger; Mitschang, Bernhard: First Experiences on the Application of Lakehouses in Industrial Practice. In: Proceedings of the 35th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), Herdecke, Germany, 2024. (to be published)
  • Schneider, Jan; Gröger, Christoph; Lutsch, Arnold; Schwarz, Holger; Mitschang, Bernhard: The Lakehouse: State of the Art on Concepts and Technologies. In: SN Computer Science, Springer Nature, 2024. (to be published)
  • Schneider, Jan; Gröger, Christoph; Lutsch, Arnold: The Data Platform Evolution: From Data Warehouses over Data Lakes to Lakehouses. In: Proceedings of the 34th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), Hirsau, Germany, 2023. (to be published)
  • Schneider, Jan; Gröger, Christoph; Lutsch, Arnold; Schwarz, Holger; Mitschang, Bernhard: Assessing the Lakehouse: Analysis, Requirements and Definition. In: Proceedings of the 25th International Conference on Enterprise Information Systems (ICEIS), Prague, Czech Republic, pp. 44–56, SciTePress, 2023. [DOI]
  • Schneider, Jan: The Data Platforms Landscape: An Overview (Poster).
    Presented at the 16th Symposium and Summer School On Service-Oriented Computing (SummerSOC), Crete, Greece, 2022. [PDF]