TEADAL

Trustworthy, Energy-Aware federated DAta Lakes along the computing continuum

Started at: 01-09-2022
Ends on: 31-08-2025

Budget: € 8 846 422.50

Areas: Distributed Artificial Intelligence (DAI)

Description:

TEADAL will enable the creation of trusted, verifiable, and energy-efficient data flows, both inside a data lake and across federated data lakes, based on a shared approach for defining, enforcing, and tracking data governance requirements, with specific emphasis on privacy and confidentiality. The proposed stretched data lake, i.e., a data lake deployed across the computing continuum, will be based on an innovative control plane able to exploit all the controlled or owned resources across clouds and at the edge to improve data analysis.

The resulting capabilities of stretched data lakes also provide the basis for creating trustworthy mediator-less federations of data lakes. These federations foster effective data exchange among organizations while preserving privacy and confidentiality constraints, without imposing a third-party coordinator, which is often unacceptable. Finally, applying the principles of the circular economy to data governance, i.e., reusing the data, applications, and computation resources that belong to the data lake federation, will enable the creation of platforms for more sustainable data analytics.

Within the project, the i2CAT Foundation will actively participate in Work Packages 2, 3, 4, and 6.

Work Package 2, “Pilot cases, requirements, and architecture”, will propose a technical foundation and an architecture on which the advances of the rest of the project can be built. It will also create a collection of synthetic datasets for each pilot case as the basis of experimentation and validation for the rest of the project.

Work Package 3, “Gravity/friction-aware data lake privacy governance”, has three aims: to enable the definition of the federation- and continuum-aware privacy requirements that will drive data governance along horizontal and vertical directions; to provide an enhanced data catalogue that can also express data gravity and data friction in the description of a dataset; and to identify and propose data transformation tooling that supports the fulfilment of the privacy requirements.

Work Package 4, “Stretched data lakes”, will explore and validate the stretched data lake platform, demonstrating how to handle the inherent friction between trustworthiness (e.g., compliance) and usability (e.g., stretched, performant). It aims to demonstrate how to efficiently access, store, and manage data, make data accessible, and address security and privacy needs in the continuum (edge to cloud and multi-cloud) by providing a control plane for data usage that orchestrates the data flows in the continuum.

Finally, Work Package 6, “System integration, trials, and validation”, will focus on validating and showcasing the added value of all innovation actions carried out in WP3, WP4, and WP5.

Objectives:

  • Build efficient data lake solutions with ease of data handling across the computing continuum. TEADAL demonstrates a data lake control plane that handles the non-functional aspects of workloads across the computing continuum: it automatically optimizes performance, enforces policies, runs transformations, and secures the data paths, independently of the user code.
  • Construct trustworthy data lakes and mediator-less federations of data lakes. The TEADAL project enables the creation of trustworthy data lakes, where privacy and confidentiality requirements are satisfied both when data is handled along the continuum and when it is shared among organizations.
  • Reduce the environmental impact of data analytics through an energy-efficient federation of stretched data lakes. The TEADAL project aims to apply the circular economy principle to the involved data (e.g., reducing data duplication, balancing data reuse and data accuracy, and reducing data movement), considering both design and operational levels.
  • Build a federation of stretched data lakes that complies with privacy requirements, organisational policies, and the GDPR. The TEADAL project proposes shared knowledge on the exact semantics of privacy and confidentiality requirements and on how they are enforced, to avoid divergent, erroneous interpretations that would break the trust established between the federated data lakes.
  • Contribute to and influence European research and initiatives to improve data sharing. The TEADAL project aims to gather feedback from stakeholders, explore problems beyond the ones identified by partners, provide those stakeholders with concrete demonstrators of the key innovations the project will develop, and disseminate the resulting approaches to improve trustworthy data sharing in Europe.

Funded by the European Union. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or [name of the granting authority]. Neither the European Union nor the granting authority can be held responsible for them.

Consortium

The TEADAL project has received funding from the European Commission's Horizon Europe programme under grant agreement number 101070186.