
A typical data lake setup: Raw data is ingested from operations, then refined through pipelines for analytics and ML. 
Source: Starburst
Source: AWS 
This gave them a single database. However, as Zalando's data lake grew to petabytes, they ran into the usual problems: managing data sharing across hundreds of teams, ensuring robust backup and recovery (S3 versioning was key here), and reducing high storage costs. They solved these by:



