
A typical data lake setup: Raw data is ingested from operations, then refined through pipelines for analytics and ML.
Source: Starburst
Source: Starburst
Source: AWS
This gave them a single database. However, as Zalando's data lake grew to petabytes in size, they ran into usual problems: managing data sharing across hundreds of teams, making sure backups and recovery were robust (S3 versioning was key here), and lowering big storage costs. They solved these by:




We use cookies to enhance your browsing experience, serve personalized ads or content, and analyze our traffic. By clicking "Accept All", you consent to our use of cookies.