DuckLake - Sub-Second Latency on a Petabyte
DuckLake is one of the most exciting technologies in data today. Data lakes are powerful, but the table formats that manage them have become notoriously difficult to work with, each dragging along its own catalog server and a stack of supporting services.
“I think one of the things in DuckLake that we managed to do is to cut, I want to say like 15 technologies out of this stack.”
How does it achieve this? Instead of building a custom catalog server, DuckLake uses a simple, elegant idea: a standard SQL database manages all of the metadata. In other words, it uses a database for exactly what databases are good at: transactional bookkeeping. This clean architecture allows DuckLake to manage huge data lakes, with millions or even billions of files, stored on AWS S3 or Google Cloud Storage.
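To make that concrete, here is a minimal sketch of what this looks like with DuckDB's ducklake extension. The catalog file name, bucket path, and table are hypothetical placeholders; S3 access additionally needs credentials configured, and the DuckLake documentation covers the exact attach options.

```sql
-- Install and load the DuckLake extension (run inside DuckDB).
INSTALL ducklake;
LOAD ducklake;

-- Attach a lake: metadata lives in an ordinary database file,
-- while table data is written as Parquet files to object storage.
-- 'metadata.ducklake' and the S3 path are hypothetical placeholders.
ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 's3://my-bucket/lake/');
USE my_lake;

-- From here on, it is just SQL.
CREATE TABLE events (id INTEGER, payload VARCHAR);
INSERT INTO events VALUES (1, 'hello'), (2, 'world');
SELECT count(*) FROM events;
```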
This simplicity also delivers incredible performance. In tests, DuckLake achieved sub-second query planning on a petabyte of data with 100 million snapshots, a scale that other systems simply can't handle.
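Snapshots are cheap because each commit is just a few rows in the metadata database. A sketch of inspecting them, assuming the ducklake_snapshots() table function and AT (VERSION => ...) time-travel syntax described in the DuckLake docs, against the hypothetical lake attached above:

```sql
-- List the snapshots recorded in the attached lake's catalog.
SELECT * FROM ducklake_snapshots('my_lake');

-- Time travel: read the table as it existed at an earlier snapshot.
SELECT * FROM events AT (VERSION => 1);
```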
DuckLake speaks SQL, the lingua franca of data. Its architecture provides full ACID compliance, so concurrent reads and writes are handled seamlessly, allowing entire teams (and their AI agents) to work on the data lake simultaneously.
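That concurrency story falls out of the architecture: point several clients at the same catalog database, and that database's own transactions arbitrate their commits. A sketch, assuming a PostgreSQL-backed catalog as described in the DuckLake docs; the connection string and bucket are hypothetical.

```sql
-- Any client that attaches the same catalog sees, and can safely
-- write, the same lake; the catalog database serializes the commits.
ATTACH 'ducklake:postgres:dbname=lake_catalog host=catalog.internal'
    AS shared_lake (DATA_PATH 's3://my-bucket/lake/');
USE shared_lake;

-- Writers on other machines can run statements like this concurrently;
-- readers always see a consistent snapshot (ACID).
INSERT INTO events VALUES (3, 'from another client');
```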
By returning to first principles, DuckLake delivers the power of a modern data lake without the complexity. Its simplicity and performance make it a vital part of the future of data.