What a Data Lake Is
A data lake is a large, low-cost store that holds raw data of many kinds (spreadsheets, logs, images, sensor readings) in its original form, so it can be processed later. It complements a structured data warehouse rather than replacing it.
This article explains the idea and where a lake fits in a data strategy.
Lake vs Warehouse
A warehouse stores data that has already been cleaned and structured for reporting. A lake stores data raw and unprocessed, which keeps your options open for future uses and keeps the cost of holding large volumes low. Many organisations run both side by side.
Why a Lake Can Help
- Stores raw data cheaply for future uses that may not be known yet.
- Handles formats a warehouse is not designed for, such as images or free text.
- Feeds machine-learning and analytics projects that need raw, unsummarised data.
Avoiding a 'Data Swamp'
Without governance, a lake can quickly become an unsearchable dumping ground. We put a data catalogue, consistent naming standards and access controls in place so the data stays discoverable and trustworthy.
Frequently Asked Questions
Do we need a data lake?
Usually only if you hold large volumes of varied raw data or plan to use machine learning. For straightforward reporting, a warehouse is usually enough.
If you need a hand with any of this, your Progressive Robot delivery team is ready to help. Raise a ticket from the Support area of your client portal or speak to your account manager and we will guide you through the next steps.