What a Data Lake Is

A data lake is a large, low-cost store that holds raw data of many kinds — spreadsheets, logs, images, sensor readings — in its original form, ready to be processed later. It complements a structured warehouse rather than replacing it.

This article explains the idea and where a lake fits in a data strategy.

Lake vs Warehouse

A warehouse stores cleaned, structured data ready for reporting. A lake stores everything raw and unprocessed, so you keep your options open and pay very little to hold large volumes. Many organisations use both together.
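The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lists stand in for real storage, and the function names and the two report columns (order_id, amount) are hypothetical, not part of any product. The lake keeps every payload exactly as it arrived, while the warehouse accepts only records that can be cleaned into the reporting structure.

```python
import json

# Hypothetical reporting columns; a real warehouse schema would differ.
REPORT_COLUMNS = ("order_id", "amount")


def store_in_lake(lake: list, payload: bytes, kind: str) -> None:
    """Keep the payload untouched, whatever its format."""
    lake.append({"kind": kind, "payload": payload})


def load_into_warehouse(warehouse: list, payload: bytes) -> bool:
    """Accept only payloads that can be cleaned into the report structure."""
    try:
        record = json.loads(payload)
        if not isinstance(record, dict):
            return False
        if not all(column in record for column in REPORT_COLUMNS):
            return False
        cleaned = {
            "order_id": str(record["order_id"]),
            "amount": float(record["amount"]),
        }
    except (ValueError, TypeError):
        # Not JSON, not decodable as text, or values that cannot be cleaned.
        return False
    warehouse.append(cleaned)
    return True


lake, warehouse = [], []
incoming = [
    (b'{"order_id": 1, "amount": "9.50"}', "json"),
    (b"\x89PNG\r\n", "image"),
    (b"2024-01-15 ERROR disk full", "log"),
]
for payload, kind in incoming:
    store_in_lake(lake, payload, kind)
    load_into_warehouse(warehouse, payload)
```

In this sketch all three items land in the lake, but only the order record reaches the warehouse, which is why the two are often used together.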

Why a Lake Can Help

  • Keeps raw data cheaply for future, as-yet-unknown uses.
  • Holds formats that fit poorly in a warehouse, such as images or free text.
  • Feeds machine-learning and analytics projects that need raw material.

Avoiding a 'Data Swamp'

Without governance, a lake becomes an unsearchable dumping ground. We add cataloguing, naming standards and access controls so the data stays discoverable and trustworthy.
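As a rough illustration of how these three controls fit together, the Python sketch below checks a naming standard before a dataset is catalogued and filters search results by role. Everything in it is an assumption made for the example: the path convention (zone/source/dataset/date/file), the class names and the role names are hypothetical and do not describe a specific tool or our production setup.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical naming standard: <zone>/<source>/<dataset>/<YYYY-MM-DD>/<file>
KEY_PATTERN = re.compile(
    r"^(raw|curated)/[a-z0-9_]+/[a-z0-9_]+/\d{4}-\d{2}-\d{2}/[A-Za-z0-9_.-]+$"
)


def follows_naming_standard(key: str) -> bool:
    """Return True if the storage key matches the assumed convention."""
    return KEY_PATTERN.match(key) is not None


@dataclass
class CatalogueEntry:
    key: str
    owner: str
    description: str
    allowed_roles: set[str] = field(default_factory=set)


class Catalogue:
    """In-memory stand-in for a data catalogue."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        # Naming standard: refuse keys that would be hard to find later.
        if not follows_naming_standard(entry.key):
            raise ValueError(f"key breaks naming standard: {entry.key}")
        self._entries[entry.key] = entry

    def search(self, term: str, role: str) -> list[str]:
        # Access control: only return entries the role is allowed to see.
        term = term.lower()
        return [
            entry.key
            for entry in self._entries.values()
            if role in entry.allowed_roles and term in entry.description.lower()
        ]


catalogue = Catalogue()
catalogue.register(
    CatalogueEntry(
        key="raw/factory/sensor_readings/2024-01-15/readings.csv",
        owner="operations",
        description="Hourly sensor readings from the factory floor",
        allowed_roles={"analyst"},
    )
)
matches = catalogue.search("sensor", role="analyst")
```

A file saved under a name such as "Misc/New Folder/data.csv" would be rejected at registration, which is the point at which a swamp is cheapest to prevent.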

Frequently Asked Questions

Do we need a data lake?

Only if you have large volumes of varied raw data or plans for machine learning. For straightforward reporting, a warehouse is usually enough.

If you need a hand with any of this, your Progressive Robot delivery team is ready to help. Raise a ticket from the Support area of your client portal or speak to your account manager and we will guide you through the next steps.
