
Death of the Data Lake: From Hoarding to Harnessing Data

By Richard Hutchings

For more than a decade, the mantra in data strategy and data management was simple: collect everything, store everything, and figure out value later.

Storage, after all, was cheap and easily scalable thanks to cloud platforms like Azure and AWS, and the rise of data analytics promised business leaders that all sorts of hidden insights were waiting for them in their data lakes (because surely, if you collect enough data, it’ll eventually tell you what to do, right?).

Just as the name suggests, ‘data lakes’ are vast, centralised repositories – a kind of digital dumping ground, if you will – where organisations store both structured and unstructured data in its raw form. There’s no need for cleaning or organising; everything from emails and spreadsheets to social media posts and customer interactions is simply poured in, ready to be dealt with later.

For years, it wasn’t uncommon to hear Boards and executives proudly touting the size of their lakes, convinced that somewhere within lay untold, untapped value, and that Big Data – and eventually AI – were the promised keys to unlocking it all.

The uncomfortable reality was, however, that most of these lakes had quietly become swamps; they were murky, disorganised, and hard to navigate. Instead of being engines of innovation, the data stored within was often incomplete, duplicative, poorly governed, or simply irrelevant. Sadly, I’ve seen organisations spend millions to maintain oceans of information that nobody trusts, nobody uses, and nobody even remembers collecting.

All this to say, then, it’s time we admit the truth: the age of indiscriminate data hoarding is over.

Why did the old model fail?

The logic of data hoarding made sense in its time. More data meant more chances to find patterns, more freedom for analysts, and more security in case the business might need it later. But that logic has collapsed for three reasons:

1. The costs have shifted

Yes, cloud storage is cheap. But the real costs aren’t in storage; they’re in:

Compliance and regulation: Every extra dataset becomes a liability under GDPR, CCPA, and new AI governance laws. Regulators don’t care whether it’s unused: if you hold it, you’re responsible for it.

Security and risk: Every forgotten dataset is another attack surface. The “just in case” data often lacks monitoring and controls, making it the most dangerous.

Operational complexity: Engineers spend disproportionate time cleaning, reconciling, and managing data that delivers little business value.

I’ve even spoken to leaders who spend 80% of their data budget protecting and governing data that contributes to 0% of their insights – a sobering reminder that the cost of hoarding isn’t just financial, it’s strategic. If a once forward-thinking investment is now draining resources, senior stakeholders are, in effect, burying their organisation’s ability to innovate and pivot.

2. Noise drowns out signal

The promise of the data lake was agility; there was a mentality of “we’ll collect first and decide later.” However, the reality is that analysts now spend most of their time wading through noise. Duplicate records, inconsistent formats, legacy logs … the sheer mass of irrelevant data slows data-led decision-making rather than enabling it.

Paradoxically, the more we hoard, the less we trust. I’ve seen business units reject centralised data because they find it unusable, opting instead to create parallel ‘shadow datasets’ just to get work done. Along with fragmenting governance, this undermines the very purpose of having a unified data strategy in the first place.

When teams stop trusting the lake, they stop using it. And when they stop using it, the lake becomes little more than a stagnant pool of risk, cost, and missed opportunity.

3. AI raises the stakes on quality

Since we’ve now entered the era of generative AI and advanced machine learning, there’s another uncomfortable truth to contend with: AI doesn’t make bad data better. It makes bad data bigger.

Feed an LLM incomplete, biased, or low-quality data, and it will confidently amplify errors at scale. We’ve all seen AI hallucinations. Now imagine those hallucinations driving financial forecasts, supply chain decisions, or compliance reporting. The consequences are embarrassing at best and, at worst, they’re operationally dangerous.

Remember, the more indiscriminate your data hoarding, the more fragile your AI becomes. Instead of unlocking intelligence, you risk automating ignorance. And the worst part is that AI will do so with speed, confidence, and a polished tone, making it harder to spot until it’s too late.

The rise of purpose-driven data

It’s time to stop collecting data by default and start curating by purpose – a change I like to call ‘purpose-driven data’. Purpose-driven data asks a simple yet powerful question: what decision, outcome, or experience is this data meant to support?

Only when leaders know the answer to this question can they decide whether the data is worth collecting, storing, and governing.

In practice, all purpose-driven data shares a set of common traits, including the following:

Curated over collected

Instead of dumping everything into a central repository, organisations should begin to define use-case-driven pipelines, for example:

  • Marketing data curated to support customer journey analysis.
  • Product usage data curated to feed experimentation frameworks.
  • Risk data curated to satisfy compliance and audit requirements.

If a dataset doesn’t serve a defined use case, it doesn’t belong in the active environment. Archive it or delete it.
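To make this concrete, here is a minimal, purely illustrative sketch of what a use-case-driven catalogue rule could look like in Python. The field names, owners, and example datasets below are hypothetical assumptions for illustration, not a reference to any particular tool or client environment.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical catalogue entry: every dataset must name the decision or
# outcome it supports before it earns a place in the active environment.
@dataclass
class DatasetEntry:
    name: str
    owner: str
    use_case: Optional[str]      # e.g. "customer journey analysis"; None = no defined use case
    last_accessed: Optional[date]

def triage(catalogue: list[DatasetEntry]) -> dict[str, list[str]]:
    """Split a catalogue into datasets to keep and datasets to archive or delete."""
    decision = {"keep": [], "archive_or_delete": []}
    for entry in catalogue:
        bucket = "keep" if entry.use_case else "archive_or_delete"
        decision[bucket].append(entry.name)
    return decision

catalogue = [
    DatasetEntry("web_clickstream_raw", "marketing", None, date(2021, 3, 1)),
    DatasetEntry("product_usage_events", "product", "experimentation framework", date(2025, 5, 20)),
    DatasetEntry("kyc_records", "risk", "regulatory reporting", date(2025, 6, 2)),
]
print(triage(catalogue))
# {'keep': ['product_usage_events', 'kyc_records'], 'archive_or_delete': ['web_clickstream_raw']}
```

The point isn’t the code; it’s that a dataset with no declared use case has nowhere to hide.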

Governance as enablement, not bureaucracy

Traditional governance sometimes earned a bad reputation as the ‘data police’. Purpose-driven data, however, reframes governance as an enabler. Instead of asking “How do we restrict access?”, we ask “How do we make sure the right people get the right data, in the right context, to make better decisions?” This shift is profound because once governance stops being about locking data away, it starts being about aligning it with business outcomes.

Value metrics, not volume metrics

Organisations love to report on petabytes stored, pipelines running, or API calls processed, but none of those things measure value! Instead, purpose-driven data leaders ask:

  • What percentage of our data is actively used in decision-making?
  • What percentage of our AI models are trained on trusted, validated sources?
  • How much faster are we moving from raw data to business impact?

In this case it’s value, not volume, that becomes the north star.
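If it helps to picture how such metrics might be tracked, here is a small, hypothetical sketch that computes two of them from catalogue and model-registry metadata. The dataset names, the 90-day definition of ‘actively used’, and the metadata fields are all assumptions for illustration only.

```python
from datetime import date, timedelta

# Hypothetical metadata pulled from a data catalogue and an ML model registry.
datasets = [
    {"name": "sales_daily",         "last_used_in_decision": date(2025, 6, 1)},
    {"name": "web_clickstream_raw", "last_used_in_decision": None},
    {"name": "churn_features",      "last_used_in_decision": date(2025, 5, 12)},
]
models = [
    {"name": "demand_forecast", "trained_on_validated_sources": True},
    {"name": "lead_scoring",    "trained_on_validated_sources": False},
]

def pct(numerator: int, denominator: int) -> float:
    return round(100 * numerator / denominator, 1) if denominator else 0.0

today = date(2025, 6, 30)
active_window = timedelta(days=90)  # assumption: "actively used" means used in the last 90 days

actively_used = sum(
    1 for d in datasets
    if d["last_used_in_decision"] and today - d["last_used_in_decision"] <= active_window
)
validated_models = sum(1 for m in models if m["trained_on_validated_sources"])

print(f"Data actively used in decision-making: {pct(actively_used, len(datasets))}%")
print(f"AI models trained on trusted, validated sources: {pct(validated_models, len(models))}%")
```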

Lessons from the front lines

Over the years, I’ve come across plenty of situations where organisations struggled with the weight of their own data, drowning in it before realising its true potential.

The following anonymised examples, drawn from real experiences, highlight how clarity can unlock real change:

A global retailer deleted nearly 60% of its historical clickstream data when it realised none of it was driving personalisation or revenue models. Governance shifted from retention to curation, and marketing performance actually improved.

A financial services firm realised its central data lake was a compliance nightmare. They pivoted to a purpose-driven model where each business line owned curated datasets aligned to regulatory reporting. The result: fewer fines, faster audits, and higher trust in the numbers.

A SaaS company introduced a “sunset policy” for data: if no clear use case was assigned within 12 months of collection, the data was archived or deleted. This freed up millions in storage and governance costs and forced teams to be intentional about what they collected and why; a rough illustration of such a rule appears below.
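Here is what that kind of sunset rule might look like in code – a hypothetical sketch only, with assumed catalogue fields, dataset names, and a 12-month window, not the firm’s actual implementation.

```python
from datetime import date, timedelta

# Illustrative sunset rule: datasets with no use case assigned within
# 12 months of collection are flagged for archive or deletion.
SUNSET_WINDOW = timedelta(days=365)

def sunset_candidates(catalogue: list[dict], today: date) -> list[str]:
    """Return the names of datasets that have outlived the sunset window without a use case."""
    flagged = []
    for entry in catalogue:
        no_use_case = entry.get("use_case") is None
        past_window = today - entry["collected_on"] > SUNSET_WINDOW
        if no_use_case and past_window:
            flagged.append(entry["name"])
    return flagged

catalogue = [
    {"name": "support_chat_logs", "collected_on": date(2023, 11, 1), "use_case": None},
    {"name": "billing_events",    "collected_on": date(2024, 9, 15), "use_case": "revenue reporting"},
]
print(sunset_candidates(catalogue, today=date(2025, 6, 30)))
# ['support_chat_logs']
```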

The good news is these aren’t edge cases anymore. They’re becoming the new normal; a quiet revolution in how organisations think about data value, risk, and responsibility.

Leaders are beginning to ask not “how much data do we have?” but “what is this data doing for us?” In this new model, intentionality replaces accumulation, and curation becomes a competitive advantage.

The leadership challenge

Shifting to purpose-driven data requires leaders to move beyond the comfort of accumulation and start asking the hard questions:

  • Why are we keeping this?
  • Who’s using it?
  • What risk does it pose simply by existing?
  • What value would we lose if we let it go?

In my experience, the answers often surprise people. Most data is kept ‘just in case’, but the cost of this is becoming too high to ignore. What once felt like prudence now looks more like the avoidance of decision-making, accountability, and clarity.

To this end, I’ll leave you with a challenge I pose to every executive I work with:

If 80% of your company’s data disappeared tomorrow, would your business actually suffer?

If the answer is no, it’s time to rethink your strategy.

To find out how Littlefish can help guide your data strategy and advise on building a purpose-driven, outcome-focused data ecosystem, please get in touch.
