Is your GDPR-restricted data toxic or clean?

Dec 3, 2019

Do your data lakes and warehouses contain personal information? Then you may have data that is toxic in the view of GDPR. If you have not obtained consent for every purpose for which you plan to process that data, and have not anonymized the personal information, then under GDPR your business carries a significant exposure that could cost hundreds of millions of euros in fines.


When GDPR took effect in May 2018, few businesses realized the impact it would have on data science and analytics. A year and a half in, the ramifications are indisputable. More than €405 million in fines have been issued, and brands like British Airways have suffered lasting reputational damage. Today, privacy infractions land on the front page, which means an unmanaged data lake can pose a serious threat to the longevity of your business.

The fact is, vast amounts of personal information are being collected, integrated, and stored in data lakes and data warehouses in almost every business. In many cases, this data is being stored for purposes beyond the original purpose for which it was collected.

In light of the new era of privacy regulations and legal compliance, most of the data sitting in data lakes and warehouses should be considered highly toxic for GDPR compliance.

Toxic data will result in regulatory penalties and a loss of consumer trust

Under GDPR, data controllers must establish a specific legal basis for each and every purpose for which personal data is used. If a business intends to use customer data for an additional purpose that is incompatible with the original one, it will generally need to obtain consent from the individual first.

As a result, personal data in data lakes can only be made available for a new use after processes have been implemented to notify every data subject and request permission for every use case. This is impractical at scale. Not only is it likely to trigger a wave of erasure requests, but it will also slow and limit the benefits of data lakes.

This risk is what we refer to as toxic data: identifiable data that you are processing in ways for which you have not obtained consent under GDPR. Left in a toxic state, your data lakes put your business at risk of fines of up to €20 million or 4% of your annual global turnover, whichever is higher.
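To make the definition concrete, here is a minimal Python sketch of how a team might flag toxic records and estimate the upper bound of a fine. It is an illustration under stated assumptions, not a compliance tool: the `Record` structure, its field names, and the sample data are hypothetical, and it treats consent as the only legal basis, as this article does. The fine ceiling follows GDPR Article 83(5), which sets the higher tier at €20 million or 4% of worldwide annual turnover, whichever is higher.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical data-lake record with consent metadata attached."""
    subject_id: str
    identifiable: bool  # False once the record has been anonymized
    consented_purposes: set = field(default_factory=set)


def is_toxic(record: Record, purpose: str) -> bool:
    """Toxic = identifiable AND used for a purpose without consent."""
    return record.identifiable and purpose not in record.consented_purposes


def toxic_share(records, purpose: str) -> float:
    """Fraction of records that would be toxic for the given purpose."""
    if not records:
        return 0.0
    return sum(is_toxic(r, purpose) for r in records) / len(records)


def max_fine_eur(annual_global_turnover_eur: float) -> float:
    """Upper bound of a higher-tier fine: EUR 20M or 4% of turnover."""
    return max(20_000_000, annual_global_turnover_eur * 4 / 100)


# Hypothetical sample data (assumption, for illustration only).
records = [
    Record("u1", True, {"billing", "analytics"}),
    Record("u2", True, {"billing"}),
    Record("u3", False),  # anonymized, so never toxic
    Record("u4", True),   # identifiable, no consent recorded
]

print(toxic_share(records, "analytics"))
print(max_fine_eur(1_000_000_000))
```

In this sample, half of the records are toxic for an analytics purpose, and a business with €1 billion in turnover faces a ceiling of €40 million. The anonymized record never counts as toxic, whatever the purpose.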

Worse yet, the European data protection authorities (DPAs) have been strict in their enforcement, leading to a flood of GDPR fines and a loss of customer confidence for many major data-driven companies. You need to act now, before it is too late.

Anonymize your data to remove it from the scope of GDPR

Toxic data exposes your organization to significant business, operational, security, and compliance overheads and risks. Luckily, there is another way to clean your data lakes without undertaking the process of obtaining individual and meaningful consent: anonymize your data.

Rather than scrambling to minimize data and update data inventory systems in order to comply, businesses should invest in automated, defensible anonymization systems that can be implemented at an architectural point of control for data lakes and warehouses.

Once data has been anonymized so that individuals can no longer be identified, it is no longer considered personal data. As such, it is no longer regulated by GDPR, and consent is not required to process it.
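One common way to reason about whether a dataset still singles out individuals is a k-anonymity check: every combination of quasi-identifier values must be shared by at least k rows. The Python sketch below is an illustration only. The column names and sample rows are hypothetical, k-anonymity is a technique chosen here for the example rather than one this article prescribes, and passing such a check is not by itself proof that data is anonymous under GDPR.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    return smallest_group(rows, quasi_identifiers) >= k


# Hypothetical, already-generalized sample rows (assumption, for illustration).
rows = [
    {"age_band": "30-39", "postcode_area": "SW1", "spend": 120},
    {"age_band": "30-39", "postcode_area": "SW1", "spend": 80},
    {"age_band": "40-49", "postcode_area": "SW1", "spend": 200},
]

print(is_k_anonymous(rows, ["age_band", "postcode_area"], k=2))
```

Here the single row in the 40-49 age band is unique on its quasi-identifiers, so the sample fails the check for k=2. Such a row would need further generalization or suppression before the dataset could pass.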

The impact on your business of using toxic data could be very damaging. If you want to leverage and monetize your data without risking violations and fines, you need to put it outside the scope of GDPR. To do this, you need to decontaminate your data lakes.

Businesses essentially have two choices: 

(a) maintain the status quo and retain toxic information in data lakes and warehouses, or 

(b) anonymize your data using provable, automated, state-of-the-art solutions, so that GDPR is not applicable.

One option will save your brand reputation and bottom line. The other is a mass of expensive regulatory complications and litigation exposures.
