Most organizations are violating CCPA’s deidentification regulations and don’t realize it
Our research shows that 60% of data sets believed to be deidentified are not deidentified at all. If this remains true in the CCPA era, businesses will expose themselves to class-action lawsuits and to brand and reputational damage.
Businesses need to face reality: first-generation privacy protection techniques are insufficient, and without a quantifiable privacy risk assessment, businesses have no way to demonstrate defensible deidentification. The consequences of a lack of understanding are too great. Businesses need to invest in state-of-the-art privacy automation solutions now.
What does the CCPA say about deidentification?
The CCPA transforms data from a commodity into a privilege, forever altering the way businesses approach consumer privacy. What’s more, its overheads, such as verifiable consumer requests and data breach notifications, will prove restrictive in data science and analytics environments. However, the law does not spell doom for data-driven businesses: there are ways to take data out of scope for the CCPA.
The CCPA exempts data that has been defensibly deidentified; such data is no longer covered by the CCPA at all. This lets businesses that deidentify consumer data use it for lucrative secondary purposes without having to notify customers or handle thousands of data deletion requests. As a result, the business incentive to deidentify data is higher than ever.
However, the CCPA carries a high standard for data to be considered deidentified:
CCPA Section 1798.140(h): “Deidentified” means information that cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer, provided that a business that uses deidentified information:
(1) Has implemented technical safeguards that prohibit reidentification of the consumer to whom the information may pertain.
(2) Has implemented business processes that specifically prohibit reidentification of the information.
(3) Has implemented business processes to prevent inadvertent release of deidentified information.
(4) Makes no attempt to reidentify the information.
Only when advanced privacy techniques are applied correctly and a reidentification risk score is quantified can a business prove that its deidentification meets this legal standard.
If data is not defensibly deidentified, it remains subject to the CCPA, and the business remains exposed to class-action lawsuits, fines, and loss of consumer trust. This is critical to understand because, according to our research, 60% of data sets that are believed to be deidentified are not. While this may stem primarily from a lack of understanding or honest oversight, the CCPA does not accept belief as a measure of deidentification.
Deidentification: the illusion and the solution
Protecting consumer privacy is much more complex than removing personally identifiable information (PII). Other types of information, such as quasi-identifiers (for example, ZIP code, date of birth, and gender), can reidentify individuals or expose sensitive information when combined with other markers. Reidentifying individuals by linking additional information, through inference attacks and the mosaic effect (in which individually harmless data points combine to reveal an identity), is now well documented.
In fact, research at Carnegie Mellon in 2000 using US Census data demonstrated that even after the direct identifiers were removed, the data set remained 89% reidentifiable.
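To make the linkage risk concrete, here is a minimal Python sketch of a linkage attack, assuming two hypothetical tables: a “deidentified” release with names removed but ZIP code, date of birth, and sex retained, and a public roster (such as a voter list) that contains names alongside the same fields. All records, field names, and the `link` helper are invented for illustration. Any released record whose quasi-identifiers match exactly one person in the public table is reidentified.

```python
from collections import defaultdict

# Hypothetical "deidentified" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1960-01-15", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "dob": "1960-01-15", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public roster (for example, a voter list) that includes names.
public = [
    {"name": "Alice Smith", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "02139", "dob": "1960-01-15", "sex": "M"},
    {"name": "Carl Brown", "zip": "02139", "dob": "1960-01-15", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def key(record):
    """Build a join key from the quasi-identifier fields only."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(released, public):
    """Return (name, released_record) pairs whose quasi-identifiers
    match exactly one person in the public data set."""
    index = defaultdict(list)
    for person in public:
        index[key(person)].append(person["name"])
    matches = []
    for record in released:
        candidates = index.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is reidentified
            matches.append((candidates[0], record))
    return matches


if __name__ == "__main__":
    for name, record in link(released, public):
        print(f"{name} -> {record['diagnosis']}")
```

Running it prints `Alice Smith -> hypertension`. The two 02139 records each match two people and stay ambiguous, which is the property that generalization and other privacy techniques try to guarantee for every record.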
In short, first-generation data security methods of deidentification and manual approaches to assessing reidentification risk won’t cut it. Businesses must adopt an automated and defensible deidentification strategy to limit and prevent the reidentification of individuals in their data sets. That strategy must include a reidentification risk score: even if a business applies privacy-protection techniques, it cannot know whether a data set has been effectively deidentified without quantifying the risk of reidentification.
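One common way to quantify reidentification risk, sketched below in Python, is to group records into equivalence classes that share the same quasi-identifier values. The smallest class size gives k (as in k-anonymity), 1/k is the worst-case (prosecutor) risk for any single record, and the number of classes divided by the number of records is the average risk. This is an illustrative sketch under stated assumptions, not the scoring method behind the research cited here and not a threshold the CCPA prescribes: the sample records, the `risk_score` and `generalize` helpers, and the choice of quasi-identifiers are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RiskReport:
    k: int                  # size of the smallest equivalence class (k-anonymity)
    max_risk: float         # worst-case (prosecutor) risk for any record: 1 / k
    avg_risk: float         # mean of 1 / class size over all records
    unique_fraction: float  # share of records that are alone in their class


def risk_score(records, quasi_identifiers):
    """Group records by their quasi-identifier values and report risk metrics."""
    if not records:
        raise ValueError("cannot score an empty data set")
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    n = len(records)
    k = min(classes.values())
    uniques = sum(1 for size in classes.values() if size == 1)
    return RiskReport(
        k=k,
        max_risk=1 / k,
        # Each record in a class of size s has risk 1/s, so the mean over
        # all records simplifies to (number of classes) / (number of records).
        avg_risk=len(classes) / n,
        unique_fraction=uniques / n,
    )


def generalize(record):
    """Hypothetical generalization: keep a 3-digit ZIP prefix and the birth year."""
    return {**record, "zip": record["zip"][:3] + "**", "dob": record["dob"][:4]}


QUASI_IDENTIFIERS = ("zip", "dob", "sex")

# Hypothetical sample records with direct identifiers already removed.
records = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"zip": "02139", "dob": "1945-03-02", "sex": "F"},
    {"zip": "02139", "dob": "1960-01-15", "sex": "M"},
    {"zip": "02134", "dob": "1960-11-20", "sex": "M"},
]

if __name__ == "__main__":
    print("raw:        ", risk_score(records, QUASI_IDENTIFIERS))
    print("generalized:", risk_score([generalize(r) for r in records], QUASI_IDENTIFIERS))
```

On the four sample records every record is unique (k = 1, worst-case risk 1.0). After generalizing to a 3-digit ZIP prefix and birth year, each record shares its class with one other (k = 2, worst-case and average risk 0.5, no unique records). Which risk level counts as acceptable is a policy decision the code deliberately leaves open.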
So when companies say that they have deidentified their data sets, the first question they need to answer is: How do you know the data cannot be reidentified? Being able to answer it could be the difference between protecting and damaging your brand and bottom line.
The scale and legal significance of proving privacy compliance under the CCPA are too great for a “best-attempt” approach. In the CCPA era, how businesses handle personal information will define their exposure to legal action and to brand and reputational damage.