Why privacy automation is the only route to CCPA de-identification compliance


The volume and variety of big data are outpacing the capabilities of traditional privacy management. With the California Consumer Privacy Act (CCPA) coming into effect on January 1, 2020, it is more critical than ever for every organization operating in California to make real changes in how it manages its data. The only viable solution is privacy automation.

Traditional data privacy management approaches are slow, unscalable, and imperfect

Across organizations, data drives results. Yet the velocity at which data is growing threatens to turn this “new oil” from a profit driver into a fine magnet.

Organizations are continuously collecting data in massive volumes, while data consumers utilize that information to perform their day-to-day jobs. This ceaseless cycle of data acquisition and analysis makes it almost impossible for organizations to monitor and manage all their data.

Yet today, data privacy management is often performed manually, with a survey-based approach. These processes do not scale. Not only are they unreliable, but manual implementation slows down data analysis and makes it impossible to stay current with privacy regulations. On top of this, first-generation techniques such as encryption, masking, and hashing no longer cut it. As a consequence, privacy and compliance teams are seen as preventing companies from unlocking their most valuable resource.

In reality, compliance is impossible with manual human review. It would be like cutting your lawn with a pair of scissors. 

Privacy compliance requires a unified effort from the various departments and privacy-related stakeholders within an organization. This requires the right tools and processes.

Now, with the CCPA coming into effect on January 1, 2020, organizations are being put to the test. For the first time, enterprises with operations in California will be held accountable to strict privacy regulations. There is an urgent need to build a manageable and effective data privacy strategy.

Under the CCPA, personal data cannot be used for secondary purposes unless explicit notice has been given and the opportunity to opt out has been provided to each user. These secondary purposes, like data science and monetization, are what make data so valuable – why risk opt-outs?

If data has been de-identified or aggregated, it is no longer restricted. However, the standards for classifying data as “de-identified or aggregated” are extremely high, and traditional methods of anonymization, like tokenization and hashing, will not cut it. It is only when advanced privacy techniques (differential privacy, k-anonymization) are applied correctly that data science and monetization can continue.
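To see why hashing alone fails, consider a minimal sketch (illustrative only, with invented example data – not any vendor’s implementation): hashing a name does nothing about the quasi-identifiers sitting next to it, and k-anonymity is measured directly on those quasi-identifiers.

```python
# Toy illustration: why hashing a direct identifier is not de-identification.
# k-anonymity is measured over quasi-identifier columns, not the hashed name.
from collections import Counter

records = [
    {"zip": "94105", "age": 34, "diagnosis": "flu"},
    {"zip": "94105", "age": 34, "diagnosis": "asthma"},
    {"zip": "94107", "age": 61, "diagnosis": "diabetes"},
]

QUASI_IDENTIFIERS = ("zip", "age")

def k_anonymity(rows, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns.
    A hashed name changes nothing here: zip + age alone can single someone out."""
    classes = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return min(classes.values())

print(k_anonymity(records, QUASI_IDENTIFIERS))  # 1 -> the third record is unique, hence re-identifiable
```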

As a result, the complex structures of the average organization require a single enterprise-wide, end-to-end, automated solution to meet data and privacy compliance regulations: Privacy Automation.

 

Privacy automation: the only tool that can ensure CCPA compliance

Privacy automation assesses, quantifies and assures privacy by measuring the risk of identification, applying privacy-protection techniques, and providing audit reports throughout the whole process. With AI and a combination of the most advanced privacy techniques, this solution will simplify the compliance process and allow for privacy rules definition, risk assessments, application of privacy actions, and compliance reporting to happen within a single application. This process is part of what is known as Privacy by Design and Privacy by Default.

With Privacy Automation, metadata classification becomes possible. This lets you generate an automated and easy-to-understand privacy risk score.
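As a sketch of what such a score might look like – the formula and weights below are invented for illustration, not taken from any particular product:

```python
# Hypothetical scoring sketch: one way a tool might roll classification results
# into a single 0-100 privacy risk score. The formula here is illustrative only.
def privacy_risk_score(min_class_size, has_direct_identifiers):
    """Score from the smallest quasi-identifier equivalence class and a flag
    for unprotected direct identifiers (which should max out the score)."""
    base = 100.0 / max(min_class_size, 1)  # worst-case re-identification probability, as a percentage
    return 100.0 if has_direct_identifiers else round(base, 1)

print(privacy_risk_score(min_class_size=1, has_direct_identifiers=False))   # 100.0: someone is unique
print(privacy_risk_score(min_class_size=20, has_direct_identifiers=False))  # 5.0: 1-in-20 worst case
```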

Automation extends enterprise-wide, harmonizing the needs of risk and compliance teams with those of data science teams, and ensuring regulations are abided by. This allows companies to unlock data in a way that protects consumers and adds value for them, more safely than manual privacy protection.

With privacy automation, enterprises can leverage state-of-the-art solutions to innovate without limitation or fear. It is therefore the only tool that will realistically enable enterprises to become CCPA-compliant by January 2020.

For more information, read our blog, The Business Incentives to Automate Privacy Compliance Under CCPA.




Automated risk assessment tools are the future of data compliance


Privacy regulations such as GDPR, CCPA, and LGPD require organizations to acquire consent before using their customers’ data for any purpose beyond the narrow one for which it was originally collected – unless that data has been anonymized.

How do organizations know if their data has been properly anonymized, and how do they prove it?


These two questions present a huge burden for enterprises, and answering them properly means implementing significant changes in the way they have been doing business. No longer can they process data internally, or release it for third-party use, without explicit consent. This is a sweeping and potentially paralysing change.

The first step that organizations need to take is to analyze their data to assess the risk of re-identification. They should know, beyond all doubt, the probability that their data could lead to the exposure of personally identifiable information. Once they have this knowledge, they can take appropriate actions to reduce the risk. The second step is to ensure that data that is de-identified retains analytical value, so that organizations can generate the insights they rely on for data science and data analytics. 
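Here is a minimal sketch of both steps, with invented example data: quantify the worst-case re-identification probability, apply a generalization, and re-measure to see the risk/utility trade-off.

```python
# Sketch of step one and step two: measure worst-case re-identification risk,
# then generalize a quasi-identifier and re-measure. Example data is invented.
from collections import Counter

rows = [("94105", 34), ("94105", 36), ("94107", 52), ("94107", 58)]  # (zip, age)

def max_risk(quasi_rows):
    """Worst-case re-identification probability: 1 / size of the smallest
    group of records sharing the same quasi-identifier values."""
    return 1.0 / min(Counter(quasi_rows).values())

print(max_risk(rows))  # 1.0 -> every (zip, age) pair is unique

# Step two: generalize age to a decade band, then re-measure. Risk drops to
# 0.5 (each record now hides in a group of two), at the cost of age precision.
generalized = [(z, a // 10 * 10) for z, a in rows]
print(max_risk(generalized))  # 0.5
```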

But for many organizations, this process could take a long time and cause the loss of significant revenue and competitive advantage. The ability to automatically assess the risk of re-identification, apply privacy actions, and retain analytical value will allow organizations to continue to grow and innovate – while remaining compliant.

 

How AI-driven attribute tagging enables powerful risk assessment

In order to carry out proper risk assessment, you need your data to be correctly tagged. The attributes that must be tagged are direct identifiers and indirect or quasi-identifiers, both sensitive and insensitive. But manual tagging is a slow, labor-intensive process. Automatic tagging greatly reduces costs, increases compliance, and allows organizations to stay ahead.

Artificial intelligence can really help here. A neural net, for example, can be trained to recognize direct and indirect identifiers.  Once the model is ready, it can be used to automatically tag your data. Better still, its understanding can evolve over time as your data changes.
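As a toy stand-in for that idea – a production model would learn from column values and statistics, not just names, and would likely be a neural network rather than the simple scikit-learn classifier assumed here:

```python
# Toy sketch: train a classifier on labeled column names so that new columns
# can be auto-tagged as direct identifiers, quasi-identifiers, or neither.
# A real system would use richer features and retrain as the data evolves.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_cols = ["ssn", "email_address", "full_name", "zip_code", "birth_date",
              "age", "purchase_total", "page_views"]
labels = ["direct", "direct", "direct", "quasi", "quasi",
          "quasi", "none", "none"]

tagger = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams of the column name
    LogisticRegression(max_iter=1000),
)
tagger.fit(train_cols, labels)

# An unseen schema gets tagged automatically; retraining keeps the model current.
new_cols = ["customer_email", "postal_code", "cart_value"]
print(dict(zip(new_cols, tagger.predict(new_cols))))
```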

Once the data is properly tagged, a risk assessment can take these attributes into account. That assessment then provides a metric the organization can use to decide on the appropriate privacy actions.

These privacy actions will reduce the risk of re-identification, but will also cause information loss. Therefore, these actions must consider the use of the data, so that the right attributes retain the proper fidelity while still reducing risk. At this point, the organization can automate the process by recording the steps taken and then applying those same steps automatically to each additional dataset. The actions can differ between use cases and still feed an automated process.
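A minimal sketch of that record-and-replay pattern, with hypothetical column names and actions:

```python
# Sketch of "record once, replay everywhere": privacy actions captured as a
# declarative recipe, then applied to each new dataset of the same shape.
RECIPE = [
    ("name", "redact"),       # direct identifier: remove outright
    ("zip",  "truncate_3"),   # quasi-identifier: keep the 3-digit prefix
    ("age",  "band_10"),      # quasi-identifier: 10-year bands
]

ACTIONS = {
    "redact":     lambda v: None,
    "truncate_3": lambda v: v[:3] + "**",
    "band_10":    lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}",
}

def apply_recipe(record, recipe):
    """Replay the recorded privacy actions against one record."""
    out = dict(record)
    for column, action in recipe:
        out[column] = ACTIONS[action](out[column])
    return out

print(apply_recipe({"name": "Ana Diaz", "zip": "94105", "age": 34}, RECIPE))
# {'name': None, 'zip': '941**', 'age': '30-39'}
```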

With these automated systems, an enterprise can implement “Privacy by Design,” the framework regulators expect to see embedded in business processes. Adopting this approach will ensure that your organization is ready for the future.




Toxic Data is contaminating data lakes and data warehouses. How can you clean it up before it’s too late?


Data is the new oil. Understandably, over the past few years, organizations have been gathering larger and larger quantities of it. However, a reckoning is on the way. New regulations such as the CCPA mean that most of this data carries an inherent risk that could disrupt organizations if not dealt with.

Toxic data lakes and data warehouses


Websites, apps, social media – they all form part of how organizations use the digital space to gather consumer information, and then use that information to generate better solutions and services. All of this information is being stored in data lakes and data warehouses. 

The big problem with storing all of this data is that the majority of it is personal information. And under the new privacy regulations, personal information has to be handled with special care. Mismanagement of this information opens the door to fines that can reach nine figures, as well as to the loss of customer trust and revenue.

In light of the new era of privacy regulations, most of the data sitting in data lakes and data warehouses is highly toxic.

Unfortunately, organizations are having a hard time measuring their privacy exposure and adopting processes and technologies to control and reduce risk. The toxicity of data lakes and warehouses keeps rising; it is a ticking time bomb.

 

Decontaminating before it is too late

Data governance has been the traditional way in which organizations have tried to control the risk exposure of their data assets. However, traditional data governance needs to evolve to cover the rise of privacy risk. 

Modern-day data governance must contain the following elements to be able to clean the data lakes and warehouses:

  • Provide a comprehensive privacy risk measure: Reducing privacy risk without being able to measure it is like flying a plane without instruments. Organizations need to be able to measure their privacy risk exposure, as well as understand how each data consumer impacts this risk.

  • Privacy-enhanced data discovery and classification: In order to measure and reduce privacy risk, organizations need to know what data they have. This discovery and classification needs to incorporate privacy terminology to be effective in measuring privacy risk.

  • A variety of privacy-preserving techniques: Reducing privacy risk requires an understanding of how the data’s analytical value gets degraded. Utilising a variety of privacy techniques, like differential privacy and k-anonymity, allows organizations to reduce privacy risk while preserving analytical value.

  • Automatic policy enforcement: Making sure that the data coming into and out of the data lakes and warehouses stays compliant is a huge endeavour that can’t be done manually. Organizations need systems that support and automate policy enforcement (see the sketch after this list).

  • Data governance reports: Knowing exactly who accessed what data is a must for any data governance process.
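A sketch of such an enforcement gate – the threshold, dataset names, and audit format below are invented for illustration:

```python
# Illustrative policy gate: data only lands in the lake if it passes an
# automated privacy check, and every decision is recorded for governance reports.
MAX_ALLOWED_RISK = 0.05  # e.g. require k >= 20 on quasi-identifiers (invented threshold)

def admit_to_lake(dataset_name, measured_risk, audit_log):
    """Admit or reject a dataset based on its measured re-identification risk."""
    decision = measured_risk <= MAX_ALLOWED_RISK
    audit_log.append((dataset_name, measured_risk, "admitted" if decision else "rejected"))
    return decision

log = []
print(admit_to_lake("clickstream_2019_10", measured_risk=0.5, audit_log=log))        # False
print(admit_to_lake("clickstream_deidentified", measured_risk=0.02, audit_log=log))  # True
print(log)  # the raw material for a governance report: what was checked, and the outcome
```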

Cleaning toxic data out of your data lake and warehouse is possible, as long as you implement data governance tools suited to understanding and managing the privacy risk inherent in your data assets.



Approach CCPA Amendments as a Competitive Advantage, Not a Compliance Overhead


Following the CCPA amendments, here is a practical guide to the business advantages of de-identified data, and to leveraging those privacy-risk advantages in data-driven organizations.

“Separate your ‘front end’ and ‘back end’ into two separate streams of CCPA compliance work”
“Taking data lakes and warehouses out of scope for CCPA”
“Approach CCPA as a competitive advantage rather than a compliance overhead” for your back-end compliance requirements.

This blog will summarise the amendments and clarifications relating to ‘de-identified data’, and will then focus on the business advantages of implementing more automated, ‘state-of-the-art’ tools as part of the organisational and technical controls required to meet the CCPA’s legal specifications for de-identified data.

The verdict is in: Only five CCPA amendments made it through the California legislature.  These amendments are limited in scope. They make only incremental changes to the CCPA – and, in some cases, expand the private right of action for consumers. They do not fundamentally change the fact that the CCPA will impose substantial new compliance obligations on companies. As expected, a largely intact CCPA will come into effect on January 1, 2020. 

Organizations that will be affected by CCPA can no longer justify delaying, or adopting a wait-and-see policy toward potential further amendments. It was tempting for enterprises to use potential further clarifications as an excuse to put off real work toward becoming CCPA compliant. But time’s up. They need to initiate CCPA compliance programs, and start implementing the necessary organisational and technical controls, today. 

With this being the case, organizations understandably see the CCPA as a compliance overhead and a business restriction – something that brings additional costs and prevents them from doing business in the way they’re used to.

But here’s the good news: Viewed the right way, CCPA can be not only a compliance overhead, but also a competitive advantage.

 

How can enterprises turn CCPA amendments into an advantage?


All sensible companies should be ensuring they can meet the new CCPA obligations, particularly obligations that may require more significant lead time. They should be implementing the organisational and technical controls required to meet the finer points of compliance: the right to know, the right to deletion, and so on.

But they should also be seeking to gain the advantages that CCPA will bring. 

Let’s break this down. The key here is back-end uses of consumer information. CCPA places restrictions on how and why a company can use consumer data beyond the primary purpose for which it was originally collected. Most modern organisations are heavily data-driven, and they leverage data science and data analytics tools and environments. If they aren’t careful, they will find that their data science and data analytics projects are heavily impacted by CCPA.

However, if they approach CCPA compliance correctly, organizations can continue to reap the benefits of their data science and data analytics projects in which they have already invested heavily. They can do this by properly de-identifying their data so it falls out of scope for CCPA. Now, CCPA compliance becomes a business advantage, not a compliance overhead.

Remember CCPA’s Disclosure Requirements: At or before the point of collection, businesses must inform consumers of the categories and specific pieces of personal information they are collecting; the sources from which that information is collected; the purpose of collecting or selling such personal information; the categories of personal information sold; and the categories of third parties with whom the personal information is shared. 

In light of the recent CCPA amendments, here are three areas where organisations can comply with the CCPA Disclosure Requirements, and thus gain an advantage, by ensuring that the data in their data science and data analytics environments is de-identified:

  • Definitions of “personal information” and “publicly available information.”
  • Exemption for business customers and clarification on de-identified information.
  • Data breach notification requirements and scope.

 

Definitions of “personal information” and “publicly available information” – AB874


AB874 includes several helpful clarifications with respect to the scope of “personal information” regulated under CCPA. Previously, “personal information” was defined as including all information that “identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.”  

The amended definition of “personal information” clarifies that information must be “reasonably capable of being associated with” a particular consumer or household. Separately, the bill clarifies that “publicly available information” means information that is lawfully made available from federal, state, or local records, regardless of whether the data is used for a purpose that is compatible with the purpose for which the data was made publicly available. Further, the bill revises the definition of “personal information” to clarify that it does not include de-identified or aggregate information.

 

Exemption for business customers and clarification on de-identified information – AB1335


AB1335 clarifies that the CCPA’s private right of action does not apply if personal information is either encrypted or redacted. It also makes certain technical corrections, including revising the exemption for activities involving consumer reports that are regulated under the Fair Credit Reporting Act, and clarifying that de-identified or aggregate consumer information is excluded from the definition of “personal information.”

 

Data breach notification requirements – AB1130 


AB1130 clarifies the definition of “personal information” under California’s data breach notification law as including biometric data (such as “a fingerprint, retina, or iris image”), tax identification numbers, passport numbers, military identification numbers, and unique identification numbers issued on a government document. 

Additionally, there is a significant gem hidden in the detail, clarifying CCPA Section 1798.150: class-action lawsuits may not be brought for data breaches when the breached personal information is either encrypted or redacted (one safeguard suffices; both are not required), and de-identified and aggregate information is exempt from the statute.

 

Making the CCPA amendments work for you


These amendments clarify a broader truth about CCPA: It is imperative that organizations establish controls to prove that personal information can be transformed to meet the CCPA legal specifications for de-identified data. This is the only way that the business advantages of data science and data analytics can continue to accrue.

Under the CCPA, information is only de-identified if it “cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer.” In addition, the business using the data must adopt technical and procedural safeguards to prevent its re-identification, have business processes to prohibit re-identification, and must make no attempt to re-identify the data. Businesses may now view information as “de-identified” even when it relates to a specific but unidentified individual.

The clock is ticking. All businesses that want to continue to maximize their data science and data analytics need to start moving toward meeting this de-identification standard. Here are some immediate questions that every organization should be asking itself:

  1. Do you know what information your company holds on its consumers in your data lakes and data warehouses?  
  2. Do you understand each and every purpose for which you are holding and processing consumer data?
  3. Are you profiling or aggregating consumer data in your data science and analytics projects for an additional purpose beyond what the data was first collected for?
  4. Are you combining or linking consumer data with other available information that increases the risks of identification of the consumer?
  5. Do you have a need to re-identify consumer data that was de-identified or aggregated?
  6. Are you sharing or selling your customer data?
  7. Are your data protection, encryption and redaction methods sufficient to prevent the risks of re-identification, and to meet the CCPA legal specification of de-identified data?

 

Practical state-of-the-art automated approaches for the back-end of CCPA Compliance


Data discovery and classification projects can do a lot for front-end CCPA compliance. But newer, state-of-the-art, automated solutions are the best answer to CCPA back-end compliance. These approaches leverage ML techniques that can effectively perform automated and instant metadata classification on your structured data; help you instantly understand aspects of the data relating to CCPA compliance requirements; and identify the risks of re-identification in your data science and data analytics environments. 

These automated and instant metadata processes, combined with a systems-based understanding and knowledge of the risk of re-identification, enable you to transform your data to meet the legal specifications of ‘de-identified’ information. This happens by applying modern and integrated privacy protection actions such as generalisations, hierarchies, redactions, and differential privacy to ensure that the data remains de-identified but still retains the data utility and analytical value for your data science and data analytics projects.
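As a hedged sketch of two of the actions named above – generalisation and differential privacy (here, the textbook Laplace mechanism on a count; a real deployment would also track the privacy budget, epsilon, across all queries):

```python
# Minimal sketch of two privacy-protection actions. Illustrative only: the
# helper names are invented, and production systems manage epsilon centrally.
import random

def generalize_zip(zip_code, keep=3):
    """Generalisation: coarsen a ZIP code to its first `keep` digits."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def dp_count(true_count, epsilon=1.0):
    """Laplace mechanism for a counting query (sensitivity 1): the difference
    of two Exponential(epsilon) samples is a Laplace(0, 1/epsilon) sample."""
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

print(generalize_zip("94105"))   # '941**' -- less precise, but still analytically useful
print(round(dp_count(1280), 1))  # a noisy count near 1280; the noise masks any individual
```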




The Consequences of Data Mishandling: Twitter, TransUnion, and WhatsApp


Who should you trust? This week highlights the personal privacy risks and organizational consequences when data is mishandled or used against the best interests of the account holder: Twitter provided advertisers with user phone numbers that had been collected for two-factor authentication, 37,000 Canadians’ personal information was leaked in a TransUnion cybersecurity attack, and GDPR-related investigations into Facebook and Twitter threaten billions in fines.

Twitter shared your phone number with advertisers.

Early this week, Twitter admitted to using the phone numbers of users, which had been provided for two-factor authentication, to help profile users and target ads. This allowed the company to create “Tailored Audiences,” an industry-standard product that enables “advertisers to target ads to customers based on the advertiser’s own marketing lists.” In other words, the profiles in the marketing list an advertiser uploaded were matched to Twitter’s user list with the phone numbers users provided for security purposes.

When users provided their phone numbers to enhance account security, they never realized that this would be the tradeoff. This manipulative approach to gaining user information raises questions over Twitter’s data privacy protocols. Moreover, the fact that the company provided this confidential information to advertisers should leave you wondering what other information is made available to business partners, and how (Source).

Curiously, after realizing what had happened, rather than coming forward, the company rushed to hire Ads Policy Specialists to look into the problem.

On September 17, the company “addressed an ‘error’ that allowed advertisers to target users based on phone numbers” (Source). That same day, it posted a job advertisement for someone to train internal Twitter employees on ad policies and to join a team working on re-evaluating its advertising products.

Now, nearly a month later, Twitter has publicly admitted their mistake and said they are unsure how many users were affected. While they insist no personal data was shared externally, and are clearly taking steps to ensure this doesn’t occur again, is it too late?

Third-Party Attacks: How Valid Login Credentials Led to Banking Information Exposure 

A cybersecurity breach at TransUnion highlights the rapidly increasing threat of third-party attacks and the challenge of preventing them. The personal data of 37,000 Canadians was compromised when a legitimate business customer’s login credentials were used illegally to harvest TransUnion data, including names, dates of birth, current and past home addresses, credit and loan obligations, and repayment history. While bank account numbers were not included, social insurance numbers may also have been at risk. The compromise occurred between June 28 and July 11 but was not detected until August (Source).

While alarming, these attacks are very frequent, accounting for around 25% of cyberattacks in the past year. Daniel Tobok, CEO of Cytelligence Inc., reports that the threat of third-party attacks is increasing: more than ever, criminals are using the accounts of trusted third parties (customers, vendors) to gain access to their targets’ data. This method of entry is hard to detect because the attackers are often simulating the typical actions taken by legitimate users. In this case, the credentials of the leading division of Canadian Western Bank were used to log in and access the credit information of nearly 40,000 Canadians – an action that is not atypical of the bank’s regular activities (Source).

Cybersecurity attacks like this are what has caused the rise of two-factor authentication, which looks to enhance security – perhaps in every case other than Twitter’s. However, if companies only invest in hardware, they only solve half the issue: the human side of cybersecurity is a much more serious threat than is often acknowledged. “As an attacker, you always attack the weakest link, and in a lot of cases unfortunately the weakest link is in front of the keyboard.” (Source)

 

Hefty fines loom over Twitter and Facebook as the Irish DPC closes their investigation.

The Data Protection Commission (DPC) in Ireland has recently finished investigations into Facebook’s WhatsApp and Twitter over breaches of the GDPR (Source). These investigations looked into whether WhatsApp provided information about the app’s services in a transparent manner to both users and non-users, and into a Twitter data breach notification from January 2019.

Now, these cases have moved on to the decision-making phase, and the companies are at risk of fines of up to 4% of their global annual revenue. This means Facebook could expect to pay more than $2 billion.

The decision now rests with Helen Dixon, Ireland’s chief data regulator, and we expect to hear the outcome by the end of the year. These are landmark cases: the first Irish legal proceedings connected to US companies since the GDPR came into effect a little over a year ago, in May 2018 (Source). Big tech companies are on edge about the verdict, as the Irish DPC plays the largest GDPR supervisory role over most of them, many of which use Ireland as the base for their EU headquarters. What’s more, the DPC has opened dozens of investigations into other major tech companies, including Apple and Google, and the regulator’s decision may signal more of what’s to come (Source).

In the end, between Twitter’s data mishandling, the TransUnion third-party attack, and the GDPR investigations coming to a close, it is clear that businesses and the public alike must become more privacy-conscious: privacy now shapes everyday operations and lives.
