CCPA is here. Are you compliant?

The California Consumer Privacy Act (CCPA) came into effect on January 1, 2020, and has already changed the ways companies can use consumer data.

Before the CCPA took effect, Big Data companies could harvest user data and use it for data science, analytics, AI, and ML projects, monetizing consumer data without protecting consumer privacy. With the CCPA now in force, companies have no choice but to comply or pay the price. This raises the question: is your company compliant?

CCPA Is Proving That Privacy Is Not a Commodity: It’s a Right

This legislation protects consumers from companies selling their data for secondary purposes. Without the consumer’s permission, companies cannot use that data for those purposes.

User data is highly valuable for companies’ analytics and monetization initiatives, so losing it to consumer opt-outs can seriously harm a company’s continued success. By de-identifying consumer data, companies can follow CCPA guidelines while maintaining high data quality.

The CCPA comes with a highly standardized ruleset that companies must satisfy for de-identification, complete with specific definitions and detailed explanations of how to achieve its goals. Yet despite these guidelines, and with the legislation only just in effect, studies have found that only 8% of US businesses are CCPA compliant.

For companies that are not CCPA compliant as of yet, the time to act is now. By thoroughly understanding the regulations put out by the CCPA, companies can protect their users while still benefiting from their data. 

To do so, companies must understand the significance of maintaining analytical value and the importance of adequately de-identified data. An organization that does not comply with the CCPA faces civil penalties of up to $7,500 per intentional violation, as well as statutory damages of up to $750 per consumer per incident in data breach lawsuits.

For perspective, the GDPR, which took effect in May 2018, allows fines of up to €20 million or 4% of a company’s global annual revenue, whichever is higher.

To ensure a CCPA fine is not coming your way, assess your current data privacy protection efforts to confirm that:

  • consumers are asked for direct consent to use their data
  • consumers can opt out or remove their data from analytical use
  • data cannot be re-identified

In essence, the CCPA does not impede a company’s ability to use, analyze, or monetize data. It requires that data be de-identified or aggregated to the standards its legislation sets.

Our research found that 60% of datasets that companies believed to be de-identified had a high re-identification risk. There are three ways to reduce the possibility of re-identification:

  • Use state-of-the-art de-identification methods
  • Assess for the likelihood of re-identification
  • Implement controls so that data used for secondary purposes is CCPA compliant

Read more about these privacy automation methods in our blog, The Business Incentives to Automate Privacy Compliance under CCPA.

Manual Methods of De-Identification Are Tools of The Past

A standard of compliance within the CCPA involves identifying which methods of de-identification leave consumer data susceptible to re-identification. The manual approach, which is extremely common, can leave room for re-identification, making companies vulnerable to CCPA penalties.

Techniques such as k-anonymity and differential privacy allow companies to protect data to the best of their abilities. However, applying them manually is impractical, both for meeting the 30-day cure period the CCPA provides and for achieving high-quality data protection.
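To make the k-anonymity idea concrete, here is a minimal Python sketch. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch assumes, purely for illustration, that ZIP code and birth year are the quasi-identifiers, and it uses a hypothetical generalization rule (a 3-digit ZIP prefix and the birth decade); it is not how any particular product works.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same combination of quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize(record):
    """Hypothetical generalization rule: keep a 3-digit ZIP prefix and
    round the birth year down to its decade."""
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    out["birth_year"] = record["birth_year"] // 10 * 10
    return out

# Hypothetical records; ZIP and birth year are assumed quasi-identifiers.
records = [
    {"zip": "94110", "birth_year": 1985, "spend": 42.0},
    {"zip": "94110", "birth_year": 1985, "spend": 17.5},
    {"zip": "94112", "birth_year": 1987, "spend": 88.0},
    {"zip": "94117", "birth_year": 1983, "spend": 12.0},
]
qids = ["zip", "birth_year"]

print(k_anonymity(records, qids))                           # 1
print(k_anonymity([generalize(r) for r in records], qids))  # 4
```

Generalizing raised k from 1 to 4 here, but it also made the ZIP and age data coarser. Deciding how far to generalize is the privacy versus analytical value tradeoff, and doing it by hand for every dataset is what makes manual de-identification slow.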

Understanding the CCPA helps ensure that data is adequately de-identified and its re-identification risk removed, all while meeting every legal specification.

Meeting CCPA requirements means ditching first-generation approaches to de-identification and adopting privacy automation, which reduces the possibility of re-identification. Using privacy automation to protect and utilize consumers’ data is necessary for successfully navigating the new CCPA era.

The solution of privacy automation ensures not only that user data is correctly de-identified, but that it maintains a high data quality. 

CryptoNumerics as the Privacy Automation Solution

Despite the CCPA’s strict guidelines, the benefits of using data for data science and monetization remain incredibly high, so scaling back efforts to use data is a disservice to a company’s success.

Complying with the CCPA means determining which methods of de-identification leave consumer data susceptible to re-identification. Manual de-identification methods, including masking and tokenization, leave room for improper anonymization.

Here, Privacy Automation becomes necessary for an organization’s analytical tactics. 

Privacy automation complies with the CCPA while supporting data science and analytics tools. If a user’s data is de-identified to the CCPA’s standards, data analysis remains possible.

Privacy automation revolves around assessing, quantifying, and assuring data privacy. A privacy automation tool simultaneously measures the risk of re-identification, applies data privacy protection techniques, and provides audit reports.

A study by PossibleNow indicated that 45% of companies were in the process of preparing but did not expect to be compliant by the CCPA’s implementation date. Adopting a privacy automation tool to better process data and prepare for the new legislation is critical to a company’s success under the CCPA. Privacy automation products such as CN-Protect allow companies to succeed in data protection while benefiting from their data’s analytics.

The top 4 privacy solutions destroy data value and fail to meet regulatory standards.

Businesses are becoming increasingly reliant on data to make decisions and learn about the market. Yet, due to an increase in regulations, the information they have collected is becoming less and less useful. While people have been quick to blame privacy laws, in reality, the biggest impediment to analytics and data science is insufficient data privacy solutions.

From our market research, the top four things people are doing are (1) access controls, (2) masking, (3) encryption, and (4) tokenization. While these solutions are a step in the right direction, they wipe the data of its value and leave businesses open to regulatory penalties and reputational damage.

Your data privacy solutions are insufficient

Access controls: Access controls limit who can access data. While important, they are not an effective privacy-preserving strategy on their own, because they neither protect the identity of individuals nor prevent their data from being used for purposes they have not consented to. It is an all-or-nothing approach: either someone has access to the data, and privacy is not protected, or they do not, in which case no insights can be gleaned at all.

Masking: This is a process by which sensitive information is replaced with synthetic data. In doing so, the analytical value is wiped. While this solution works for testing, it is not an advantageous solution if you are planning to provide the data to data scientists. After all, you are sending them this data to unlock valuable insights!

Encryption: Encryption is a security mechanism that protects data until it is used, at which point the data is decrypted, exposing the private data to the user. Additionally, if someone obtains the key, they can reverse the entire process (decryption), putting the data at risk.

Tokenization: Tokenization, also known as pseudonymization, is the process of replacing direct identifiers, like email addresses, with another value (a token) and storing the mapping between tokens and original values so they can be relinked in the future. When businesses employ this technique, they leave the indirect identifiers (quasi-identifiers) as they are. Yet combinations of quasi-identifiers are a proven way to re-identify individuals in a dataset.

Such a risk emphasizes the importance of understanding a dataset’s re-identification risk when evaluating the effects of your organization’s privacy protection actions. Moreover, tokenization is often reversed to perform analysis, violating the very principle of the process. The most important question to ask yourself is: how do I know my datasets have been anonymized? If you only implement tokenization, the answer is that you don’t.
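The following Python sketch illustrates the linkage risk described above. It tokenizes a hypothetical health dataset by swapping each email for a random token (keeping the mapping in a `vault`), leaves ZIP code, date of birth, and sex untouched, and then joins the result against a hypothetical public list that contains names. All names, records, and helper functions here are invented for illustration.

```python
import secrets

def tokenize(rows, field, vault):
    """Swap a direct identifier for a random token. `vault` keeps the
    original -> token mapping so records can be relinked later."""
    out = []
    for r in rows:
        if r[field] not in vault:
            vault[r[field]] = "tok_" + secrets.token_hex(8)
        out.append({**r, field: vault[r[field]]})
    return out

def reidentify(tokenized, public, quasi_identifiers, token_field):
    """Join tokenized rows to a public dataset on quasi-identifiers.
    A row that matches exactly one public record is re-identified."""
    def key(r):
        return tuple(r[q] for q in quasi_identifiers)
    found = {}
    for row in tokenized:
        hits = [p for p in public if key(p) == key(row)]
        if len(hits) == 1:
            found[row[token_field]] = hits[0]["name"]
    return found

vault = {}
health = tokenize([
    {"email": "a.smith@example.com", "zip": "02138", "dob": "1961-03-14",
     "sex": "F", "diagnosis": "hypertension"},
    {"email": "b.lee@example.com", "zip": "02139", "dob": "1975-11-02",
     "sex": "M", "diagnosis": "asthma"},
], "email", vault)

# Hypothetical public list (for example, a voter roll): names, no diagnoses.
public = [
    {"name": "Alice Smith", "zip": "02138", "dob": "1961-03-14", "sex": "F"},
    {"name": "Carl Jones", "zip": "02139", "dob": "1982-05-09", "sex": "M"},
]

result = reidentify(health, public, ["zip", "dob", "sex"], "email")
for token, name in result.items():
    print(token, "belongs to", name)
```

Even though no email survives tokenization, the first record is tied back to a name, and with it, a diagnosis.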


Risk-aware anonymization will unlock the value of your data.

To unlock the value of their datasets in the regulatory era, businesses should implement privacy techniques, and many have. However, as we’ve discussed, the commonly used techniques are insufficient to preserve analytical value and protect your organization. The only way data will be useful to your data scientists is if you transform it so that the elements enabling re-identification are removed while degrading the data as little as possible.

Consequently, businesses must prioritize risk-aware anonymization in order to optimize the reduction of re-identification risk and protect the value of data.

CN-Protect is the ideal solution to achieve your goals. It uses AI and advanced privacy protection methods, like differential privacy and k-anonymization, to assess, quantify, and assure privacy, so that privacy and insights are produced in unison.

The process is as follows:

  1. Classify metadata: identify the direct, indirect, and sensitive data in an automated manner, to help businesses understand what kind of data they have.
  2. Quantify risk: calculate the risk of re-identification of individuals and provide a privacy risk score.
  3. Protect data: apply advanced privacy techniques, such as k-anonymization and differential privacy, to tables, text, images, video, and audio. This involves optimizing the tradeoff between privacy protection (removing elements that constitute privacy risk) and analytical value (retaining elements that constitute data fidelity).
  4. Audit-ready reporting: keep track of what the dataset is, what kind of privacy-protecting transformations were applied, changes in the risk score (before and after privacy actions have been applied), who applied the transformation and at what time, and where the data went. This is the key piece to proving data has been defensibly anonymized to regulatory authorities.
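The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CN-Protect’s implementation: columns are classified by hypothetical name-based rules, risk is measured with one common worst-case measure (1 divided by the size of the smallest group of records sharing the same quasi-identifiers), protection drops direct identifiers and generalizes ZIP and age, and the audit record captures the transformations, the before and after risk, who ran it, and when.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical classification rules; a real tool would infer these.
DIRECT = {"name", "email"}
QUASI = {"zip", "age"}

def classify(columns):
    """Step 1: label each column as a direct, quasi, or other identifier."""
    return {c: "direct" if c in DIRECT else "quasi" if c in QUASI else "other"
            for c in columns}

def max_risk(rows, qids):
    """Step 2: worst-case re-identification risk, 1 / smallest group size."""
    groups = Counter(tuple(r[q] for q in qids) for r in rows)
    return 1 / min(groups.values())

def protect(rows, classes):
    """Step 3: drop direct identifiers and generalize zip and age."""
    out = []
    for r in rows:
        new = {c: v for c, v in r.items() if classes[c] != "direct"}
        new["zip"] = r["zip"][:3] + "**"
        decade = r["age"] // 10 * 10
        new["age"] = f"{decade}-{decade + 9}"
        out.append(new)
    return out

def run(rows, user):
    """Step 4: apply steps 1-3 and return the data plus an audit record."""
    classes = classify(rows[0].keys())
    qids = [c for c, kind in classes.items() if kind == "quasi"]
    protected = protect(rows, classes)
    audit = {
        "transformations": ["drop direct identifiers", "generalize zip",
                            "generalize age"],
        "risk_before": max_risk(rows, qids),
        "risk_after": max_risk(protected, qids),
        "applied_by": user,
        "applied_at": datetime.now(timezone.utc).isoformat(),
    }
    return protected, audit

rows = [
    {"name": "A", "email": "a@example.com", "zip": "94110", "age": 34, "diagnosis": "flu"},
    {"name": "B", "email": "b@example.com", "zip": "94112", "age": 36, "diagnosis": "asthma"},
    {"name": "C", "email": "c@example.com", "zip": "94117", "age": 31, "diagnosis": "flu"},
    {"name": "D", "email": "d@example.com", "zip": "94110", "age": 38, "diagnosis": "migraine"},
]
protected, audit = run(rows, user="analyst@example.com")
print(audit["risk_before"], "->", audit["risk_after"])  # 1.0 -> 0.25
```

The audit dictionary is the piece that would be stored and shown to a regulator; the `diagnosis` column, classified as "other", passes through untouched so it stays useful for analysis.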

In doing so, businesses are able to establish the privacy-protection of datasets to a standard that fulfills data protection regulations, protects you from privacy risk, and most importantly, preserves the value of the data. In essence, it will unlock data that was previously restricted, and help you achieve improved data-driven outcomes by protecting data in an optimized manner.

By measuring the risk of re-identification, applying privacy-protection techniques, and providing audit reports throughout the whole process, CN-Protect is the only data privacy solution that will comprehensively unlock the value of your data.

The Big Data Era Demands Privacy by Design

Most commodities come with a price. This applies to everything from tangible items like our morning coffee to intangible items such as our online banking transactions. And the commoditization of goods and services in today’s economy is evolving to the point where even personal privacy is associated with a price tag.

Increasingly, enterprises perceive big data and privacy as in competition with one another. Privacy comes with a price, and that price detracts from the profits of big data. However, this is a myth. Organizations don’t have to choose between the two. Privacy can be prioritised, and big data innovations can also prosper. 

How? Through Privacy by Design (PbD). With PbD, companies can meet privacy compliance and legal requirements, while simultaneously creating useful data for analytics.


What is Big Data Analytics and why should it be concerned about privacy?


Big data is extremely powerful due to its scale, its ability to process structured and unstructured data in real time, and its ability to produce linkages between unrelated, non-identifiable data. What gives big data analytics its name are five core components: the Five V’s of big data.

Big Data has the ability to process large volumes of a variety of both structured and unstructured data at high velocity (source). Additionally, the data is clean and accurate (veracity), and produces necessary value (source). Once the data is processed, data scientists and analysts are able to identify patterns, infer situations, predict behaviours, and understand trends to drive business decisions (source).

Let’s put into perspective the amount of data that is processed in the modern world. Every single day:

  1. 294 billion emails are sent;
  2. 500 million tweets are shared; and
  3. 3.5 billion Google searches are conducted.

Currently, the digital universe contains 4.4 zettabytes of data (source).

The increasing amount of data generated creates concerns for consumers about how and what data is being processed. 88% of respondents in a Deloitte data ethics survey said they would cease their business relationship with an organization that uses their data unethically (source). Consumers are more likely to share their data if they know that the organization is handling it with privacy in mind (source).

The combination of more data in the hands of organizations and the powerful capabilities of big data analytics exposes organizations to unpredictable risks. These risks can include:

  • Data misuse and breach
  • Loss of consumer trust 
  • Fines for privacy related non-compliance
  • Revenue loss


How can organizations sustain big data innovation while considering privacy?

There is no doubt that the scale and diversity of the big data value chain create challenges for privacy implementation. This is mainly because big data processing contradicts some of the core values of privacy preservation, such as data minimization: processing lean data and collecting only the data that is relevant and necessary for analysis. Big data’s power, by contrast, lies in collecting and storing large volumes of rich information before it is used. There is an obvious tension here.

However, rather than risking the data misuse and breaches that result from a lack of privacy controls, organizations can proactively implement privacy controls as a default setting by adopting the PbD framework.

PbD was developed in the 1990s by Dr. Ann Cavoukian, the former Information and Privacy Commissioner of Ontario (source). The framework establishes that privacy should be integrated into all organizational processes, from technology to business processes and operations. A key benefit of PbD is its ability to scale: it is designed so that even big data operations can implement the framework and ensure that data is processed without jeopardizing personal privacy.


How is this done? A step-by-step implementation guide:

PbD must be an integral part of every stage of the Big Data value chain:

  1. Data Acquisition/Collection
  2. Data Analysis
  3. Data Curation
  4. Data Storage
  5. Data Usage

The European Union Agency for Network and Information Security (ENISA) created a PbD engineering framework suited specifically to Big Data. The process involves implementing eight strategies (minimize, hide, separate, aggregate, inform, control, enforce, and demonstrate) throughout the big data value chain, allowing for seamless privacy without sacrificing analytically valuable data (source).

Here are the PbD principles of behaviour that data controllers within organizations should adhere to:


Data Collection, Analysis, and Curation

  1. Minimize: Define what data needs to be collected. Avoid using data that serves no purpose to analytics.
  2. Aggregate: Implement anonymization techniques to remove all personally identifiable information (PII).
  3. Hide: Implement techniques such as encryption, identity masking, and secure file sharing that allow users to control what data is being processed.
  4. Inform: Inform all users about what data is collected to allow for transparency.
  5. Control: Implement opt-in measures, and make opt-out tools available throughout the big data processing.
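The Minimize and Control principles above can be combined in a small collection filter. This Python sketch assumes a hypothetical mapping from each declared purpose to the fields it needs, and a hypothetical `consents` field on each user record. A user with no recorded consent is excluded, which makes collection opt-in by default, and only the fields the purpose needs are kept.

```python
# Hypothetical mapping from each declared purpose to the fields it needs.
PURPOSE_FIELDS = {
    "churn_analysis": {"plan", "tenure_months", "monthly_spend"},
}

def collect_for(users, purpose):
    """Keep only users who opted in to `purpose` (Control) and only the
    fields that purpose needs (Minimize). No recorded consent means no
    collection, i.e. opt-in by default."""
    needed = PURPOSE_FIELDS[purpose]
    return [
        {k: v for k, v in u.items() if k in needed}
        for u in users
        if purpose in u.get("consents", set())
    ]

users = [
    {"email": "a@example.com", "plan": "pro", "tenure_months": 14,
     "monthly_spend": 30, "consents": {"churn_analysis"}},
    {"email": "b@example.com", "plan": "basic", "tenure_months": 3,
     "monthly_spend": 10},  # never opted in
]
print(collect_for(users, "churn_analysis"))
# [{'plan': 'pro', 'tenure_months': 14, 'monthly_spend': 30}]
```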

Data Storage

  1. Separate: Keep data separate. This avoids central warehouses and allows computation across different databases, using privacy-preserving analytics in distributed systems to protect personal data.
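A minimal sketch of the Separate idea in Python, assuming two hypothetical sites that each keep their raw records locally. Each site computes a local summary, and only the summaries are combined. This is the simplest form of distributed analytics; summaries can still leak information in small groups, which is why real deployments layer on techniques such as secure multi-party computation or differential privacy.

```python
def local_summary(values):
    """Computed inside each database; only these two numbers leave it."""
    return {"sum": sum(values), "count": len(values)}

def combined_mean(summaries):
    """Combine per-site summaries without pooling any raw records."""
    total = sum(s["sum"] for s in summaries)
    count = sum(s["count"] for s in summaries)
    return total / count

# Hypothetical: two hospitals each keep patient wait times (minutes) locally.
site_a = [10, 20, 30]
site_b = [40, 50]
print(combined_mean([local_summary(site_a), local_summary(site_b)]))  # 30.0
```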

Data Use

  1. Aggregate: Consider the level of aggregation of metadata to avoid re-identification of individuals as well as to meet legal obligations. Implement privacy-preserving techniques like anonymization to mitigate the risk of potential re-identification.
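One simple way to act on the aggregation principle is small-cell suppression: publish group counts, but hide any group too small to protect the people in it. This Python sketch assumes a hypothetical minimum group size of 5; the right threshold depends on the data and the applicable rules.

```python
from collections import Counter

def aggregate_with_suppression(rows, key, min_count=5):
    """Count records per group and hide any group smaller than
    `min_count` (assumed threshold), since tiny groups can point to
    specific individuals."""
    counts = Counter(r[key] for r in rows)
    return {group: (n if n >= min_count else f"<{min_count}")
            for group, n in counts.items()}

rows = [{"region": "North"}] * 6 + [{"region": "South"}] * 2
print(aggregate_with_suppression(rows, "region"))
# {'North': 6, 'South': '<5'}
```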

All Components

  1. Enforce and Demonstrate: Enforce a privacy policy that meets legal requirements such as the GDPR, CCPA, and HIPAA, and demonstrate compliance with the policies set forth.

If this framework is applied, enterprises will no longer need to place a price tag on big data or privacy. When PbD is implemented across the organization, both can be achieved.

What is ‘Privacy by Default’ and why is it the future of data compliance?

If we approach data privacy as a fundamental right of individuals, then it must become a founding principle of innovation and technology. Privacy by Default addresses the increasing awareness of data privacy and ensures that businesses will consider consumer values throughout the product lifecycle.

What is Privacy by Default?

Privacy by Default is a principle of data protection designed to ensure that privacy is baked into the framework of new software, in an effort to provide data subjects with the highest level of protection. Put simply, Privacy by Default is the notion that active consent must be given for data handlers to access a subject’s information. Privacy by Default ensures that businesses are bound to uphold and consider privacy values. The approach holds businesses accountable for their actions and intentions.

To achieve Privacy by Default, data protection must be integrated throughout the product lifecycle, from design to implementation. This means that upon the release of a product or service to the public, the strictest privacy settings must be the default, without requiring action from the end-user. Further, any information required from the user in order to enable a product’s optimal use should only be kept for as long as is necessary to provide the product or service.

For example, when a consumer creates a new social media account and inputs their information, the default setting should be to keep their data private. While there may be places to include their birthday, gender, or location on the platform, users must opt-in, or provide consent, for this information to be shared.
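The social media example can be expressed as settings whose defaults are the most private option. This Python sketch uses a hypothetical `ProfileSettings` class: every sharing flag starts as `False`, and a field becomes visible only after the user explicitly opts in.

```python
from dataclasses import dataclass, fields

@dataclass
class ProfileSettings:
    """Hypothetical sharing settings; every flag starts at its most
    private value, so nothing is shared until the user opts in."""
    share_birthday: bool = False
    share_gender: bool = False
    share_location: bool = False

    def opt_in(self, setting):
        if setting not in {f.name for f in fields(self)}:
            raise ValueError(f"unknown setting: {setting}")
        setattr(self, setting, True)

def visible_profile(profile, settings):
    """Return only the fields the user has chosen to share."""
    shown = {"username": profile["username"]}
    for attr in ("birthday", "gender", "location"):
        if getattr(settings, f"share_{attr}"):
            shown[attr] = profile[attr]
    return shown

profile = {"username": "sam", "birthday": "1990-04-01", "gender": "F",
           "location": "Toronto"}
settings = ProfileSettings()
print(visible_profile(profile, settings))  # {'username': 'sam'}
settings.opt_in("share_location")
print(visible_profile(profile, settings))  # {'username': 'sam', 'location': 'Toronto'}
```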


Why is it expected?


Privacy by Default is not a new idea, but it is now a legal requirement under the GDPR (the EU General Data Protection Regulation). Article 25 of the GDPR spells out that:

The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility. In particular, such measures shall ensure that by default personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons.

This law reflects the growing importance society places on privacy. Now more than ever, consumers expect businesses to protect their personal information. This was demonstrated in a 2019 study in the Journal of Consumer Policy, which found that individuals believe their privacy should be assumed and that it is the responsibility of the businesses they interact with to ensure it.

Consumers expect their personal data to be processed carefully, transparently, and only for uses they consented to, by default. Now, under GDPR, businesses have no choice, as hefty regulatory penalties mean non-compliance will cost businesses their reputation, continuity, and profits. 

As a result, Privacy by Default forces businesses to rethink privacy: it is not a burden but a best practice. Now that privacy is considered a fundamental right of users, and a social value, it must become a founding principle of innovation and technology.


What does this mean for your business?


Privacy by Default is the future of data compliance and must become a business priority – not only in Europe, where it is mandated by law, but across the globe. The shift in privacy centricity signals that Privacy by Default will soon be mandated everywhere. Consequently, strategic investments should be made today to get ahead of the curve and demonstrate a privacy-first mindset to consumers. When you value their values, it will make a world of difference.

Under the GDPR, organizations should enable users to manage their accounts so that they can define their permissions and determine what information they want to share and make usable to organizations. This means that data minimization is key to providing Privacy by Default, as only necessary personal data should be gathered. Moreover, data should only be stored for as long as is needed to fulfill the purpose for which it was collected, and deleted or anonymized after that time has passed.
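A retention rule like this can be enforced with a scheduled job. The Python sketch below assumes a hypothetical 365-day retention period and hypothetical record fields. Records past the period lose their identifiers and keep only a coarse year and plan; that reduction is a simplified stand-in for a proper anonymization step, not a guarantee of anonymity on its own.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # hypothetical retention period

def enforce_retention(records, now):
    """Keep records still within the retention period. Older records lose
    their identifiers and keep only coarse, non-linking fields (a simplified
    stand-in for a proper anonymization step)."""
    result = []
    for r in records:
        if now - r["collected_at"] <= RETENTION:
            result.append(r)
        else:
            result.append({"year": r["collected_at"].year, "plan": r["plan"]})
    return result

now = datetime(2021, 1, 1, tzinfo=timezone.utc)
records = [
    {"email": "a@example.com", "plan": "pro",
     "collected_at": datetime(2020, 6, 1, tzinfo=timezone.utc)},
    {"email": "b@example.com", "plan": "basic",
     "collected_at": datetime(2019, 6, 1, tzinfo=timezone.utc)},
]
for r in enforce_retention(records, now):
    print(r)
```

The first record is still within the period and is kept as-is; the second becomes `{'year': 2019, 'plan': 'basic'}`.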

The key here is that data can be maintained so long as it is anonymized and not linked to the consumer in any way. This means that by investing in privacy automation solutions, your business can continue to derive the insights that matter to you, whilst reassuring and appealing to consumers with Privacy by Default. 

The future is private, but that does not mean it is business-depleting. Coupling Privacy by Default with anonymization is the competitive advantage your business needs.

Why Private Set Intersection (PSI), Differential Privacy (DP) and Secure Multi-Party Computation (SMC) are the future of data privacy

Privacy regulations like GDPR and CCPA are changing the way data is collected and used.  Data-driven organizations that use data collaboration to understand their customers and research organizations that rely on data collaboration to advance research are being restricted.  As more privacy regulations come online, what can organizations do to future-proof their use of data, whilst still adhering to privacy regulations?

Technology is now available that will allow organizations to continue to collaborate without ever exposing or moving the underlying data. 

Private Set Intersection (PSI) enables organizations to identify common individuals without revealing anything else. This is key to being able to properly organize data into a geometry that is ultimately consumable by computational algorithms.
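To show how PSI can work, here is a toy Python sketch of a classic Diffie-Hellman-style protocol; it is one well-known PSI construction, not necessarily the one any particular product uses. Each party hashes its items and raises them to a secret exponent. Because exponentiation commutes, an item held by both parties ends up as the same doubly blinded value, while items held by only one party look random to the other. The sketch runs both parties in one process and uses the Mersenne prime 2^127 - 1 as a toy modulus; real deployments use standardized groups or elliptic curves.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1), for illustration only.
P = 2**127 - 1

def hash_to_group(item):
    """Map an item to a nonzero value modulo P."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return h % (P - 1) + 1

def random_exponent():
    """Secret key coprime to P - 1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def blind(values, key):
    return [pow(v, key, P) for v in values]

def psi(items_a, items_b):
    """Simulate both parties. Only blinded values would cross the wire."""
    a, b = random_exponent(), random_exponent()
    # A sends H(x)^a; B raises it to b, giving H(x)^(ab) in A's order.
    a_double = blind(blind([hash_to_group(x) for x in items_a], a), b)
    # B sends H(y)^b; A raises it to a, giving H(y)^(ba).
    b_double = set(blind(blind([hash_to_group(y) for y in items_b], b), a))
    # Equal doubly blinded values correspond to shared items.
    return {x for x, v in zip(items_a, a_double) if v in b_double}

print(sorted(psi(["alice", "bob", "carol"], ["bob", "dave", "carol"])))
# ['bob', 'carol']
```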

Differential Privacy (DP) places mathematical guarantees on privacy in the presence of any amount of side information including knowledge about who is in the intersection. 
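The textbook way to get a differential privacy guarantee for a count is the Laplace mechanism, sketched below in Python. A counting query changes by at most 1 when one person is added or removed (its sensitivity is 1), so adding Laplace noise with scale 1/epsilon makes the released count epsilon-differentially private. The patient records and the epsilon value are hypothetical, and this plain floating-point version is for illustration; production systems should rely on vetted DP libraries.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two independent
    exponential draws with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(records, predicate, epsilon, rng):
    """Counting query with the Laplace mechanism. Adding or removing one
    person changes a count by at most 1 (sensitivity 1), so noise with
    scale 1/epsilon gives epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical records: the analyst sees a noisy count, not the exact one.
patients = [{"diagnosis": "cancer"}] * 40 + [{"diagnosis": "none"}] * 960
rng = random.Random(42)
print(dp_count(patients, lambda r: r["diagnosis"] == "cancer", epsilon=0.5, rng=rng))
```

Smaller epsilon means more noise and stronger privacy; the noise is the uncertainty about who was included that the guarantee relies on.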

Secure Multi-Party Computation (SMC) enables organizations to jointly compute a function while keeping the inputs from being observed.  

Together, these three techniques offer mathematical guarantees for doing useful things with data while preserving privacy and intellectual property.

PSI, DP and SMC in action

Picture this: One data owner has information about cancer rates in the general population and another has information about food purchases over twenty years. A researcher is trying to understand how long-term patterns of food consumption might lead to cancer. To gain this understanding, they need to match food purchases with cancer diagnoses, intersecting the food purchase panel with the diagnostic data to build an attribution analysis. The numerical algorithm requires that all the pieces line up appropriately. PSI allows these data owners to find the commonality between the two data sets without revealing anything about the members that do not overlap.

At this point, Differential Privacy and Secure Multi-party Computation take over, as we compute the attribution between food and cancer diagnosis. Applying Differential Privacy will create uncertainty around the PSI operations. Even though all parties know with certainty who was included in the original problem formulation, applying Differential Privacy guarantees that the output of any analysis will be uncertain as to who was included in that analysis within certain probabilistic boundaries. 

Finally, the attribution analysis can take place using Secure Multi-party Computation. Secure Multi-party Computation never moves or exposes the underlying data but yields results that are consistent with co-locating the data. It is a very powerful approach that relies on secret shares protected with one-time pad encryption, a technique that is information-theoretically secure when its keys are truly random and never reused. All the operations in the analysis are computed with Secure Multi-party Computation and require communication between the parties. The result is an attribution analysis that has been properly constructed without compromising data privacy, the IP of each data owner, or data residency requirements.
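Secret sharing is easiest to see with a sum. In this Python sketch of additive secret sharing (one standard SMC building block, shown here in a simplified honest-but-curious setting simulated in one process), each input is split into random shares modulo 2^64 that add up to the value. Any single share is uniformly random, like a one-time pad, so no party learns another's input; each party adds the shares it holds, and only the combined total is revealed. The inputs are hypothetical and assumed to be non-negative integers below 2^64.

```python
import secrets

MOD = 2**64  # assumed share space: arithmetic is done modulo 2**64

def share(value, n_parties):
    """Split value into n additive shares; any n-1 of them are uniformly
    random and reveal nothing about value."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

def secure_sum(private_inputs):
    """Each input owner shares its value among all parties; each party
    adds the shares it holds locally; only the final total is revealed."""
    n = len(private_inputs)
    all_shares = [share(v, n) for v in private_inputs]
    # Party i holds the i-th share of every input.
    partial_sums = [sum(s[i] for s in all_shares) % MOD for i in range(n)]
    return reconstruct(partial_sums)

print(secure_sum([120, 45, 300]))  # 465
```

Multiplications and comparisons need more elaborate protocols and extra communication between the parties, which is the communication cost mentioned above.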

As regulations continue to evolve and threaten to clamp down on an organization’s ability to generate insights, new technology holds promise for not just organizations but also for consumers that demand privacy protection. Secure Multi-party Computation, Private Set Intersection, and Differential Privacy will make it possible for organizations to continue to generate insights and satisfy future privacy regulations thereby future-proofing their data.
