The top 4 privacy solutions destroy data value and fail to meet regulatory standards.

Businesses are becoming increasingly reliant on data to make decisions and learn about the market. Yet, due to an increase in regulations, the information they have collected is becoming less and less useful. While people have been quick to blame privacy laws, in reality, the biggest impediment to analytics and data science is the inadequacy of current data privacy solutions.

According to our market research, the four most common approaches are (1) access controls, (2) masking, (3) encryption, and (4) tokenization. While these solutions are a step in the right direction, they strip the data of its value and leave businesses open to regulatory penalties and reputational damage.

Your data privacy solutions are insufficient

Access controls: Access controls limit who can access data. While important, they are not an effective privacy-preserving strategy, because they neither protect the identities of individuals nor prevent their data from being used for purposes they have not consented to. It is an all-or-nothing approach: either someone has access to the data, in which case privacy is not protected, or they do not, in which case no insights can be gleaned at all.

Masking: This is a process by which sensitive information is replaced with synthetic data. In doing so, the analytical value of that information is wiped. While this solution works for testing, it is of little use if you plan to provide the data to data scientists. After all, you are sending them this data to unlock valuable insights!
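
To make that trade-off concrete, here is a minimal masking sketch, assuming a hypothetical employee table with email, salary, and department fields (the function names are also hypothetical). The masked table keeps the same shape, which is why it works for testing, but any statistic computed on the masked salary column, such as the average salary per department, describes random numbers rather than real employees.

```python
import random

def mask_records(records, seed=None):
    """Replace sensitive fields with synthetic stand-ins (hypothetical schema)."""
    rng = random.Random(seed)
    masked = []
    for i, rec in enumerate(records):
        masked.append({
            "email": f"user{i}@example.com",         # fake but well-formed address
            "salary": rng.randint(30_000, 150_000),  # random, unrelated to the real salary
            "department": rec["department"],         # non-sensitive field kept as-is
        })
    return masked

def average_salary(records, department):
    """Mean salary for one department."""
    values = [r["salary"] for r in records if r["department"] == department]
    return sum(values) / len(values)

real = [
    {"email": "ana@corp.com", "salary": 52_000, "department": "sales"},
    {"email": "raj@corp.com", "salary": 54_000, "department": "sales"},
    {"email": "li@corp.com", "salary": 140_000, "department": "engineering"},
]
masked = mask_records(real, seed=1)
print(average_salary(real, "sales"))    # 53000.0
print(average_salary(masked, "sales"))  # a random figure with no link to the real one
```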

Encryption: Encryption is a security mechanism that protects data until it is used, at which point the data is decrypted, exposing the private data to the user. Additionally, anyone who obtains the key can reverse the entire process (decryption), putting the data at risk.
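
A toy illustration of why encryption is not anonymization: the sketch below uses a one-time pad (XOR with a random key) built only on Python's standard library, and the function names are hypothetical. It is not how production systems encrypt data (they use vetted ciphers such as AES through a maintained library). What it shows is the point above: to analyze the record it must be decrypted, and whoever holds the key recovers everything.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(d ^ k for d, k in zip(data, key))

def encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """One-time pad: a fresh random key as long as the message (toy only)."""
    key = secrets.token_bytes(len(plaintext))
    return xor_bytes(plaintext, key), key

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR with the same key undoes the encryption."""
    return xor_bytes(ciphertext, key)

record = b"name=Ana Lima;diagnosis=asthma"
ciphertext, key = encrypt(record)
# To analyze the record, someone must decrypt it, and anyone holding `key`
# (the analyst, or an attacker who steals it) sees the full original.
print(decrypt(ciphertext, key))  # b'name=Ana Lima;diagnosis=asthma'
```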

Tokenization: Tokenization, also known as pseudonymization, is the process of replacing direct identifiers, like email addresses, with another value (a token) and storing the mapping between each original value and its token so the two can be relinked in the future. When businesses employ this technique, they leave the indirect identifiers (quasi-identifiers) as they are. Yet combinations of quasi-identifiers are a proven method to re-identify individuals in a dataset.
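
A minimal sketch of tokenization, assuming a hypothetical in-memory `TokenVault` class and record layout (real tokenization services persist the mapping in a secured store). The email is replaced by a random token, but the token can be reversed through the vault, and the quasi-identifiers (ZIP code, birth date, sex) pass through untouched.

```python
import secrets

class TokenVault:
    """Hypothetical vault mapping direct identifiers to random tokens and back."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def tokenize(self, value: str) -> str:
        # Reuse the same token for the same value so joins still work.
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        # Anyone with vault access can relink the token to the original value.
        return self._reverse[token]

vault = TokenVault()
record = {"email": "ana@corp.com", "zip": "94107", "birth_date": "1985-03-02", "sex": "F"}
pseudonymized = {**record, "email": vault.tokenize(record["email"])}
print(pseudonymized)  # email is a token; zip, birth_date and sex are unchanged
```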

Such a risk emphasizes the importance of understanding the re-identification risk of a dataset when evaluating the effects of your organization’s privacy protection actions. Moreover, tokenization is often reversed to perform analysis, violating the very principle of the process. The most important question to ask yourself is: how do I know my datasets have been anonymized? If you only implement tokenization, the answer is that you don’t.

 

Risk-aware anonymization will unlock the value of your data.

To unlock the value of your datasets in the regulatory era, businesses should implement privacy techniques. And many have! However, as we’ve discussed, the commonly used techniques are insufficient to preserve analytical value and protect your organization. The only way data will be useful to your data scientists is if you transform the data in such a way that the privacy elements enabling re-identification are removed while degrading the data as little as possible.

Consequently, businesses must prioritize risk-aware anonymization in order to optimize the reduction of re-identification risk and protect the value of data.

CN-Protect is the ideal solution to achieve your goals. It uses AI and advanced privacy protection methods, like differential privacy and k-anonymization, to assess and quantify privacy risk and to ensure that privacy and insights are produced in unison.

The process is as follows:

  1. Classify metadata: identify the direct, indirect, and sensitive data in an automated manner, to help businesses understand what kind of data they have.
  2. Quantify risk: calculate the risk of re-identification of individuals and provide a privacy risk score.
  3. Protect data: apply advanced privacy techniques, such as k-anonymization and differential privacy, to tables, text, images, video, and audio. This involves optimizing the tradeoff between privacy protection (removing elements that constitute privacy risk) and analytical value (retaining elements that constitute data fidelity).
  4. Audit-ready reporting: keep track of what the dataset is, what kind of privacy-protecting transformations were applied, changes in the risk score (before and after privacy actions have been applied), who applied the transformation and at what time, and where the data went. This is the key to proving to regulatory authorities that data has been defensibly anonymized.
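
The four steps can be sketched end to end on a toy table. Everything here is a hypothetical stand-in rather than CN-Protect's actual behavior: column roles come from fixed lists instead of automated classification, risk is measured as the worst-case re-identification probability (1 divided by the size of the smallest group of records sharing the same quasi-identifier values), and protection is simple generalization, which leaves this table 2-anonymous.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical column lists; a real tool would infer these roles automatically.
DIRECT = {"name", "email"}
INDIRECT = {"age", "zip"}
SENSITIVE = {"diagnosis"}

def classify(columns):
    """Step 1: label each column as direct, indirect, sensitive, or other."""
    roles = {}
    for col in columns:
        if col in DIRECT:
            roles[col] = "direct"
        elif col in INDIRECT:
            roles[col] = "indirect"
        elif col in SENSITIVE:
            roles[col] = "sensitive"
        else:
            roles[col] = "other"
    return roles

def max_risk(rows, quasi):
    """Step 2: worst-case risk, i.e. 1 / size of the smallest equivalence class."""
    classes = Counter(tuple(r[q] for q in quasi) for r in rows)
    return 1 / min(classes.values())

def protect(rows, roles):
    """Step 3: drop direct identifiers, generalize age to a decade and ZIP to 3 digits."""
    out = []
    for r in rows:
        new = {c: v for c, v in r.items() if roles[c] != "direct"}
        decade = r["age"] // 10 * 10
        new["age"] = f"{decade}-{decade + 9}"
        new["zip"] = r["zip"][:3] + "**"
        out.append(new)
    return out

def audit(risk_before, risk_after, user):
    """Step 4: record what was done, the risk change, who did it, and when."""
    return {
        "transform": "drop direct identifiers; age -> decade; zip -> 3 digits",
        "risk_before": risk_before,
        "risk_after": risk_after,
        "applied_by": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

rows = [
    {"name": "Ana", "email": "ana@x.com", "age": 34, "zip": "94107", "diagnosis": "asthma"},
    {"name": "Ben", "email": "ben@x.com", "age": 37, "zip": "94110", "diagnosis": "flu"},
    {"name": "Cy", "email": "cy@x.com", "age": 52, "zip": "10001", "diagnosis": "diabetes"},
    {"name": "Di", "email": "di@x.com", "age": 58, "zip": "10003", "diagnosis": "asthma"},
]
roles = classify(rows[0].keys())
quasi = [c for c, role in roles.items() if role == "indirect"]
protected = protect(rows, roles)
report = audit(max_risk(rows, quasi), max_risk(protected, quasi), "analyst@example.com")
print(report["risk_before"], report["risk_after"])  # 1.0 0.5
```

The sensitive diagnosis column is kept, since it is usually the value analysts need; the protection targets only the columns that let someone single a person out.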

In doing so, businesses are able to protect datasets to a standard that fulfills data protection regulations, protects them from privacy risk, and, most importantly, preserves the value of the data. In essence, this approach unlocks data that was previously restricted and helps businesses achieve better data-driven outcomes by protecting data in an optimized manner.

By measuring the risk of identification, applying privacy-protection techniques, and providing audit reports throughout the whole process, CN-Protect is the only data privacy solution that will comprehensively unlock the value of your data.


Big data privacy regulations can only be met with privacy automation

GDPR requires businesses to have a lawful basis, such as explicit consent, before collecting or using personal data. CCPA affords consumers the right to request that their data be deleted if they don’t like how a business is using it. PIPEDA requires organizations to obtain meaningful consent before collecting, using, or disclosing personal information. New privacy laws are coming to India (PDPB), Brazil (LGPD), and over 100 other countries. In the US alone, over 25 state privacy laws have been proposed, with a national one in the works. Big data privacy laws are expansive and restrictive, and they are emerging worldwide faster than you can say, “what about analytics?”

This has made it challenging for businesses to (1) keep up, (2) get compliant, and (3) continue performing analytics. Not only are these regulations restrictive, but failing to meet them can result in astronomical fines, such as the €204.6 million fine the UK’s ICO proposed against British Airways. As a result, much distress and confusion has spread through the big data community.

 

Businesses are struggling to adapt to the rapid increase in privacy regulations

Stakeholders cannot agree on whose responsibility it is to ensure compliance, they are struggling with consent management, and many believe that removing direct identifiers renders data anonymous.

Major misconceptions can cost businesses hundreds of millions. So let’s break them down.

  1. “Consent management is the only way to keep performing analytics.”

While consent is essential at the point of collection, the odds are that, down the road, businesses will want to repurpose data. Obtaining permission in these cases, due to the sheer volume of data repositories, is an unruly and unmanageable process. A better approach is to anonymize the data. Once this has occurred, data is no longer personal, and it goes from consumer information to business IP.

  2. “I removed the direct identifiers, so my data is anonymized.”

If this were the case, anonymization would be an easy process. Sadly, it is not so. In fact, it has been widely acknowledged that simply redacting directly identifying information, like names, is nowhere near sufficient. In almost all cases, this leaves most of the dataset re-identifiable.
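
The classic way this fails is a linkage attack: an attacker joins the “de-identified” data to a public dataset that still has names, matching on quasi-identifiers such as ZIP code, birth date, and sex. The sketch below uses two small hypothetical datasets and a hypothetical `link` helper; any record whose quasi-identifiers match exactly one named person is re-identified.

```python
# Hypothetical "de-identified" health records: names removed, quasi-identifiers kept.
health = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-01-15", "sex": "M", "diagnosis": "asthma"},
]
# Hypothetical public dataset (for example, a voter roll) that includes names.
public = [
    {"name": "Jane Roe", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "John Doe", "zip": "02139", "birth_date": "1962-01-15", "sex": "M"},
    {"name": "Ann Poe", "zip": "02139", "birth_date": "1970-05-05", "sex": "F"},
]
QUASI = ("zip", "birth_date", "sex")

def link(deidentified, identified, keys=QUASI):
    """Join two datasets on quasi-identifiers and report unique matches."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for rec in deidentified:
        names = index.get(tuple(rec[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the record is re-identified
            matches[names[0]] = rec["diagnosis"]
    return matches

print(link(health, public))  # {'Jane Roe': 'hypertension', 'John Doe': 'asthma'}
```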

  3. “Synthetic data is the best way to manage emerging regulations.”

False! Synthetic data is a great alternative for testing, but when it comes to achieving insights, it is not the way to go. Because synthetic data is generated to replicate overall trends, important outlier information can be missed. As a result, the data is unlikely to mirror real-world consumer information, compromising the decision-making process.

What’s evident from our conversations with data-driven organizations is that businesses need a better solution. Consent management is slowing them down, legacy approaches to anonymization are ineffective, and current workarounds skew insights or wipe data value.

 

Privacy automation: A better approach to big data privacy laws

The only manageable and effective solution to big data privacy regulations is privacy automation. This process measures the risk of re-identification, applies privacy-protection techniques, and provides audit reports throughout the anonymization process. It is embedded in an organization’s data pipeline, spreading the solution enterprise-wide and harmonizing the needs of stakeholders by optimizing for anonymization and preservation of data value.

This solution will simplify the compliance process by enabling privacy rule definition, risk assessments, application of privacy actions, and compliance reporting to happen within a single application. In turn, privacy automation allows companies to unlock data in a manner that protects consumers and adds value for them.

Privacy automation is the best method for businesses to handle emerging laws and regain the mission-critical insights they have come to rely on. Through this approach, privacy unlocks insights.


How data partnerships unlock valuable second-party data

Sharing data is fundamental to advancing business strategy, especially in marketing. Over the last five years, analytics has become an essential part of the marketer’s role, and marketers have grown used to purchasing datasets on the open market to build a more holistic understanding of customers.

However, amid the new wave of privacy regulations and demands for transparency, achieving the same level of understanding has become a challenge. These changes have also increased the risk of using third-party data, because businesses cannot trust that outside sources have complied with regulations or provided accurate data. Consequently, more companies are turning to second-party data sources.

Second-party data is essentially someone else’s first-party data. It is purchased directly from the company that collected it, which offers greater trust and higher quality. In essence, these strategic partnerships enable businesses to build their customer databases and gain insights quickly.

Purchasing another business’s data will increase the breadth of your data lakes, but it also opens organizations up to regulatory fines and reputational damage, because harnessing that data as-is requires you to trust a data partner’s processes. By comparison, relying on your own (first-party) data keeps privacy under your control, but it lacks the breadth associated with other types of data.

Second-party data is an important addition for businesses looking to piece together the puzzle of each individual’s data records. However, compliance, security, and a loss of control are problems that must be addressed. There are two options: anonymization and privacy-protected data collaboration.

 

Anonymize and share: a data partnership plan

Under regulations like GDPR, personal data generally cannot be used for secondary purposes without first obtaining the consent of the data subject. This means that if you are looking to share data and establish a data partnership, you must first obtain meaningful consent from every data subject, a costly and often fruitless process!

To avoid this expense while still respecting the principle behind the law, businesses can rely on anonymization to protect consumers and regain control over the information. Once data has been anonymized, it is no longer considered personal. This means businesses can exchange data and pursue their marketing goals.

However, once you share data with another business, you lose control over what happens to it. A better solution is privacy-protected data collaboration.

Privacy-protected data collaboration builds data partnerships securely

By leveraging an advanced tool such as CN-Insight, which uses secure multi-party computation (SMC) and private set intersection (PSI), businesses can acquire insights without sharing the data itself.
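
To give a flavor of how PSI works, here is a toy version of the classic Diffie-Hellman-style protocol, in which each party blinds hashed identifiers with a secret exponent. Because exponentiation commutes, an item held by both parties ends up with the same doubly blinded value, so Alice learns which of her items are shared without either side sending raw identifiers. This is not CN-Insight’s implementation: the modulus (the Mersenne prime 2^127 - 1) is far too weak for real use, and the `Party` class and helper names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1. Real PSI uses a vetted group
# (e.g. an elliptic curve) and a maintained library; this is illustration only.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Hash an identifier to a nonzero residue modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Secret exponent coprime to P - 1, so x -> x**k mod P is one-to-one."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


class Party:
    """A data holder with a private set of identifiers (hypothetical)."""

    def __init__(self, items):
        self.items = list(items)
        self._secret = random_exponent()

    def blind_own(self):
        # Round 1: H(x)**secret for each of our own items.
        return [pow(hash_to_group(x), self._secret, P) for x in self.items]

    def blind_other(self, values):
        # Round 2: raise the other party's blinded values to our secret too.
        return [pow(v, self._secret, P) for v in values]


def private_set_intersection(alice: Party, bob: Party) -> set:
    """Alice learns which of her items Bob also holds; only blinded values are exchanged."""
    alice_twice = bob.blind_other(alice.blind_own())      # H(x)**(a*b), order kept
    bob_twice = set(alice.blind_other(bob.blind_own()))   # H(y)**(b*a)
    return {x for x, v in zip(alice.items, alice_twice) if v in bob_twice}


alice = Party(["ann@example.com", "bob@example.com", "cy@example.com"])
bob = Party(["bob@example.com", "cy@example.com", "dee@example.com"])
print(sorted(private_set_intersection(alice, bob)))
# ['bob@example.com', 'cy@example.com']
```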

While this may seem odd, the reality is, you don’t want the data, you want the insights – and you can still get those without exposing your data.
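
Secure multi-party computation makes the same point for aggregates. A minimal sketch, assuming additive secret sharing over a toy modulus and simulating all parties in one process: each party splits its private number into random shares, each share on its own looks like noise, and only the combined total is ever revealed. The function names and figures are hypothetical.

```python
import secrets

# Toy modulus; the true total must be smaller than this.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their private values."""
    n = len(private_values)
    # Row i holds the shares party i deals out; party j receives column j.
    dealt = [share(v, n) for v in private_values]
    # Each party adds up the shares it received; a partial sum alone looks random.
    partial_sums = [sum(row[j] for row in dealt) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals the total and nothing more.
    return sum(partial_sums) % MODULUS


# Hypothetical: three partners' spend on a shared customer segment.
print(secure_sum([120, 45, 300]))  # 465
```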

That, in essence, is what CN-Insight enables, thanks to sophisticated cryptographic techniques that have been researched and developed at reputable institutions like Bristol University and Aarhus University. To learn more about how it works, check out our product page.

Through privacy-protected data collaboration, your data is never exposed, but you receive the insights you need. This is the best solution for marketers looking to regain the holistic understanding they had of customers before regulatory authorities began imposing high fines and strict laws. Not only will businesses avoid the expensive and time-consuming contract process associated with traditional second-party data sharing, but they can trust that their data was never shared and they are not at risk of regulatory penalties. 

Data partnerships through virtual data collaboration are the solution to unlock second-party data in a privacy-preserving way.


De-identify your data, or be in violation of CCPA

On January 1, 2020, California implemented a landmark law that is reshaping data analytics and data science worldwide. This is the day the CCPA became effective, and businesses’ consumer data became a significant legal and financial risk to the company. 

While the tech industry has tried to restrict the legislation since its birth, its lobbying efforts have fallen short. In one month, business as usual will result in class-action lawsuits. At this time, Californians will enjoy a new set of privacy rights and regain ownership over their own information.

 

CCPA transforms data from a commodity to a privilege that can be revoked

Under the CCPA, Californians will be able to demand access to the data that companies collect on them, and how they have used it. Not only does this put the onus on businesses to manage verifiable consumer requests, but also to ensure that all collection and uses of data are in the best interest of people.

Moreover, the CCPA goes much further than that. Businesses not only have to give customers access to the data they hold on them, but also have to inform them and give them the opportunity to opt out or request deletion when the business wants to leverage that data.

Through de-identification, businesses unlock consumer insights

Under the CCPA, if you want to use data beyond the original purpose for which it was collected, you have two choices:

  1. Inform consumers of every data use and risk deletion requests, or
  2. De-identify the data.

The first option is impractical. The second is possible, but not through traditional methods of privacy protection. Our research demonstrates that at least 60% of datasets that are thought to be de-identified are not de-identified. The result is that most organizations will be unknowingly violating the CCPA.

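Every time data is used in a way that violates the CCPA, businesses risk civil penalties of up to $7,500 per intentional violation. If a data breach results from a failure to maintain reasonable security, consumers can also seek statutory damages of up to $750 each, per incident.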
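
For a sense of scale, the maximum exposure is simple multiplication. The helper names and the counts in the example are hypothetical, and the $7,500 and $750 figures are treated as per-violation and per-consumer upper bounds.

```python
def max_civil_penalties(violations: int, per_violation: int = 7_500) -> int:
    """Upper bound on civil penalties: violations times the per-violation maximum."""
    return violations * per_violation


def max_statutory_damages(consumers_affected: int, per_consumer: int = 750) -> int:
    """Upper bound on statutory damages: consumers times the per-consumer maximum."""
    return consumers_affected * per_consumer


# Hypothetical incident affecting 100,000 Californians, and 1,000 violations.
print(max_statutory_damages(100_000))  # 75000000
print(max_civil_penalties(1_000))      # 7500000
```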

To avoid this, businesses need a guarantee that their data has been de-identified. This is something only an automated risk assessment tool can provide. Yet, “according to Ethyca, more than 70% of companies have not built any sort of engineering solution for policy compliance.” (Source)

 

Traditional anonymization strategies will not satisfy CCPA.

Traditional approaches to anonymization are unreliable, ineffective, and often wipe the analytical value of the data. Legacy approaches, like masking, were never intended to ensure privacy. Rather, they are cybersecurity techniques that evolved at a time when organizations did not rely on the insights derived from consumer data.

Manual approaches, where risk and compliance teams restrict access to data lakes and warehouses, impede business goals. Worse, they are cumbersome, involving significant and impractical overheads. The volume and velocity at which data is accumulated in data lakes make traditional methods of anonymization impractical. 

It is only possible to truly anonymize data to a CCPA-compliant level and retain the analytical value of the data by using a solution that optimizes for a reduced privacy risk score and minimal information loss. 

Consequently, to continue deriving insights in the CCPA-era, enterprises need to invest in optimized anonymization now. Combining advanced privacy-preserving techniques with privacy risk scoring will allow for a balance between privacy compliance and business insight goals.

By handling indirect or quasi-identifier information carefully – and using advanced privacy-protecting techniques like k-anonymity and differential privacy – enterprises can have the best of both worlds. Compliance and data science success.
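
As an illustration of one of these techniques, here is a minimal sketch of the Laplace mechanism, the textbook way to answer a counting query with epsilon-differential privacy. A count changes by at most 1 when one person’s record is added or removed, so adding Laplace noise with scale 1/epsilon is enough. The dataset and the `dp_count` function are hypothetical.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Noisy count satisfying epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1, so Laplace noise with scale 1/epsilon
    is sufficient.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two independent exponentials with rate epsilon
    # (mean 1/epsilon) is Laplace-distributed with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


patients = [{"diagnosis": d} for d in ["asthma", "flu", "asthma", "diabetes", "asthma"]]
print(dp_count(patients, lambda p: p["diagnosis"] == "asthma", epsilon=0.5))
# prints 3 plus random noise
```

A smaller epsilon means more noise and stronger privacy; choosing it is a policy decision as much as a technical one.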

However, this privacy stance cannot be achieved manually. It requires a dedicated, automated, specialist privacy platform. 

 

Avoiding the de-identification illusion

To ensure this de-identification process is defensible, businesses must understand, to a high degree of accuracy, the proportion of records in a given dataset that an attacker would correctly re-identify. This is known as a privacy risk score, and it is based on the principle of Marketer Risk: the methodology takes the perspective of someone who wishes to re-identify as many records as possible in a disclosed dataset.
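
Under one common formulation of marketer risk, assuming the attacker’s identification database covers exactly the individuals in the disclosed dataset, the n records are grouped into J equivalence classes of sizes f_1 through f_J by their quasi-identifier values. An attacker matching a record in class j picks the right person with probability 1/f_j, so the expected share of correctly re-identified records is:

```latex
\[
R_{\text{marketer}} \;=\; \frac{1}{n}\sum_{j=1}^{J} f_j \cdot \frac{1}{f_j} \;=\; \frac{J}{n},
\qquad n = \sum_{j=1}^{J} f_j
\]
```

For example, 10 records that fall into 4 equivalence classes have a marketer risk of 4/10 = 0.4.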

From this point of view, businesses are able to gain an accurate understanding of how privacy actions affect their dataset, and continue to adjust their techniques until an acceptable risk threshold is met (Learn more: https://cryptonumerics.com/privacy-risk-score).

If businesses invest in privacy risk scoring and advanced protection solutions, they can ensure privacy compliance is automatically enforced throughout their data pipeline. Effective anonymization leaves data monetizable and gives leadership the necessary degree of certainty that analytics will not harm the business. Anonymization is the only viable solution for data-driven companies to meet CCPA regulations without harming their business model.
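
Adjusting techniques until an acceptable risk threshold is met can be sketched as a loop: try progressively coarser generalizations and stop at the first one whose risk is at or below the target. This is a hypothetical illustration, not CN-Protect’s algorithm: it generalizes only age, computes marketer risk as the number of equivalence classes divided by the number of records, and the candidate band widths are arbitrary.

```python
from collections import Counter


def marketer_risk(rows, quasi):
    """Expected share of records re-identified: equivalence classes / records."""
    classes = Counter(tuple(r[q] for q in quasi) for r in rows)
    return len(classes) / len(rows)


def generalize_age(rows, width):
    """Hypothetical transformation: bucket ages into bands `width` years wide."""
    return [{**r, "age": r["age"] // width * width} for r in rows]


def anonymize_until(rows, quasi, threshold, widths=(1, 5, 10, 20, 50)):
    """Coarsen ages step by step until marketer risk falls to the threshold."""
    for width in widths:
        candidate = generalize_age(rows, width)
        risk = marketer_risk(candidate, quasi)
        if risk <= threshold:
            return candidate, width, risk
    raise ValueError("threshold not reachable with these generalization levels")


rows = [
    {"age": 23, "sex": "F"}, {"age": 27, "sex": "F"}, {"age": 31, "sex": "M"},
    {"age": 34, "sex": "M"}, {"age": 38, "sex": "M"}, {"age": 45, "sex": "F"},
    {"age": 47, "sex": "F"}, {"age": 52, "sex": "M"},
]
_, width, risk = anonymize_until(rows, ("age", "sex"), threshold=0.5)
print(width, risk)  # 10 0.5
```

Marketer risk is an average: at a threshold of 0.5 the loop stops at 10-year bands even though one record (the 52-year-old man) is still unique, so a stricter pipeline would also check the worst-case risk.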


2020 and the future of data science and data privacy

Recently, data science and data-driven businesses have been marred by scandal. From the Cambridge Analytica election affair, to Google’s secretive move into the healthcare space, people are angry and regulatory authorities are showing teeth. 

However, we believe 2020 will have good things in store for the industry. Namely, we suspect there will be a focus on making data actionable for data science and embedding privacy into innovation.

 

Data scientists are limited by privacy concerns, but they don’t have to be

A lack of data access is a core problem that data scientists face on a regular basis. When their job is to find actionable insights, traditional approaches to handling privacy make that job harder. For example, masking and encryption wipe the analytical value of the data and rob scientists of the material they need to do their work. This has pitted compliance and data teams against one another, while leaving both teams unsatisfied. After all, these approaches fail to meet the goals of either team: they wipe value and don’t ensure personal data is protected.

Yet, both of these objectives are essential to innovation and business growth. Organizations require actionable data and consumer protection. If approached correctly, privacy protection is the method to unlock data. We know, this sounds like an oxymoron. But truthfully, preserving privacy the right way will give your data scientists increased and improved data.

 

2020 will be the year of risk-aware anonymization

In order to achieve innovation goals, businesses must rethink the way they handle privacy. Organizations cannot rely on traditional methods like access controls, masking, encryption, and tokenization in order to achieve anonymized data. These legacy processes were intended for security, not privacy, and they appeared at a time when data wasn’t valued by organizations in the same way as it is today. 

In the new era of anonymization legislation, none of the legacy approaches to privacy compliance are fit for purpose. 

The best solution on the market today is risk-aware anonymization: A technique that combines the most advanced privacy approaches – differential privacy, k-anonymity – with AI to optimize for risk reduction and value preservation at scale. By using this tool, analysts and scientists will be able to unlock their data lakes and warehouses while respecting consumer privacy. In essence, the process of stripping personal information will transform consumer data to business IP.

 

Anonymization makes it possible to embed privacy into innovation

Once businesses have invested in risk-aware anonymization technology, they will be able to reach a new level of success. Not only will their scientists have full access to data, but consumer privacy will also be ensured.

We predict this will be the next wave, in which all business stakeholders achieve their priorities and boost performance. We believe this approach will improve organizations through and through, and that innovation can only occur once privacy is established in the business model.

 

Privacy is the foundation of progress. 2020 is the year businesses will garner the benefits.
