Forget Third-party Datasets – the Future is Data Partnerships that Balance Compliance and Analytical Value

Organizations constantly gather information from their customers, yet they are always driven to acquire more data on top of this. Why? Because more data means better insights into customers and a better ability to identify potential leads and cross-sell products. Historically, organizations acquired more data by purchasing third-party datasets. Though these come with problems of their own, such as inconsistent data quality, the benefits used to outweigh the drawbacks.

But not anymore. Unfortunately for organizations, since the introduction of the EU General Data Protection Regulation (GDPR), buying third-party data has become extremely risky. 

GDPR has changed the way data is used and managed by requiring customer consent in all scenarios except those where the intended use falls under a legitimate business interest. Since third-party data is acquired by an aggregator from other sources, in most cases the aggregator does not have the required consent from the customers. This puts any purchaser of third-party data in a non-compliant position that could expose them to fines, reputational damage, and additional compliance overhead.

If organizations can no longer rely on third-party data, how can they maximize the value of the data they already have? 

By changing their focus. 

The importance of data partnerships and second-party data

Instead of acquiring third-party data, organizations should establish data partnerships and access second-party data. This approach has two main advantages. One, second-party data is simply another organization's first-party data, so it is of high quality. Two, there are no concerns about customer consent, as the organization that owns the data has direct consent from its customers.

That said, to establish a successful data partnership, three things have to be taken into consideration: privacy protection, IP protection, and analytical value.

Privacy Protection

Even when customer consent is present, the data to be shared should be privacy-protected in order to comply with GDPR, safeguard customer information, and mitigate risk. Privacy protection should be understood as a reduction in the probability of re-identifying a specific individual in a dataset. GDPR, like other privacy regulations, refers to anonymization as the maximum level of privacy protection, wherein an individual can no longer be re-identified.

Privacy protection can be achieved with different techniques. Common approaches include differential privacy, encryption, noise addition, and suppression. Regardless of which technique is applied, it is important to always measure the resulting risk of re-identification.
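To make that last point concrete, here is a minimal sketch of one way to measure re-identification risk, assuming a pandas DataFrame and hypothetical quasi-identifier columns; the size of the smallest equivalence class is the k of k-anonymity:

```python
# Minimal sketch: estimating re-identification risk from equivalence classes.
# The DataFrame and quasi-identifier columns are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "age":       [34, 34, 41, 41, 41, 67],
    "zip":       ["M5V", "M5V", "M4C", "M4C", "M4C", "M5V"],
    "diagnosis": ["flu", "cold", "flu", "flu", "cold", "flu"],
})

quasi_identifiers = ["age", "zip"]

# An equivalence class groups records that look identical on the
# quasi-identifiers; a record in a class of size k can be re-identified
# with probability at most 1/k under the prosecutor attack model.
class_sizes = df.groupby(quasi_identifiers).size()

print(f"smallest class (k): {class_sizes.min()}")
print(f"max re-identification risk: {(1 / class_sizes).max():.2f}")
print(f"avg re-identification risk: {(1 / class_sizes).mean():.2f}")
```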

IP (Intellectual Property) Protection

Some organizations are comfortable selling their data. Others, however, are very reluctant, because they understand that once the data is sold, all of its value and IP are lost, since they can no longer control it. IP control is a big barrier when trying to establish data partnerships.

Fortunately, there is a way to establish data partnerships and ensure that IP remains protected.

Recent advances in cryptographic techniques have made it possible to collaborate with data partners and extract insights without having to expose the raw data. The first of these techniques is called Secure Multiparty Computation.

As its name implies, with Secure Multiparty Computation, multiple parties can perform computations on their datasets as if they were co-located, but without revealing any of the original data to the other parties. The second technique is Fully Homomorphic Encryption. With this technique, data is encrypted in such a way that computations can be performed on it without ever decrypting it.
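For intuition, here is a toy sketch of additive secret sharing, one standard building block of Secure Multiparty Computation; the parties and values are hypothetical, and production protocols are considerably more involved:

```python
# Toy additive secret sharing: two parties learn the sum of their private
# values without either revealing its input. Parties and values are made up.
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value: int) -> tuple[int, int]:
    """Split a value into two random shares that sum to it mod MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS

alice_private, bob_private = 120_000, 95_000
a1, a2 = share(alice_private)  # a2 is sent to Bob; alone it looks random
b1, b2 = share(bob_private)    # b1 is sent to Alice

# Each party adds the shares it holds; neither partial sum leaks an input.
alice_partial = (a1 + b1) % MODULUS
bob_partial = (a2 + b2) % MODULUS

# Only combining the partial sums reveals the result, never the inputs.
joint_sum = (alice_partial + bob_partial) % MODULUS
assert joint_sum == alice_private + bob_private
print(joint_sum)  # 215000
```

Fully Homomorphic Encryption reaches a similar end by a different route: the computation runs directly on ciphertexts, so nothing needs to be split into shares at all.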

Because the original raw data is never exposed to the partners, both of these advanced techniques allow organizations to augment their data, extract insights, and protect their IP.

Analytical Value

The objective of any data partnership is to acquire more insights into customers and prospects. For this reason, any additional data that is acquired needs to add analytical value. But maintaining this value becomes difficult when organizations need to preserve privacy and IP protection. 

Fortunately, there is a solution. First, organizations should identify the individuals common to both datasets; this matters because only data about overlapping customers adds analytical value. Using Secure Multiparty Computation, the datasets can be matched and common individuals identified without exposing any of the sensitive original data.
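As a rough illustration of the data flow only, the sketch below shows the naive keyed-hash variant of private matching, with a hypothetical shared key and identifiers; a real deployment would use a true Private Set Intersection or MPC protocol, which offers far stronger guarantees:

```python
# Naive private matching sketch: both partners blind their customer
# identifiers under a jointly held key and compare only the blinded tokens.
# The key and emails are hypothetical; real systems use PSI/MPC protocols.
import hashlib
import hmac

SHARED_KEY = b"jointly-generated-secret"  # hypothetical pre-agreed key

def blind(identifier: str) -> str:
    return hmac.new(SHARED_KEY, identifier.lower().encode(), hashlib.sha256).hexdigest()

partner_a = {"ana@example.com", "raj@example.com", "li@example.com"}
partner_b = {"raj@example.com", "li@example.com", "sam@example.com"}

blinded_a = {blind(c) for c in partner_a}
blinded_b = {blind(c) for c in partner_b}

# Only blinded tokens are exchanged; raw identifiers never leave their
# owner. (A curious partner holding the key could still test guesses,
# which is exactly why real deployments prefer true PSI protocols.)
common = blinded_a & blinded_b
print(f"{len(common)} customers in common")  # 2
```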

Second, organizations must use software that balances privacy against information loss. Without this, the resulting data will be high on privacy protection and extremely low on analytical value, making it useless for extracting insights.
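A toy sketch of that balance follows, using hypothetical ages and a simple width-based loss metric: widening the generalization bins raises k, the size of the smallest indistinguishable group, but pushes the information-loss score toward 1.

```python
# Toy trade-off between privacy (k) and information loss under
# generalization. The ages and the loss metric are hypothetical.
import pandas as pd

ages = pd.Series([23, 24, 25, 31, 33, 34, 35, 36])
span = ages.max() - ages.min()

def evaluate(bin_width: int) -> tuple[int, float]:
    binned = (ages // bin_width) * bin_width     # e.g. 33 -> 30 for width 10
    k = binned.value_counts().min()              # smallest equivalence class
    info_loss = min(bin_width - 1, span) / span  # 0 = exact, 1 = useless
    return k, info_loss

for width in (1, 5, 10, 50):
    k, loss = evaluate(width)
    print(f"bin width {width:>2}: k = {k}, information loss = {loss:.2f}")
```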

Thanks to the new privacy regulations sweeping the world, acquiring third-party datasets has become extremely risky and costly. Organizations should change their strategy and engage in data partnerships that will provide them with higher quality data. However, for these partnerships to add real value, privacy and IP have to be protected, and data has to maintain its analytical value.

For more about CryptoNumerics’ privacy automation solutions, read our blog here.

What is your data worth?

How much compensation would you require to give a company complete access to your data? New studies suggest that putting a price tag on data may be the wrong way to set fines for noncompliance. Meanwhile, 51 CEOs have written an open letter to Congress requesting a federal consumer data privacy law, and the Internet Association has joined them in their campaign. At the same time, Facebook has been caught using Bluetooth in the background to track users and drive up profits.

Would you want your friends to know every facet of your digital footprint? How about your:

  • Location
  • Visited sites
  • Searched illnesses
  • Devices connected to the internet
  • Content read
  • Religious views
  • Political views
  • Photos
  • Purchasing habits


How about strangers? No? We didn't think so. Then the question remains: why are we sharing non-anonymized or improperly anonymized copies of our personal information with companies?

Today, many individuals regularly and unknowingly share their data with companies that collect it for profit. This data is used to monitor behaviour and build profiles for targeted advertising, earning big data and tech companies like Facebook around $30 per year in revenue per North American user (Source). Given the profitability of data mining and the increasing number of nine-figure fines for data breaches, researchers have become fascinated by the economics of privacy.

A 2019 study in the Journal of Consumer Policy questioned how users value their data. In the study, individuals stated they would only be willing to pay $5/month to protect their personal data. While the low price tag may sound like privacy is a low priority, it is more likely that individuals believe their privacy should be a given, rather than something they have to pay for. This theory is corroborated by the fact that when the question was reversed, asking how much users would accept in exchange for full access to their data, the median response was $80/month (Source).

While this study demonstrates that the majority place a clear value on their data, some individuals attributed a much higher cost and others said they would share data for free. Thus, the study concluded that "both willingness to pay and willingness to accept measures are highly unreliable guides to the welfare effects of retaining or giving up data privacy." (Source)

With traditional measures of economic value called into question as a basis for fines for data breaches and illegally harvested data, other influential figures in data privacy research were asked how corporations should be held accountable to privacy standards. Rebecca Kelly Slaughter, Federal Trade Commission (FTC) Commissioner, stated that "injury to the public can be difficult to quantify in monetary terms in the case of privacy violations." (Source)

Rohit Chopra, a fellow FTC commissioner, also explained that current levels of monetary fines are not a strong deterrent for companies like Facebook, as their business model will remain untouched. As a result, the loss could be recouped through the further monetization of personal data. Consequently, both commissioners suggested that holding Facebook executives personally liable would be a stronger approach (Source).

If no price can equate to the value of personal data, and fines do not deter prolific companies like Facebook, should we continue asking what data is worth? Alessandro Acquisti, of Carnegie Mellon University, suggests an alternative: viewing data privacy as a human right. This model of thinking poses an interesting line of inquiry for both big data players and lawmakers, especially as federal data privacy legislation gains popularity in the US (Source).

On September 10, 51 top CEOs, members of Business Roundtable, an industry lobbying organization, sent an open letter to Congress requesting a US federal data privacy law that would supersede state-level privacy laws to simplify product design, compliance, and data management. Among the CEOs were executives from Amazon, IBM, Salesforce, Johnson & Johnson, Walmart, and Visa.

Throughout the letter, the giants blamed the state-level patchwork of privacy regulations for the disorder of consumer privacy in the United States. Today, companies face a growing number of state and jurisdictional laws that uphold varying standards with which organizations must comply. This, the companies argue, is too inefficient to protect citizens, whereas a federal consumer data privacy law would provide reliable and consistent protections for Americans.

The letter also goes so far as to offer a proposed Framework for Consumer Privacy Legislation that the CEOs believe should be the base for future legislation. This framework states that data privacy law should…

  1. Champion Consumer Privacy and Promote Accountability
  2. Foster Innovation and Competitiveness
  3. Harmonize Regulations
  4. Achieve Global Interoperability

While a unified and consistent method of holding American companies accountable could benefit users, many leading privacy advocates, and even some tech giants, have questioned the CEOs' motives, regarding the proposal as a method "to aggregate any privacy lawmaking under one roof, where lobby groups can water-down any meaningful user protections that may impact bottom lines." (Source)

This pattern of a disingenuous push for a federal privacy law continued last week as the Internet Association (IA), a trade group funded by the largest tech companies worldwide, launched a campaign requesting the same. Its members are largely companies that profit from the monetization of consumer data, including Google, Microsoft, Facebook, Amazon, and Uber (Source).

In an Electronic Frontier Foundation (EFF) article, this campaign was referred to as a "disingenuous ploy to undermine real progress on privacy being made around the country at the state level." (Source) Should this occur, the federal law would supersede state laws like the Illinois Biometric Information Privacy Act (BIPA), which makes it illegal to collect biometric data without opt-in consent, and the California Consumer Privacy Act (CCPA), which will give state residents the right to access and opt out of the sale of their personal data (Source).

In the last quarter alone, the IA has spent close to USD $176,000 trying, without success, to weaken the CCPA before it takes effect. Now, in conjunction with Business Roundtable and TechNet, it has called for a "weak national 'privacy' law that will preempt stronger state laws." (Source)

One of the companies campaigning for a national standard is Facebook, which is caught up, yet again, in a data privacy scandal.

Apple's new iOS 13 update reworks the smartphone operating system to prioritize privacy for users (Source). Recent "sneak peeks" showed that it will notify users of background activity from third-party apps: surveillance infrastructure used to generate profit by profiling individuals outside their app usage. The culprit highlighted, unsurprisingly, is Facebook, which has been caught using Bluetooth to track nearby users.

While this may not seem like a big deal, by "[m]atching Bluetooth (and wi-fi) IDs that share physical location [Facebook could] supplement the social graph it gleans by data-mining user-to-user activity on its platform." (Source) Through this, Facebook can track not just your location but the nature of your relationships with others. By pairing Bluetooth-gathered interpersonal interactions with social tracking (likes, followers, posts, messaging), Facebook can escalate its ability to monitor and predict human behaviour.

While you can opt-out of location services on Facebook, this means you cannot use all aspects of the app. For instance, Facebook Dating requires location services to be enabled, a clause that takes away a user’s ability to make a meaningful choice about maintaining their privacy (Source).

In notifying users about apps using their data in the background, iOS 13 looks to bring back a measure of control to the user by making them aware of potential malicious actions or breaches of privacy.

In the wake of this, Facebook’s reaction has tested the bounds of reality. In an attempt to get out of the hot seat, they have rebranded the new iOS notifications as “reminders” (Source) and, according to Forbes, un-ironically informed users “that if they protect their privacy it might have an adverse effect on Facebook’s ability to target ads and monetize user data.” (Source) At the same time, Facebook PR has also written that “We’ll continue to make it easier for you to control how and when you share your location,” as if to take credit for Apple’s new product development (Source).

With such comments, it is clear that in the upcoming months, we will see how much individuals value their privacy and convenience. Between the debate over the value of data, who should govern consumer privacy rights, and another privacy breach by Facebook, the relevance of the data privacy conversation is evident. To stay up to date, sign up for our monthly newsletter and keep an eye out for our weekly blogs on privacy news.

Announcing CN-Protect for Data Science

We are pleased to announce the launch of CN-Protect for Data Science.

CryptoNumerics announces CN-Protect for Data Science, a Python library that applies insight-preserving data privacy protection, enabling data scientists to build better-quality models on sensitive data.

Toronto – April 24, 2019 – CryptoNumerics, a Toronto-based enterprise software company, announced the launch of CN-Protect for Data Science, which enables data scientists to implement state-of-the-art privacy protection, such as differential privacy, directly into their data science stack while maintaining analytical value.

According to a 2017 Kaggle study, two of the top 10 challenges data scientists face at work are data inaccessibility and privacy regulations such as GDPR, HIPAA, and CCPA. Additionally, common privacy protection techniques, such as data masking, often decimate the analytical value of the data. CN-Protect for Data Science solves these issues by allowing data scientists to seamlessly privacy-protect datasets that retain their analytical value and can subsequently be used for statistical analysis and machine learning.

“Private information that is contained in data is preventing data scientists from obtaining insights that can help meet business goals. They either cannot access the data at all or receive a low-quality version which has had the private information removed,” says Monica Holboke, co-founder and CEO of CryptoNumerics. “With CN-Protect for Data Science, data scientists can incorporate privacy protection in their workflow with ease, and deliver more powerful models to their organization.”

CN-Protect for Data Science is a privacy-protection Python library that works with Anaconda, scikit-learn, and Jupyter notebooks, integrating smoothly into the data scientist's workflow. Data scientists will be able to:

  • Create and apply customized privacy protection schemes, streamlining the compliance process.
  • Preserve analytical value for model building while ensuring privacy protection.
  • Implement differential privacy and other state-of-the-art privacy protection techniques using only a few lines of code (a generic sketch follows below).
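As a generic illustration of the last bullet (this is not the CN-Protect API, which is not documented here), the textbook Laplace mechanism for a differentially private mean does fit in a few lines of NumPy; the epsilon, clipping bounds, and data are hypothetical:

```python
# Textbook Laplace mechanism for a differentially private mean.
# NOT the CN-Protect API: epsilon, bounds, and data are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=7)

# Clip incomes to a known range so one person's effect is bounded.
incomes = np.clip(np.array([48_000, 52_000, 61_000, 75_000, 90_000]), 0, 150_000)

epsilon = 1.0                          # privacy budget: smaller = more private
sensitivity = 150_000 / len(incomes)   # max effect of one record on the mean

noisy_mean = incomes.mean() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

print(f"true mean:  {incomes.mean():,.0f}")
print(f"noisy mean: {noisy_mean:,.0f}")  # releasable under epsilon-DP
```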

CN-Protect for Data Science follows the successful launch of CN-Protect Desktop App in March. It is part of CryptoNumerics’ efforts to bring insight-preserving data privacy protection to data science platforms and data engineering pipelines while complying with GDPR, HIPAA, and CCPA. CN-Protect editions for SAS, R Studio, Amazon AWS, Microsoft Azure, and Google GCP are coming soon.  

Announcing CN-Protect Free Downloadable Software for Privacy-Protection

We are pleased to announce the launch of CN-Protect as free, downloadable software to create privacy-protected data sets. We believe:

  • Protecting consumer privacy is paramount.
  • Satisfying privacy regulations such as HIPAA, GDPR, and CCPA should not sacrifice analytical value.
  • Data scientists, privacy officers, and legal teams should have the ability to easily ensure privacy.

Today's businesses face data breaches and misuse of consumer information on a regular basis. In response, governments have moved to protect their citizens through regulations like GDPR in Europe and the CCPA in California. Organizations are scrambling to comply with these regulations without adversely impacting their business, but there is no doubt that people's privacy should not be compromised.

Current approaches to de-identifying data, such as masking, tokenization, and aggregation, can leave data unprotected or without analytical value.

  • Data masking replaces existing sensitive information with information that looks real but is of no use to anyone who might misuse it, and it is not reversible. Once applied to all values, the data has no analytical use; if not applied to all values, it does not protect against re-identification (see the toy sketch after this list).
  • Tokenization replaces sensitive information with a non-sensitive equivalent, a token, which can be used to map back to the original data but is impossible to reverse without access to the tokenization system. It removes all data utility from the tokenized fields, yet re-identification is still possible through the untokenized fields.
  • Aggregation summarizes the data in a cumulative fashion so that no one individual is re-identifiable. However, it severely reduces analytical value, and if the data does not contain enough samples, re-identification is still possible.
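For concreteness, here is a toy contrast of the first two techniques in plain Python; all values are hypothetical:

```python
# Toy contrast of masking vs. tokenization; every value is hypothetical.
import secrets

record = {"name": "Ana Silva", "card": "4111-1111-1111-1111", "spend": 1240}

# Masking: replace the value with realistic-looking filler; irreversible.
masked = {**record, "card": "XXXX-XXXX-XXXX-" + record["card"][-4:]}

# Tokenization: swap the value for a random token and keep the mapping in
# a separately secured vault; reversible only with access to that vault.
vault: dict[str, str] = {}
token = secrets.token_hex(8)
vault[token] = record["card"]
tokenized = {**record, "card": token}

print(masked["card"])     # XXXX-XXXX-XXXX-1111
print(tokenized["card"])  # e.g. 9f1c2ab37d40e865
print(vault[token])       # original card, recoverable only via the vault
```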

CN-Protect leverages AI and the most advanced anonymization techniques, such as optimal k-Anonymity and Differential Privacy, to protect your data while maintaining its analytical value. Furthermore, CN-Protect is easy to adopt, as a downloadable application or as a plug-in for your favourite data science platform.

With CN-Protect you can:

  • Comply with privacy regulations such as HIPAA, GDPR, and CCPA;
  • Create privacy protected datasets while maintaining analytical value.

There are a variety of privacy models and data quality metrics available that you can choose from depending on your desired application. These privacy models use anonymization techniques to protect private information, while data quality metrics are used to balance those techniques against the analytical value of the data.

The following privacy models are available in CN-Protect:

  • Optimal k-Anonymity;
  • t-Closeness;
  • Differential Privacy;
  • and more.

You will be able to:

  • Specify parameters for the various privacy models that can be applied across your organization and fine-tuned for your many applications;
  • Define acceptable levels of privacy risk for your organization and the intended use of your data;
  • Get quantifiable metrics that you can use for compliance;
  • Understand the impact of privacy protection on your statistical and machine learning models.

Stay ahead of regulations and protect your data. Download CN-Protect now for a free trial!
