Facial recognition, data marketplaces and AI changing the future of data privacy


With the emerging Artificial Intelligence (AI) market comes the ever-popular privacy discourse. Data regulations are being introduced left and right, but while effective, they do not yet account for growing technologies like facial recognition or data marketplaces.

Companies like Clearview AI are once again making headlines after receiving cease-and-desist letters from Big Tech, despite there being no current facial recognition laws they are violating. Meanwhile, Nature published an article calling for an international code of conduct for genomic research aggregation. And at the intersection of AI and healthcare, Microsoft has announced a $40 million AI for Health initiative.

Facial recognition company hit with cease-and-desist  

A few weeks ago, we published a blog post introducing the facial recognition start-up Clearview AI as a threat to privacy.

Since then, Clearview AI has continued to make headlines, and most recently has received cease-and-desist letters from Big Tech companies like Google, Facebook and Twitter.

To recap, Clearview AI is a facial recognition company that has created a database of over 3 billion searchable faces, scraped from different social media platforms. The company has placed its software in more than 600 police departments across Canada and the US.

The company’s CEO, Hoan Ton-That, has repeatedly defended his company, telling CBS:

“Google can pull in information from all different websites, so if it’s public, you know, and it’s out there, it could be inside Google search engine it can be inside ours as well.”

Google responded that this comparison was ‘inaccurate.’ Google says it is a public search engine that gives sites choices over what is indexed and provides opportunities to withdraw images, none of which Clearview offers. Clearview even goes as far as retaining images in its database after they have been deleted from their source.

While Google and Facebook have both sent Clearview cease-and-desist letters, Clearview maintains that it is within its First Amendment rights to use the information. One privacy attorney told CNET, “I don’t really buy it. It’s really frightening if we get into a world where someone can say, ‘The first amendment allows me to violate everyone’s privacy.’”

While cities like San Francisco have started banning facial recognition, there are currently no federal laws addressing the technology, leaving leeway for companies like Clearview AI to build potentially dangerous software.

Opening up genomic data for researchers across the world

As new healthcare initiatives emerge, privacy becomes more relevant than ever. Healthcare data contains some of an individual's most sensitive information, so the idea of big tech buying and selling such personal data is alarming.

Last week, Nature, an international journal of science, reported that over 800 terabytes of genomic data are now available to investigators all over the world. The eight authors worked explicitly to protect the privacy of the thousands of patients and volunteers who consented to have their data used in this research.

The article reports on a six-year effort in which 468 institutions across 34 countries collected 2,658 cancer genomes, creating an open pool of genomic data. The project, called the Pan-Cancer Analysis of Whole Genomes (PCAWG), was the first attempt to aggregate a variety of subprojects and release a dataset globally.

A significant emphasis of the article was the lack of clarity within the healthcare research community on how to protect data while complying with ongoing changes to privacy legislation.

Key challenges for these genomic marketplaces include complying with the variety of privacy legislation while ensuring that no individual can be re-identified from the information. Protecting patient data is not just a legislative issue but a moral one.

Much of the privacy uncertainty centres on what vetting should occur before researchers gain access to the information, and what checks should be made before the data is shared internationally.

As the article says, “Genomic researchers urgently need clear data-sharing rules that are harmonized across jurisdictions.” The report calls for an international code of conduct to overcome the hurdles posed by the different emerging privacy regulations.

The article also noted that the Biobanking and BioMolecular Resources Research Infrastructure (BBMRI-ERIC) announced back in 2017 that it would develop an EU Code of Conduct on Health-Related Data.

Microsoft to add another installment to AI for Good

The ability to collect patient data and share it in an open market for researchers and doctors is helping diagnose and treat patients faster than ever before. Alongside this, AI is seen as another vital tool for the growing healthcare industry.

Last week, Microsoft announced the fifth installment of its ‘AI for Good’ project: ‘AI for Health.’ This project, like its predecessors, will support healthcare initiatives with cash grants, AI tools, cloud computing, and access to Microsoft researchers.

The project will focus on three different AI strategies, including: 

  • Accelerating medical research
  • Increasing the understanding of mortality to guard against global health crises
  • Reducing health injustices 

The program will emphasize supporting individual non-profits and under-served communities. In an accompanying video, Microsoft highlighted its focus on addressing Sudden Infant Death Syndrome, eliminating leprosy, and preventing diabetic retinopathy-driven blindness in partnership with different not-for-profits.

AI is becoming essential to healthcare, and the industry generates vast amounts of data that companies like Microsoft are utilizing. But with this, privacy has to remain at the forefront of the action.

As with the Nature dataset, protecting user information is both critical and complicated when the goal is to preserve the data’s analytical value while complying with privacy regulations. Microsoft announced that it would be using differential privacy as its privacy solution.
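As a rough illustration of the idea, differential privacy adds calibrated random noise to query results so that no single individual's presence in the data can be inferred. Below is a minimal sketch of the Laplace mechanism for a counting query; the epsilon value and the cohort numbers are invented for illustration, not taken from Microsoft's actual implementation.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Release a count with Laplace noise of scale 1/epsilon.

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so noise drawn from Laplace(0, 1/epsilon)
    hides any single individual's contribution.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
# Hypothetical query: how many patients in a study cohort are over 65?
noisy_answer = dp_count(132, epsilon=0.5, rng=rng)
```

Smaller epsilon means more noise and stronger privacy; aggregate statistics remain useful because the noise averages out across many queries and records.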

Like Microsoft, we at CryptoNumerics use differential privacy as a method of anonymization that preserves data value. Learn more about differential privacy and CryptoNumerics’ solutions.


Join our newsletter

Transactional data: A privacy nightmare and what to do about it


Our everyday actions, like buying a morning coffee or taking the train, all create a digital trail of our lives. We as humans tend to fall into individual habits, taking the same routes every day, eating at the same restaurants on certain nights. We create a unique fingerprint through our routine actions. These ‘fingerprints’ make it very easy to predict our next moves.

And with rapidly advancing machine learning technologies, companies are doing exactly that.

Here is where transactional data comes in. This data relates to transactions of an organization and includes information that is captured, for example, when a product is sold/purchased. This data is collected from a wide variety of industries, spanning from financial services to transportation, and retail, to name a few.

These collections of information paint a picture of your entire life, online and offline.


Transactional data is everywhere, in everything


Transactional data provides a constant flow of information and is necessary for maintaining a company’s competitive edge, deepening client insight, and improving customer experience.

Each purchase, click, and online movement is held under the umbrella of transactional data. These records, combining demographic details with transaction, time, and location data, provide a window into our real-life behaviors and movements.

Thanks to transactional data, companies can provide customers with a personalized experience. This can be a good thing. For example, banks are significant participants in growing client profiles using transactional data. Each purchase we make with our bank cards establishes spending patterns. Having AI detect and learn from our purchasing habits can help with fraud detection or flagging credit card theft.

However, transactional data constitutes a colossal privacy exposure that is exceptionally difficult to control. For example, perhaps you are someone who takes an Uber to work and back home. If this happens only once, it does not represent a significant risk; doing it every day, however, creates a pattern that can reveal several aspects of who you are, where you live, or where you hang out.

Because of this, a company that puts effort into removing a personal identifier such as a name may appear compliant in safeguarding user data. However, these patterns of information can be combined to re-identify a person without any personal identifiers: an attacker could discover a place of work, a regular coffee stop, and a home address without ever knowing the person’s name.
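To make this concrete, here is a toy sketch (with made-up riders, zones, and times) of how a recurring pickup pattern can single out one pseudonym in a "de-identified" trip log, no names required:

```python
from collections import Counter

# Hypothetical pseudonymized trip log: (pseudonym, pickup_zone, hour_of_day).
# Direct identifiers are gone, but the routine itself remains.
trips = [
    ("u1", "king_west", 8), ("u1", "king_west", 8), ("u1", "king_west", 8),
    ("u1", "bay_st", 18),
    ("u2", "king_west", 8), ("u2", "harbourfront", 20),
    ("u3", "leslieville", 9), ("u3", "bay_st", 18),
]

def riders_with_habit(trips, zone, hour, min_repeats=2):
    """Return pseudonyms whose log shows the (zone, hour) pattern repeatedly."""
    counts = Counter((r, z, h) for r, z, h in trips)
    return {r for (r, z, h), c in counts.items()
            if z == zone and h == hour and c >= min_repeats}

# Knowing only that a target boards in King West around 8 a.m. most days
# narrows the whole log down to a single pseudonym.
suspects = riders_with_habit(trips, "king_west", 8)
```

A one-off trip matches several riders, but requiring the pattern to repeat isolates the one person with that habit, which is exactly why repeated transactions are so much riskier than single ones.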

These extensive collections of our information are not protected to the extent they should be. If companies know and are using such detailed information, how is it not protected to the point of no risk? 


Protecting Transactional data


Transactional data will keep growing as IoT becomes more prevalent. As mentioned before, reducing the privacy risk of a dataset that contains transactional data is challenging. It is not just about applying privacy-protection techniques; it also requires understanding how the rows relate to one another, because the most crucial aspect is to preserve the analytical value.

At CryptoNumerics, we have developed a way to solve this problem. By leveraging CN-Protect and our technical expertise, we are helping telematics companies, as well as companies in the finance sector, reduce the risk of re-identification in their transactional datasets. 



Masking is killing data science


When it comes to data science, the trade-off between protecting data and keeping its value can appear near impossible to manage. And with the introduction of privacy legislation like the California Consumer Privacy Act (CCPA), the job gets even harder.

Methods such as data masking appear to be the standard option, driving privacy risk down to almost 0%. However, with information loss that can exceed 50%, the opportunity for data analytics vanishes.

Data Masking is a lost battle

Data masking is a de-identification technique that focuses on the redaction or transformation of information within a dataset to prevent exposure. The information in the resulting dataset is of low quality. This technique is not enough to move a company forward in innovation.
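A quick sketch (with invented records) of what rule-based masking does to a dataset: the identifiers are redacted outright, and the analytically useful fields go with them.

```python
records = [
    {"name": "A. Singh", "zipcode": "94107", "age": 34, "balance": 1200},
    {"name": "B. Chen",  "zipcode": "94110", "age": 51, "balance": 800},
    {"name": "C. Osei",  "zipcode": "10001", "age": 47, "balance": 2500},
]

def mask(record):
    """Rule-based masking: blank out identifiers and quasi-identifiers."""
    return {**record, "name": "***", "zipcode": "*****", "age": None}

masked = [mask(r) for r in records]

# Re-identification risk is near zero, but so is analytical value:
# any model or segmentation that needs age or region is now impossible.
usable_ages = [r["age"] for r in masked if r["age"] is not None]
```

The rule fires regardless of whether a field actually poses a re-identification risk in this particular dataset, which is where the information loss comes from.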

Companies need to privacy protect their consumer data. However, they also need to preserve the value of the data for analytical uses.

Masking fails to reflect how data works today and how a business benefits from it. Consumer data is beneficial to all aspects of an organization and creates a better experience for the customer. Failing to both utilize and protect these datasets leaves your company behind in innovation and consumer satisfaction.

Privacy-protection that preserves analytical value

Data scientists need to be able to control the trade-off, and the only way to do it is by using “smart” optimization solutions.

A “smart” optimization solution is one that can modify the data in different ways, using privacy risk and analytical value as its optimization functions. With a solution like this, a data scientist gets a dataset that is optimized for analytics and privacy compliant: the best of both worlds.
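As a simplified illustration of the principle (not CN-Protect itself), generalization is one transformation such a solution can choose: instead of deleting a quasi-identifier like age, it coarsens it into bands, so each record blends into a larger group while the field stays usable. The ages below are invented for the example.

```python
from collections import Counter

ages = [34, 37, 33, 51, 47, 42]

def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band, e.g. 34 -> '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

bands = [generalize_age(a) for a in ages]

# Each record now shares its band with look-alikes (lower re-identification
# risk), yet the bands still carry signal for cohort-level analytics.
band_sizes = Counter(bands)
```

An optimizer would pick the band width (and similar transformations for other columns) to hit a target privacy risk with the smallest possible loss of analytical value, rather than applying one blunt rule everywhere.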

Smart Optimization vs Masking

Let’s look at the impact that both privacy-protection solutions have on a machine learning algorithm.

For this example, we want to predict loan default risk using a random forest model. The model is going to be run on three datasets:

  • In the clear: The original dataset without any privacy transformations.
  • Masked dataset: Transformation of the original dataset using standard rule-based masking techniques.
  • Optimized dataset: Transformation of the original dataset using a smart optimization solution.


The dataset has 11 variables:

  • Age
  • Sex
  • Job
  • Housing
  • Saving Account Balance
  • Checking Account Balance
  • Credit Account Balance
  • Duration
  • Purpose
  • Zipcode
  • Risk

Let’s compare the results.

Running the model on the original dataset gave us an accuracy of 93%; however, the risk of re-identification is 100%. With the masked data, the model's accuracy dropped to 28%; since there are five risk levels, that is barely better than random guessing. On the positive side, the risk of re-identification is 0%. Lastly, the accuracy with the optimized dataset was 87%, a drop of only six points versus the original data, with a re-identification risk of only 3%.

While having a 0% privacy risk is appealing, the loss in accuracy makes masking worthless for analytic purposes.

This example highlights why masking is killing data science, and why organizations need to implement smart optimization solutions, like CryptoNumerics’ CN-Protect, that reduce the risk of re-identification while preserving the analytical value of the data.

Gaining a competitive edge in your industry means utilizing consumer data. By adequately protecting that data without massive information loss, its preserved value can take your company far.


