Google’s “Project Nightingale” makes a mockery of patients’ right to privacy


Recently, Google announced a business partnership with Ascension, the second-largest healthcare provider in the US. It transpires that, through this partnership, the medical records of 50 million Americans will be transmitted – without the knowledge or consent of the patients. None of the records involved are de-identified. 


Data analytics has become synonymous with business success, and the personal information of real people is often viewed in terms of dollar signs and profit margins. In turn, true privacy is often portrayed as an unattainable ethical ideal. However, patient and consumer privacy should not be disregarded. Consumer consent is important, and if obtaining it is unmanageable, data should at least be de-identified, so that it is no longer personal.

On November 11, Google and Ascension signed an agreement, codenamed Project Nightingale, that constitutes the largest data transfer of its kind. Project Nightingale’s goal is to build a medical-action suggestion tool. However, potential ulterior motives have raised red flags across the globe. By the time the transfer is complete, 50 million patient records will have been shared. Last Tuesday, 10 million had already been delivered.

In the past, similar efforts to use technology to improve healthcare have first required data to be de-identified. A good example is the collaboration between Google and Mayo Clinic. But in the case of Google and Ascension, the lack of de-identification suggests that a new boundary of data greed has been pushed, in an effort to make data available for purposes beyond those associated with Project Nightingale.

No one should be able to access and manipulate medical records without the knowledge and consent of patients and doctors. Not only is this highly unethical; it is also potentially illegal.

Coupled with the acquisition of Fitbit earlier this month, Google appears to be on a mission to become a major stakeholder in the healthcare industry. It is unlikely that Google wants to do this for the common good. After all, Google’s actions undermine the basic right to privacy afforded to all individuals. The company’s new ability to combine search and medical records for business gain is troublesome.


Project Nightingale may have violated HIPAA


Since neither patients nor doctors were made aware of Project Nightingale, Google is at risk of a HIPAA violation. In fact, a federal inquiry has already been launched.

Under the law, even healthcare professionals must get permission to access health records. Why wouldn’t big tech?

Google has repeatedly insisted that it will follow all relevant privacy laws. However, with the volume and variety of data that the company holds on the average individual, this case likely pushes into uncharted territory that few regulations currently govern.

Even if the secret harvesting of data is not determined to have breached HIPAA, it has undoubtedly crossed the ethical boundaries of healthcare. 


Google employees can access medical records and use them to make money


Through this partnership, Google plans to create a search tool, designed for medical professionals, that suggests prescriptions, diagnoses, and doctors.

While the public aim may be to improve patient outcomes and reduce spending, a whistleblower expresses concerns that Google “might be able to sell or share the data with third parties, or create patient profiles against which they can advertise healthcare products.”

With the launch of its newest partnership, Google harvested patient names, lab results, hospitalization records, diagnoses, and prescriptions from over 2,600 hospitals. This data can be, and has been, accessed by Google staff (Source).

With this information,

  1. Google employees can access the medical records of real people. 
  2. Advertisements can target people based on their medical history.
  3. Google can pass identifiable health records to a third party.

The potential misuse of medical records places emphasis on the need to de-identify personal information that is being shared, especially without consent. Patients have now unknowingly been put at risk, and their trust has been completely violated. 


Who wants to think that their embarrassing injuries are lunchtime conversations for Google employees? That their cancer is the target of Google ads? That their mental health history is being sold to insurance companies?


As the Google whistleblower puts it, “Patients must have the right to opt-in or out. The uses of the data must be clearly defined for all to see, not just for now but for 10 or 20 years into the future.”

The actions of Google and Ascension cross the boundary of healthcare ethics, signaling a complete disregard for the privacy of patients. When it signed the deal and secretly harvested the medical records of 50 million Americans, Google demonstrated a sense of entitlement and deceitfulness that is entirely unbecoming of a business that already holds an enormous amount of data on the average citizen.

Confidentiality is the foundation of doctor-patient relationships, and if people can no longer trust that their secrets are safe with their healthcare providers, who can they trust?

Join our newsletter

Is your GDPR-restricted data toxic or clean?


Do your data lakes and warehouses contain personal information? Then you may have data that is toxic in the view of GDPR. If you have not obtained consent for every purpose that you plan to process data for, or haven’t anonymized the personal information, then under GDPR, your business has a significant exposure that could cost hundreds of millions.


When GDPR was implemented in May 2018, few businesses realized the impact it would have on data science and analytics. A year and a half in, the ramifications are indisputable. There have been more than €405 million in fines issued, and brands like British Airways have been irreparably harmed. Today, privacy infractions land on the front page, meaning data lakes pose a monumental threat to the longevity of your business.

The fact is, untold amounts of personal information are being collected, integrated, and stored in data lakes and data warehouses in almost every business. In many cases, this data is being stored for purposes beyond the original purpose for which it was collected.

In light of the new era of privacy regulations and legal compliance, most of the data sitting in data lakes and warehouses should be considered highly toxic for GDPR compliance.

Toxic data will result in regulatory penalties and a loss of consumer trust

Under GDPR, data controllers must establish a specific legal basis for each and every purpose for which personal data is used. If a business intends to use customer data for an additional purpose, then it must first obtain explicit consent from the individual.

As a result, all data in data lakes can only be made available for use after processes have been implemented to notify and request permission from every subject for every use case. This is impractical and unreasonable. Not only will it result in a mass of requests for data erasure, but it will slow and limit the benefits of data lakes. 

The risk is what we refer to as toxic data. This is identifiable data that you are processing in ways that you have not obtained consent for under GDPR. Left in a toxic state, your data lakes put your business at risk of fines of up to 4% of your annual global revenue.

Worse yet, the European DPAs have been strict with their enforcement, leading to a flood of GDPR fines and a mass loss of customer confidence for many major data-driven companies. You need to act now before it is too late.

Anonymize your data to remove it from the scope of GDPR

Toxic data exposes your organization to significant business, operational, security, and compliance overheads and risks. Luckily, there is another way to clean your data lakes without undertaking the process of obtaining individual and meaningful consent: anonymize your data.

Rather than scramble to minimize data and update data inventory systems to comply, businesses should invest in automated defensible anonymization systems that can be implemented at an architectural point of control with regard to data lakes and warehouses.

Once data has been anonymized, it is no longer considered personal data. As such, it is no longer regulated by GDPR, and consent is not required to process it.
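To make the idea concrete, here is a minimal sketch of one early step in such a pipeline: dropping direct identifiers and coarsening quasi-identifiers. The record schema and the `generalize` helper are hypothetical, and this step alone does not prove that a dataset is anonymized in the GDPR sense; the re-identification risk still has to be assessed afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical schema, for illustration only.
    name: str
    email: str
    birth_year: int
    postal_code: str
    purchase_total: float


def generalize(record: CustomerRecord) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    decade = record.birth_year // 10 * 10
    return {
        # name and email are direct identifiers, so they are not copied over
        "birth_decade": f"{decade}s",
        # keep only the first three characters of the postal code
        "postal_prefix": record.postal_code[:3],
        "purchase_total": record.purchase_total,
    }


record = CustomerRecord("Jane Doe", "jane@example.com", 1987, "M5V2T6", 129.50)
print(generalize(record))
# {'birth_decade': '1980s', 'postal_prefix': 'M5V', 'purchase_total': 129.5}
```

Running a transformation like this at the point where data enters the lake means downstream analysts never see the raw identifiers.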

The impact on your business of using toxic data could be very damaging. If you want to leverage and monetize your data without risking violations and fines, you need to put it outside of the scope of GDPR. To do this, you need to decontaminate your data lakes.

Businesses essentially have two choices: 

(a) maintain the status quo and retain toxic information in data lakes and warehouses, or 

(b) anonymize your data using provable, automated, state-of-the-art solutions, so that GDPR is not applicable.

One option will save your brand reputation and bottom line. The other brings a mass of expensive regulatory complications and litigation exposure.


Privacy as a commodity deepens inequality


Privacy is fundamental to societal and consumer values. Consequently, people have demanded privacy regulations to bar businesses from secretly monetizing their sensitive information. Yet, new policy proposals suggest treating data as a commodity will rebalance the relationship between Americans and the technology industry. Implementing legislation of this form perpetuates a future of data colonialism and threatens to disproportionately strip low-income communities of their privacy rights. 

Americans value their privacy and expect businesses to respect it

The last few years have transformed the value of data. It is often called the new oil, a reference to the proliferation of business insights and their impact on revenue. However, since GDPR talks began, concern over the disregard for people’s privacy has surged. This has led privacy and insights to be portrayed as opposing priorities, with businesses and consumers at opposite ends of the argument. The two need not be in conflict now that advanced privacy-protecting, insight-preserving technology is available, but people’s fight for protection shows that they clearly prioritize their privacy.

In support of this, a new privacy bill, dubbed the Consumer Online Privacy Rights Act (COPRA), was proposed by Democratic senators on Tuesday that may be the push needed to implement a federal privacy bill in America.

This is intended to afford US citizens similar rights to their EU counterparts under GDPR. COPRA would:

  • Allow subjects to request what data companies are holding on them and ask for it to be deleted or corrected
  • Require explicit consent for companies to collect and share sensitive data
  • Forbid companies from collecting more information than is reasonable to carry out the service consumers signed up for
  • Require CEOs of data-collection companies to certify annually that they have “adequate internal controls” and reporting structures to be compliant
  • Allow private citizens to sue over data collection

Sen. Maria Cantwell (D-Wash.) declared that “In the growing online world, consumers deserve two things: privacy rights and a strong law to enforce them.” Steve Durbin, managing director of the Information Security Forum, seems to agree, writing in an email, “What is clear is that privacy is becoming more of an issue in the United States.”

This week, a new Pew Research study questioned how these values impact Americans’ view of smart speakers. It demonstrated that more than half of Americans are concerned about data privacy and that 66% of respondents were not willing to sacrifice more data for more personalization.

As smart speakers continue to grow in popularity, data privacy concerns will continue to rise. Consumers are already choosing where to buy based on brands’ privacy stances. This is reflected in Google’s negative market-share growth (-40.1%) in the wake of its GDPR fine, its secret harvesting of medical records, and its acquisition of Fitbit.


Treating data as a commodity effectively monetizes your privacy rights 

In mid-November, Democratic candidate Andrew Yang proposed a four-prong policy approach to tackle the inadequacy of American privacy legislation today. Part of this plan is what is referred to as “data as a property right.” The idea is that people should profit from the money companies make by collecting and monetizing their personal data. That is to say, businesses would pay consumers who choose to give them access to their personal information.

While the proposal seeks to rebalance the American relationship with big tech, this model will normalize the idea of privacy as a commodity, and disproportionately strip low-income communities of their data privacy.

Ulises Mejias, an associate professor at the State University of New York, explained that “Paying someone for their work is not some magical recipe for eliminating inequality, as two centuries of capitalism have demonstrated.” This argument signals that treating data privacy as a commodity would not only fail to rebalance power, but would also normalize systemic “data colonialism.”

In the article, “Data Colonialism: Rethinking Big Data’s Relation to the Contemporary Subject,” researchers Couldry and Mejias suggest that continuous tracking “normalizes the exploitation of human beings through data, just as historic colonialism appropriated territory and resources and ruled subjects for profit.” This concern rests on the unprecedented opportunities for discrimination and behavioural influence, which would only scale if data goes up for sale.

The reality is, if data is considered a commodity, people would not be selling their data, but their privacy. After all, there are no reasonable statistics to determine the value of data, “[t]here’s no going rate for Facebook likes, no agreed-upon exchange rate for Instagram popularity.” (Malwarebytes Labs) So, the question becomes, not how much am I willing to sell my age information for, but how much do I value the safety afforded with location secrecy and right to non-discrimination based on sexual orientation, for example. 

When data is a commodity, private information that individuals should choose whether or not to disclose becomes transactional. “It’s much easier for a middle-class earner to say no to a privacy invasion than it is for stressed, hungry families,” Marlow said. (Malwarebytes Labs) In essence, treating data as a commodity is like a pay-for-privacy scheme, designed to take advantage of those who need extra money.

When the world is pushing for data privacy to be considered a fundamental human right, moves to monetize privacy reflect the historic appropriation of resources and people. Data colonialism will disadvantage those in low-income communities and set back the movement to prioritize privacy.

An alternative way to empower autonomy over consumer data is to regulate Privacy by Design and Default. Businesses should embed privacy into the framework of their products and have the strictest privacy settings as the default. In effect, privacy operations management must be a guiding creed from stage one, across IT systems, business practices, and data systems.

This promotes anonymization as a solution and leads to a future where business insights and consumer privacy are part of a common goal. In removing the commodity element of the Yang proposal, we avoid the deep-seated inequality ingrained in pay-for-privacy schemes while accomplishing the original intent and building a better future. Privacy is not a commodity; it is a fundamental human right.


Consumer purchasing decisions rely on product privacy


79% of Americans are concerned about the way companies are using their data. Now, they are acting by avoiding products, like Fitbit after the Google acquisition. *Privacy Not Included, a shopping guide from Mozilla, signals that these privacy concerns will impact what (and from whom) consumers shop for over the holidays.

Consumers are concerned about the ways businesses are using their data

A Pew Research Center study investigated how Americans feel about the state of privacy, and the findings revealed widespread concern.

    • 60% believe it is not possible to go through daily life without companies and the government collecting their personal data.
    • 79% are concerned about the way companies are using their data.
    • 72% say they gain nothing or very little from company data collected about them.
    • 81% say that the risks of data collection by companies outweigh the benefits.

This study determined that most people feel they have no control over the data that is collected on them and how it is used.

Evidently, consumers lack trust in companies and do not believe that most have their best interests at heart. In the past, this has not been such a big deal, but today, businesses will live and die by their privacy reputation. Such is reflected by the wave of privacy regulations emerging across the world, with GDPR, CCPA, and LGPD.

However, the legal minimum outlined in privacy regulations is not enough for many consumers, suggesting that meeting the basic requirements without embedding privacy into your business model is insufficient.

This is evident with Fitbit, where many users have pledged to toss their devices in light of the Google acquisition. Google’s reputation has been tarnished in recent months by a €50 million GDPR fine and backlash over its secret harvesting of health records in the Ascension partnership.

Google’s acquisition of Fitbit highlights the risks of a failure to prioritize privacy

On November 1, Google announced an agreement to acquire Fitbit for $2.1 billion in an effort, we presume, to breach the final frontier of data: health information. Fitbit users are now up in arms over the fact that Google will have access not just to their search data, location, and behaviour, but also to their every heartbeat.

In consequence, thousands of people have threatened to discard their Fitbits out of fear and started their search for alternatives, like the Apple Watch. This validates the Pew study and confirms that prioritizing privacy is a competitive advantage.

Despite Google’s claims that it will not sell personal information or health data, Fitbit users are doubtful. One user said, “I’m not only afraid of what they can do with the data currently, but what they can do with it once their AI advances in 10 or 20 years.”


This fear hinges on the general concern over how big tech uses consumer data, but is heightened by the company’s historical failure to prioritize privacy. After all, why would Google invest $2.1 billion if it did not expect to profit from the asset? It can only be assumed that Google intends to use this data to break into the healthcare space. This notion is validated by its partnership with Ascension, where it has started secretly harvesting the personal information of 50 million Americans, and by the fact that it has started hiring healthcare executives.

Privacy groups are pushing regulators to block the acquisition, which was originally planned to close in 2020.

Without Privacy by Design, sales will drop

On November 20, the third annual *Privacy Not Included report was launched by Mozilla, which determines if connected gadgets and toys on the market are trustworthy. This “shopping guide” looks to “arm shoppers with the information they need to choose gifts that protect the privacy of their friends and family. And, spur the tech industry to do more to safeguard customers.” (Source)

This year, 76 products across six categories of gifts (Toys & Games; Smart Home; Entertainment; Wearables; Health & Exercise; and Pets) were evaluated based on their privacy policies, product specifications, and encryption/bug bounty programs.

To receive a badge, products must:

    • Use encryption
    • Have automatic security updates
    • Feature strong password mechanics
    • Manage security vulnerabilities
    • Offer accessible privacy policies

62 of those products met the Minimum Security Standards, but Ashley Boyd, Mozilla’s Vice President of Advocacy, warns that this is not enough, because “Even though devices are secure, we found they are collecting more and more personal information on users, who often don’t have a whole lot of control over that data.”

8 products, on the other hand, failed to meet the Minimum Security Standards, including:

    • Ring Video Doorbell
    • Ring Indoor Cam
    • Ring Security Cams
    • Wemo Wifi Smart Dimmer
    • Artie 3000 Coding Robot
    • Little Robot 3 Connect
    • OurPets SmartScoop Intelligent Litter Box
    • Petsafe Smart Pet Feeder

These products fail both to protect consumer privacy and to adequately portray the risks associated with using them. They are consumers’ worst nightmare, and the very reason 79% are concerned about the way companies are using their data.

This study revealed an evident lack of privacy prioritization across businesses, especially small ones, despite positive security measures. And those that did prioritize privacy tended to make customers pay for it. This signals that the market is looking for more privacy-focused products, and that there is room to move in.

Businesses should embed privacy into the framework of their products and have the strictest privacy settings as the default. In effect, privacy operations management must be a guiding creed from stage one, across IT systems, business practices, and data systems. This is what is known as Privacy by Design and Privacy by Default. These principles address the increasing awareness of data privacy and ensure that businesses will consider consumer values throughout the product lifecycle.

Customers vote with their money. Taken together, the Pew study results and the Fitbit case make it clear that customers are privacy-conscious and willing to boycott not only products but also companies that do not share their values. This week serves as a lesson that businesses must act quickly to bring their products in line with customers’ privacy values, to move beyond basic regulatory requirements, and to meet the demands of customers.


GDPR, data cemeteries, and million-dollar fines: Deutsche Wohnen SE Case Study


On October 30, 2019, Germany dealt out its largest GDPR fine to date: €14.5 million. The business receiving this fine was Deutsche Wohnen SE, a major property company. This case study will analyze Deutsche Wohnen SE’s legal infractions and the decision-making process of the Berlin Data Protection Authority (DPA), and explain what Deutsche Wohnen SE should have done differently. Through this, we hope to help your business avoid making the same mistake.

Deutsche Wohnen SE was found to have stored the personal data of tenants in an archive system whose architecture was not designed to delete data deemed no longer necessary. This meant that data that was years old could be utilised for purposes other than those specified at the point of collection. This clearly violates GDPR, as the company had no legal grounds to store information that is not relevant to the original business purpose.

A fine of this magnitude signifies that particular importance has been attributed to data graveyards, given the unnecessary risks associated with cyber breaches in this repository. Storing data for excessive periods of time is covered comprehensively by the GDPR, and the damages demonstrate that these articles will be enforced extensively. Regulatory bodies expect businesses to embed privacy into their software design, to minimize data as much as possible, and to implement changes to the way they store and process data.


How Deutsche Wohnen SE violated Articles 5 and 25 of GDPR

Examinations by the Berlin DPA in June 2017 and March 2019 determined that the tenant data stored in Deutsche Wohnen SE’s archive system was not essential to business operations and thus could not legally be stored longer than the necessary period of time. However, there was no system implemented to erase unnecessary data. This violates Article 5 and Article 25 of the GDPR.

Article 5(1)(e): Personal data shall be “kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; personal data may be stored for longer periods insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) subject to implementation of the appropriate technical and organisational measures required by this Regulation in order to safeguard the rights and freedoms of the data subject (‘storage limitation’)”

Deutsche Wohnen SE’s actions infringed upon the processing principles outlined in Article 5, which determines that data should only be kept for as long as is necessary to complete the original purpose for which it was collected, to benefit the general public, or for scientific/historical research. This means that under the law, tenant data should have been deleted as soon as the tenant ended their connection with the company.

Article 25(1): “Taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing, the controller shall, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles, such as data minimisation, in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this Regulation and protect the rights of data subjects.”

Article 25 outlines that privacy should be baked into the framework of data storage systems in an effort to offer data subjects the highest possible level of data protection. This is known as Privacy by Design. Deutsche Wohnen SE failed to meet this criterion because they had no system in place to erase unnecessary data. Since it was determined by the DPA that the tenant information was not vital to operations, a systematic process should have been in place to erase the data as soon as it was no longer pertinent.
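As a rough illustration of what such a systematic process could look like, the sketch below purges archived tenant records once an assumed retention window has passed. The record layout, the `purge_expired` helper, and the one-year window are all hypothetical; the actual retention period depends on the legal obligations that apply and is not stated in the case.

```python
from datetime import date, timedelta

# Assumed retention window after a tenancy ends (illustrative only).
RETENTION = timedelta(days=365)


def purge_expired(records: list[dict], today: date) -> list[dict]:
    """Return only the records that may still be kept on `today`."""
    kept = []
    for rec in records:
        ended = rec["tenancy_end"]  # None means the tenancy is still active
        if ended is None or today - ended <= RETENTION:
            kept.append(rec)
    return kept


archive = [
    {"tenant_id": 1, "tenancy_end": None},
    {"tenant_id": 2, "tenancy_end": date(2019, 1, 31)},
    {"tenant_id": 3, "tenancy_end": date(2015, 6, 30)},
]
remaining = purge_expired(archive, today=date(2019, 10, 30))
print([rec["tenant_id"] for rec in remaining])  # [1, 2]
```

The point is less the specific rule than the fact that deletion is built into the archive and runs on a schedule, rather than depending on someone remembering to clean up.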


Why the Berlin DPA fined Deutsche Wohnen SE 14.5 million euros

Inspectors from Berlin’s DPA first flagged the archive system in an audit in June 2017. Then, in March 2019, more than 1.5 years after the initial examination and nine months after the implementation of GDPR, another audit was performed that demonstrated the system had still not been brought into compliance. 

Consequently, it was determined that Deutsche Wohnen SE deliberately created an archival system that they knew for over a year violated consumer privacy and the law.

The company did initiate a project to attempt to remedy the potential non-compliance, but the measures were determined to be inadequate. Though ineffective, by taking an initial step to remedy the illegal data management structures and by cooperating with the DPA, Deutsche Wohnen SE was able to limit the magnitude of the fine, which could have amounted to as much as 4% of their annual revenue of nearly 1.5 billion euros.
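For a sense of scale, the arithmetic behind that ceiling can be worked through using only the figures quoted above. The revenue is rounded up to 1.5 billion euros, so the results are approximate.

```python
ANNUAL_REVENUE_EUR = 1_500_000_000  # "nearly 1.5 billion euros", rounded up
ACTUAL_FINE_EUR = 14_500_000        # the fine that was imposed

# 4% of annual revenue, computed with integers to avoid float rounding
ceiling = ANNUAL_REVENUE_EUR * 4 // 100
share_of_ceiling = ACTUAL_FINE_EUR / ceiling
share_of_revenue = ACTUAL_FINE_EUR / ANNUAL_REVENUE_EUR

print(f"4% ceiling:       EUR {ceiling:,}")          # EUR 60,000,000
print(f"share of ceiling: {share_of_ceiling:.1%}")   # 24.2%
print(f"share of revenue: {share_of_revenue:.2%}")   # 0.97%
```

On these rounded figures, the fine came to roughly a quarter of the 4% ceiling, or just under 1% of annual revenue.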

In a press release, the Berlin Commissioner for Data Protection and Freedom of Information, Maja Smoltczyk, said:

Unfortunately, in supervisory practice, we often encounter data cemeteries such as those found at Deutsche Wohnen SE. The explosive nature of such misconduct is unfortunately only made aware to us when it has come to improper access to the mass hoarded data, for example, in cases of cyber-attacks. But even without such serious consequences, we are dealing with a blatant infringement of the principles of data protection, which are intended to protect the data subjects from precisely such risks.

The DPA’s ruling reflects that being unable to prove that data had been disclosed to third parties or accessed unlawfully is irrelevant to the case. If the architecture of data storage was not designed with privacy in mind, it violates GDPR.

This signifies the risk of storing old data in the GDPR era. After all, data cemeteries are just waiting to be mishandled and exposed in data breaches.

GDPR makes provisions for the risks of data breaches, and seeks to limit them by enforcing proactive privacy regulations. These objectives are what the commissioner looked to uphold when she determined the monetary penalty for the German property company. In consequence, Deutsche Wohnen SE was fined 14.5 million euros, the highest German GDPR fine to date, for failing to implement Privacy by Design. Additional fines were also imposed (between EUR 6,000 and 17,000) for “the inadmissible storage of personal data of tenants in 15 specific individual cases.” (Source)


What Deutsche Wohnen SE should have done, and how you can avoid the same fate

In that same press release, Maja Smoltczyk remarked that it is gratifying to be able to impose sanctions on structural deficiencies under GDPR before data breaches occur. In addition, she gave a warning: “I recommend all organizations processing personal data review their data archiving for compliance with the GDPR.”

The definitive recommendation and high fine signify that the Berlin DPA will meet data cemetery cases with a hard hand. This sets a precedent that the commissioner intends to impose penalties on companies before massive breaches occur, as a means of being proactive. The threat of proactive penalties should incite fear across all data-driven organizations because the impact of audits and finding GDPR non-compliance will undoubtedly disrupt operations and cost money.

However, there is another salient lesson to be learned here: customer information that has been anonymized is no longer considered personal and thus is not regulated by the GDPR. This means that had Deutsche Wohnen SE anonymized their data cemeteries, they would have avoided the €14.5 million regulatory penalty and protected their tenants’ data.

In light of this penalty, it is clear that businesses should build anonymization strategies into the design of their data repositories. This can be done through privacy automation solutions, like CN-Protect, which assess, quantify, and assure privacy compliance through a four-step process:

      1. Metadata classification: identifies the direct, indirect, and sensitive data in an automated manner, to help businesses understand what kind of data they have. 
      2. Protect data: applies advanced privacy techniques, such as k-anonymization and differential privacy, to tables, text, images, video, and audio.
      3. Quantify risk: calculates the risk of re-identification of individuals and provides a privacy risk score.
      4. Automate privacy protection: implements policies to determine how data is treated in your pipelines.
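To give a flavour of the "quantify risk" step, the sketch below computes k-anonymity for a small table: the size of the smallest group of rows that share the same quasi-identifiers. This is a generic textbook measure, not a description of how CN-Protect works, and the column names and sample rows are made up for illustration.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age_band", "postal_prefix")  # assumed columns


def k_anonymity(rows: list[dict], quasi_identifiers=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifiers."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


rows = [
    {"age_band": "30-39", "postal_prefix": "101", "diagnosis": "asthma"},
    {"age_band": "30-39", "postal_prefix": "101", "diagnosis": "flu"},
    {"age_band": "40-49", "postal_prefix": "102", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postal_prefix": "102", "diagnosis": "flu"},
    {"age_band": "40-49", "postal_prefix": "102", "diagnosis": "asthma"},
]

k = k_anonymity(rows)
worst_case_risk = 1 / k  # chance of singling out one person in the smallest group
print(k, worst_case_risk)  # 2 0.5
```

A higher k means each individual hides in a larger crowd. A value of 1 means at least one person is uniquely identifiable from the quasi-identifiers alone.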

Businesses should use this four-step process to confirm that their dataset has truly been anonymized, and gain certainty that they won’t be next on the GDPR chopping block. In turn, privacy automation will smooth out the compliance process and empower businesses to mitigate any risks from data cemeteries.

Taking the step to anonymize data minimizes the risk of identification. With de-personalized data, control belongs to you, and GDPR risks are eliminated. Through this process, privacy protection and data analysis can occur simultaneously. You can be sure that Deutsche Wohnen SE is now wishing it had performed anonymization, as it would have saved the company millions of euros. Don’t get caught in the same position.



Leveraging GDPR “Legitimate Interests Processing” for Data Science

The GDPR is not intended to be a compliance overhead for controllers and processors. It is intended to bring higher and consistent standards and processes for the secure treatment of personal data. It is fundamentally intended to protect the privacy rights of individuals. Nowhere is this more true than in emerging data science, analytics, AI and ML environments, where the sheer number of data sources raises the risk of identifying an individual’s personal and sensitive information.

The GDPR requires that personal data be collected for “specified, explicit and legitimate purposes,” and that a data controller define a separate legal basis for each and every purpose for which data (customer data, for example) is used. If a bank customer takes out a loan, the bank can only use the collected account and transactional data to manage and process that customer in fulfilment of its obligations under the loan. This is colloquially referred to as the “primary purpose” for which the data is collected. If the bank then wants to re-use this data for any other purpose incompatible with or beyond the scope of the primary purpose, that is referred to as a “secondary purpose” and requires a separate legal basis for each and every such secondary purpose.

For the avoidance of any doubt, if the bank wanted to use that customer’s data for profiling in a data science environment, then under GDPR the bank is required to document a legal basis for each and every separate purpose for which it stores and processes this customer’s data. So, for example, ‘cross sell and up sell’ is one purpose, while ‘customer segmentation’ is another, separate purpose. If relied upon as the lawful basis, consent must be freely given, specific, informed, and unambiguous, and an additional condition, such as explicit consent, is required when processing special categories of personal data, as described in GDPR Article 9. Additionally, in this example, the loan division of the bank cannot share data with its credit card or mortgage divisions without the informed consent of the customer. None of this should be confused with a further, separate legal basis available to the bank: processing necessary for compliance with a legal obligation to which the controller is subject (AML, fraud, risk, KYC, etc.).
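
The "one documented legal basis per purpose" requirement can be pictured as a simple register that refuses any processing purpose nobody has documented. This is a minimal sketch, not a compliance tool: `Purpose`, `PurposeRegister`, the purpose names, and the legal bases assigned to them are hypothetical choices for the example. Only the six values of `LegalBasis` come from GDPR Article 6(1).

```python
from dataclasses import dataclass
from enum import Enum


class LegalBasis(Enum):
    """The six lawful bases listed in GDPR Article 6(1)."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"


@dataclass(frozen=True)
class Purpose:
    """One processing purpose and the legal basis documented for it."""
    name: str
    legal_basis: LegalBasis


class PurposeRegister:
    """Hypothetical register: every purpose needs its own documented basis."""

    def __init__(self):
        self._purposes = {}

    def document(self, purpose):
        self._purposes[purpose.name] = purpose

    def legal_basis_for(self, purpose_name):
        """Return the documented basis, or refuse if none was recorded."""
        purpose = self._purposes.get(purpose_name)
        if purpose is None:
            raise PermissionError(
                f"No documented legal basis for purpose: {purpose_name}"
            )
        return purpose.legal_basis


# Assumed assignments for the bank example; a real controller decides these.
register = PurposeRegister()
register.document(Purpose("loan_servicing", LegalBasis.CONTRACT))
register.document(Purpose("aml_screening", LegalBasis.LEGAL_OBLIGATION))
register.document(Purpose("cross_sell_and_up_sell", LegalBasis.LEGITIMATE_INTERESTS))

print(register.legal_basis_for("cross_sell_and_up_sell").value)

try:
    register.legal_basis_for("customer_segmentation")
except PermissionError as error:
    print(error)  # segmentation was never documented, so it is refused
```

Keying the register by purpose rather than by dataset mirrors the point made above: the same loan data can be lawful to use for one purpose and unlawful for another.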

The challenge arises when selecting a legal basis for secondary purpose processing in a data science environment as this needs to be a separate and specific legal basis for each and every purpose. 

It quickly becomes impractical for the bank, not to mention annoying for its customers, to attempt to obtain consent for each and every purpose in a data science use case. In any case, evidence shows that this approach yields a very low level of positive consent. Consent management under GDPR is also tightening up: blackmail clauses and general, ambiguous consent clauses will no longer be deemed acceptable.

GDPR offers controllers a more practical and flexible legal basis for exactly these scenarios, and encourages controllers to raise their standards for protecting the privacy of their customers, especially in data science environments. Legitimate interests processing (LIP) is an often misunderstood legal basis under GDPR. This is in part because reliance on LIP may entail the use of additional technical and organizational controls to mitigate the possible impact or risk of a given data processing on an individual. Depending on the processing involved, the sensitivity of the data, and the intended purpose, traditional tactical data security solutions such as encryption and hashing may not go far enough to mitigate the risk to individuals for the LIP balancing test to come out in favour of the controller’s identified legitimate interest.

If approached correctly, GDPR LIP can provide a framework with defined technical and organisational controls to support controllers’ lawful use of customer data in data science, analytics, AI and ML applications. Without it, controllers may be more exposed to possible non-compliance with GDPR and to the risk of legal action, as we are seeing in many high-profile privacy-related lawsuits.

Legitimate Interests Processing is the most flexible lawful basis for secondary purpose processing of customer data, especially in data science use cases. But you cannot assume it will always be the most appropriate. It is likely to be most appropriate where you use an individual’s data in ways they would reasonably expect and which have a minimal privacy impact, or where there is a compelling justification for the processing.

If you choose to rely on GDPR LIP, you take on extra responsibility: not only for implementing, where needed, technical and organisational controls to support and defend LIP compliance, but also for demonstrating the ethical and proper use of your customers’ data while fully respecting and protecting their privacy rights and interests. This extra responsibility may include implementing enterprise-class, fit-for-purpose systems and processes (not just paper-based processes). Automation-based privacy solutions such as CryptoNumerics CN-Protect are available today as examples of technical and organisational controls that support LIP. They offer a systems-based (Privacy by Design) risk assessment and scoring capability that detects the risk of re-identification, together with integrated privacy protection that retains the analytical value of the data for data science while protecting the identity and privacy of the data subject.

Data controllers must first perform the GDPR three-part test to validate LIP as a legal basis. You need to:

  • identify a legitimate interest;
  • show that the processing is necessary to achieve it; and
  • balance it against the individual’s interests, rights and freedoms.

The legitimate interests can be your own interests (controllers) or the interests of third parties (processors). They can include commercial interests (marketing), individual interests (risk assessments) or broader societal benefits. The processing must be necessary. If you can reasonably achieve the same result in another less intrusive way, legitimate interests will not apply. You must balance your interests against the individual’s. If they would not reasonably expect the processing, or if it would cause unjustified harm, their interests are likely to override your legitimate interests. Conducting such assessments for accountability purposes is now easier than ever. One example is TrustArc’s Legitimate Interests Assessment (LIA) and Balancing Test, which identifies the benefits and risks of data processing, assigns numerical values to both sides of the scale, and uses conditional logic and back-end calculations to generate a full report on the use of legitimate interests at the business process level.
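
To make the three-part test concrete, here is a deliberately simplified sketch of how such an assessment could be recorded and evaluated. It is not TrustArc's tool or scoring model, and it is no substitute for legal judgment: the class, its fields, the 0 to 10 scores, and the rule that risk must be strictly lower than benefit are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LegitimateInterestsAssessment:
    """Hypothetical record of one three-part test for one processing purpose."""
    interest: str                # part 1: the legitimate interest identified
    less_intrusive_option: bool  # part 2: True if another way reasonably achieves the same result
    reasonably_expected: bool    # part 3: would the individual expect this processing?
    benefit_score: int           # assumed 0-10 value for the controller's side of the scale
    risk_score: int              # assumed 0-10 value for the individual's side of the scale

    def outcome(self):
        # Part 1: there must be an identified interest to weigh at all.
        if not self.interest.strip():
            return "fail: no legitimate interest identified"
        # Part 2: processing is not necessary if a less intrusive route exists.
        if self.less_intrusive_option:
            return "fail: processing is not necessary"
        # Part 3: unexpected processing or outweighing risk tips the balance.
        if not self.reasonably_expected or self.risk_score >= self.benefit_score:
            return "fail: individual's interests likely override"
        return "pass"


# Made-up scores: segmentation on de-identified data versus on raw data.
deidentified = LegitimateInterestsAssessment(
    interest="customer segmentation",
    less_intrusive_option=False,
    reasonably_expected=True,
    benefit_score=7,
    risk_score=3,
)
raw = LegitimateInterestsAssessment(
    interest="customer segmentation",
    less_intrusive_option=False,
    reasonably_expected=True,
    benefit_score=7,
    risk_score=8,
)

print(deidentified.outcome())
print(raw.outcome())
```

The two example records differ only in their risk score, which is where privacy controls such as de-identification would be expected to move the result.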

What are the benefits of choosing Legitimate Interest Processing?

Because this basis is particularly flexible, it may be applicable in a wide range of different situations, such as data science applications. It can also give you more ongoing control over your long-term processing than consent, which an individual could withdraw at any time. Remember, though, that you still have to manage marketing opt-outs independently of whatever legal basis you are using to store and process customer data.

It also promotes a risk-based approach to data compliance, as you need to think about the impact of your processing on individuals, which can help you identify risks and put appropriate safeguards in place. This can also support your obligation to ensure “data protection by design,” by performing risk assessments for re-identification and demonstrating the privacy controls applied to balance privacy against the demand for retaining the analytical value of the data in data science environments. This in turn would contribute towards demonstrating your PIAs (Privacy Impact Assessments), which form part of your DPIA (Data Protection Impact Assessment) requirements and obligations.

LIP as a legal basis, if implemented correctly and supported by the correct organisational and technical controls, also provides the platform to support data collaboration and data sharing. However, you may need to demonstrate that the data has been sufficiently de-identified, including by showing that the risk assessments for re-identification cover not just direct identifiers but all indirect identifiers as well.

Using LIP as a legal basis for processing may help you avoid bombarding people with unnecessary and unwelcome consent requests and can help avoid “consent fatigue.” It can also, if done properly, be an effective way of protecting the individual’s interests, especially when combined with clear privacy information and an upfront and continuing right to object to such processing. Lastly, using LIP not only gives you a legal framework for performing data science; it also provides a platform that demonstrates the proper and ethical use of customer data, a topic and business objective of most boards of directors.


About the Authors:

Darren Abernethy is Senior Counsel at TrustArc in San Francisco.  Darren provides product and legal advice for the company’s portfolio of consent, advertising, marketing and consumer-facing technology solutions, and concentrates on CCPA, GDPR, cross-border data transfers, digital ad tech and EMEA data protection matters. 

Ravi Pather of CryptoNumerics has spent the last 15 years helping large enterprises address data compliance requirements such as GDPR, PIPEDA, HIPAA, PCI/DSS, Data Residency, Data Privacy and, more recently, CCPA. He has a good working knowledge of assisting large and global companies in implementing privacy compliance controls, particularly as they relate to the more complex secondary purpose processing of customer data in data lake and data warehouse environments.
