Leveraging GDPR “Legitimate Interests Processing” for Data Science

The GDPR is not intended to be a compliance overhead for controllers and processors. It is intended to bring higher and more consistent standards and processes for the secure treatment of personal data, and it is fundamentally intended to protect the privacy rights of individuals. This could not be more true than in emerging data science, analytics, AI and ML environments, where the vast number of combined data sources raises the risk of identifying an individual's personal and sensitive information.

The GDPR requires that personal data be collected for “specified, explicit and legitimate purposes,” and that a data controller identify a separate legal basis for each purpose for which, e.g., customer data is used. If a bank customer takes out a loan, the bank can use the collected account and transactional data only to manage that customer's loan and fulfil its obligations in offering it. This is colloquially referred to as the “primary purpose” for which the data is collected. If the bank then wants to reuse this data for any other purpose that is incompatible with, or beyond the scope of, the primary purpose, this is referred to as a “secondary purpose” and requires a separate legal basis for each such secondary purpose.

For the avoidance of doubt, if the bank wanted to use that customer’s data for profiling in a data science environment, the GDPR requires the bank to document a legal basis for each separate purpose for which it stores and processes the customer’s data. For example, ‘cross-sell and up-sell’ is one purpose, while ‘customer segmentation’ is another, separate purpose. Where consent is relied upon as the lawful basis, it must be freely given, specific, informed, and unambiguous, and an additional condition, such as explicit consent, is required when processing special categories of personal data, as described in GDPR Article 9. Additionally, in this example, the bank’s loan division cannot share data with its credit card or mortgage divisions without the informed consent of the customer. This should not be confused with the further, separate legal basis the bank has for processing that is necessary for compliance with a legal obligation to which the controller is subject (AML, fraud, risk, KYC, etc.).
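The rule that every purpose needs its own documented legal basis can be enforced in software as a purpose register. The sketch below is a hypothetical illustration: the `PurposeRegister` class and the purpose names are assumptions, not part of any product or regulation; only the six lawful bases themselves come from GDPR Article 6(1). Deciding which basis fits which purpose remains a legal judgment the code does not make.

```python
from dataclasses import dataclass, field
from enum import Enum


class LawfulBasis(Enum):
    """The six lawful bases listed in GDPR Article 6(1)."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal_obligation"
    VITAL_INTERESTS = "vital_interests"
    PUBLIC_TASK = "public_task"
    LEGITIMATE_INTERESTS = "legitimate_interests"


@dataclass
class PurposeRegister:
    """Hypothetical register mapping each processing purpose to one documented basis."""
    entries: dict[str, LawfulBasis] = field(default_factory=dict)

    def document(self, purpose: str, basis: LawfulBasis) -> None:
        self.entries[purpose] = basis

    def check(self, purpose: str) -> LawfulBasis:
        # A basis recorded for one purpose never carries over to another:
        # any purpose without its own entry is refused.
        if purpose not in self.entries:
            raise PermissionError(f"no documented lawful basis for purpose: {purpose!r}")
        return self.entries[purpose]


register = PurposeRegister()
register.document("loan_servicing", LawfulBasis.CONTRACT)
register.document("aml_screening", LawfulBasis.LEGAL_OBLIGATION)
register.document("cross_sell_up_sell", LawfulBasis.CONSENT)
```

Here `register.check("customer_segmentation")` raises `PermissionError`, because segmentation is a separate purpose that has not been documented, even though cross-sell and up-sell has.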

The challenge arises when selecting a legal basis for secondary purpose processing in a data science environment as this needs to be a separate and specific legal basis for each and every purpose. 

It quickly becomes impractical for the bank, not to mention annoying for its customers, to try to obtain consent for every single purpose in a data science use case. In any case, evidence suggests that this approach yields very low levels of positive consent. Consent management under the GDPR is also tightening up: blackmail clauses and general or ambiguous consent clauses will no longer be deemed acceptable.

The GDPR offers controllers a more practical and flexible legal basis for exactly these scenarios, and encourages them to raise their standards for protecting their customers’ privacy, especially in data science environments. Legitimate interests processing (LIP) is an often misunderstood legal basis under the GDPR, in part because relying on LIP may require additional technical and organizational controls to mitigate the possible impact or risk of a given processing operation on an individual. Depending on the processing involved, the sensitivity of the data, and the intended purpose, traditional tactical data security measures such as encryption and hashing may not go far enough in mitigating the risk to individuals for the LIP balancing test to come out in favour of the controller’s identified legitimate interest.

If approached correctly, GDPR LIP can provide a framework, with defined technical and organisational controls, that supports controllers in lawfully using customer data in data science, analytics, AI and ML applications. Without it, controllers may be more exposed to possible non-compliance with the GDPR and to the risk of legal action, as seen in many high-profile privacy-related lawsuits.

Legitimate Interests Processing is the most flexible lawful basis for secondary purpose processing of customer data, especially in data science use cases. But you cannot assume it will always be the most appropriate. It is likely to be most appropriate where you use an individual’s data in ways they would reasonably expect and which have a minimal privacy impact, or where there is a compelling justification for the processing.

If you choose to rely on GDPR LIP, you take on extra responsibility: not only for implementing, where needed, technical and organisational controls to support and defend LIP compliance, but also for demonstrating the ethical and proper use of your customers’ data while fully respecting and protecting their privacy rights and interests. This extra responsibility may include implementing enterprise-class, fit-for-purpose systems and processes, not just paper-based ones. Automated privacy solutions that demonstrate such technical and organisational controls in support of LIP are available today. CryptoNumerics CN-Protect, for example, offers a systems-based (Privacy by Design) risk assessment and scoring capability that detects the risk of re-identification, along with integrated privacy protection that retains the analytical value of the data for data science while protecting the identity and privacy of the data subject.

Data controllers first need to perform the GDPR three-part test to confirm that LIP is a valid legal basis. You need to:

  • identify a legitimate interest;
  • show that the processing is necessary to achieve it; and
  • balance it against the individual’s interests, rights and freedoms.

The legitimate interests can be your own interests (as controller) or the interests of third parties. They can include commercial interests (marketing), individual interests (risk assessments) or broader societal benefits. The processing must be necessary: if you can reasonably achieve the same result in another, less intrusive way, legitimate interests will not apply. You must balance your interests against the individual’s. If they would not reasonably expect the processing, or if it would cause unjustified harm, their interests are likely to override your legitimate interests. Conducting such assessments for accountability purposes is also easier than ever, for example with TrustArc’s Legitimate Interests Assessment (LIA) and Balancing Test, which identifies the benefits and risks of data processing, assigns numerical values to both sides of the scale, and uses conditional logic and back-end calculations to generate a full report on the use of legitimate interests at the business process level.
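The three-part test, with numeric scores on each side of the balance, can be captured as a simple record. This is a minimal sketch, not TrustArc’s methodology: the class name, the 1-to-5 scores, and the simplification that processing individuals would not reasonably expect fails outright (the text only says their interests are *likely* to override) are all hypothetical assumptions. A real LIA rests on documented reasoning; a score only supports that judgment.

```python
from dataclasses import dataclass


@dataclass
class LegitimateInterestsAssessment:
    """Hypothetical three-part LIA record; the numeric scale is an illustrative assumption."""
    interest: str              # part 1: the legitimate interest pursued
    necessary: bool            # part 2: no less intrusive way achieves the same result
    benefit_scores: list[int]  # part 3: benefits, each scored 1 (low) to 5 (high)
    risk_scores: list[int]     # part 3: risks to individuals, each scored 1 to 5
    reasonably_expected: bool  # would individuals reasonably expect this processing?

    def outcome(self) -> str:
        if not self.interest.strip():
            return "fail: no legitimate interest identified"
        if not self.necessary:
            return "fail: a less intrusive alternative exists"
        if not self.reasonably_expected:
            # Simplification: treat unexpected processing as tipping the balance.
            return "fail: individuals would not reasonably expect the processing"
        benefit, risk = sum(self.benefit_scores), sum(self.risk_scores)
        if benefit > risk:
            return f"pass: benefit {benefit} outweighs risk {risk}"
        return f"fail: risk {risk} is not outweighed by benefit {benefit}"


lia = LegitimateInterestsAssessment(
    interest="customer segmentation on de-identified data",
    necessary=True,
    benefit_scores=[4, 3],
    risk_scores=[2, 1],
    reasonably_expected=True,
)
print(lia.outcome())  # pass: benefit 7 outweighs risk 3
```

The checks run in the order of the test itself, so a missing interest or a failed necessity check stops the assessment before any balancing takes place.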

What are the benefits of choosing Legitimate Interest Processing?

Because this basis is particularly flexible, it may apply in a wide range of situations, such as data science applications. It can also give you more ongoing control over your long-term processing than consent, which an individual can withdraw at any time. Remember, though, that you still need to manage marketing opt-outs independently of whichever legal basis you use to store and process customer data.

It also promotes a risk-based approach to data compliance, as you need to think about the impact of your processing on individuals, which can help you identify risks and take appropriate safeguards. This can also support your obligation to ensure “data protection by design”: performing re-identification risk assessments and demonstrating the privacy controls applied to balance privacy against the need to retain the analytical value of data in data science environments. This in turn contributes to your PIAs (Privacy Impact Assessments), which form part of your DPIA (Data Protection Impact Assessment) requirements and obligations.

LIP as a legal basis, if implemented correctly and supported by the right organisational and technical controls, also provides a platform for data collaboration and data sharing. However, you may need to demonstrate that the data has been sufficiently de-identified, including by showing that re-identification risk assessments cover not just direct identifiers but all indirect identifiers as well.
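Why indirect identifiers matter can be shown with the prosecutor re-identification risk model: records that share the same values on every quasi-identifier (indirect identifier) form an equivalence class, and each record’s risk is 1 divided by the size of its class. The sketch assumes postcode, birth year and gender are the quasi-identifiers and uses made-up records; it is not how any particular product scores risk.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (maximum, average) prosecutor-model risk over all records.

    A record that is unique on its quasi-identifier values has risk 1.0,
    even though names and account numbers have already been removed.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record = [1 / class_sizes[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)


# Illustrative data: direct identifiers are already stripped.
customers = [
    {"postcode": "M5V", "birth_year": 1980, "gender": "F", "balance": 1200},
    {"postcode": "M5V", "birth_year": 1980, "gender": "F", "balance": 560},
    {"postcode": "M5V", "birth_year": 1975, "gender": "M", "balance": 9100},
    {"postcode": "K1A", "birth_year": 1990, "gender": "F", "balance": 300},
]
max_risk, avg_risk = reidentification_risk(customers, ["postcode", "birth_year", "gender"])
```

Here the maximum risk is 1.0 and the average is 0.75: two of the four customers are the only person with their postcode, birth year and gender combination, so removing direct identifiers alone has not de-identified them.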

Using LIP as a legal basis may help you avoid bombarding people with unnecessary and unwelcome consent requests and can help avoid “consent fatigue.” If done properly, it can also be an effective way of protecting individuals’ interests, especially when combined with clear privacy information and an upfront and continuing right to object to the processing. Lastly, LIP not only gives you a legal framework for performing data science, but also provides a platform for demonstrating the proper and ethical use of customer data, a topic and business objective for most boards of directors.


About the Authors:

Darren Abernethy is Senior Counsel at TrustArc in San Francisco.  Darren provides product and legal advice for the company’s portfolio of consent, advertising, marketing and consumer-facing technology solutions, and concentrates on CCPA, GDPR, cross-border data transfers, digital ad tech and EMEA data protection matters. 

Ravi Pather of CryptoNumerics has spent the last 15 years helping large enterprises address data compliance requirements such as GDPR, PIPEDA, HIPAA, PCI DSS, data residency, data privacy and, more recently, CCPA. He has a good working knowledge of assisting large, global companies in implementing privacy compliance controls, particularly as they relate to complex secondary purpose processing of customer data in data lake and warehouse environments.

The Three Greatest Regulatory Threats to Your Data Lakes

Emerging privacy laws restrict the use of data lakes for analytics. But organizations who invest in privacy automation maintain the use of these valuable business resources for strategic operations and innovation.


Over the past five years, as businesses have increased their dependence on customer insights to make informed business decisions, the amount of data stored and processed in data lakes has risen to unprecedented levels. In parallel, privacy regulations have emerged across the globe. This has limited the functionality of data lakes and turned the analytical process from a corporate asset into a business nightmare.

Under GDPR and CCPA, data is restricted from being used for purposes beyond those initially specified, in turn shutting off the flow of insights from data lakes. As a consequence, most data science and analytics activities fail to meet the standards of privacy regulations. Under the GDPR, this can result in fines of up to 4% of a business’s annual global revenue.

However, businesses don’t need to choose between compliance and insights. Instead, a new mindset and approach should be adopted to meet both needs. To continue to thrive in the current regulatory climate, enterprises need to do three things:

  1. Anonymize data to preserve its use for analytics
  2. Manage the privacy governance strategy within the organization
  3. Apply privacy protection at scale to unlock data lakes


Anonymize data to preserve its use for analytics

While the restrictions vary slightly, privacy regulations worldwide establish that customer data should only be used in ways the subject is aware of and has given permission for. Under the GDPR, for example, if a business intends to use customer data for an additional, incompatible purpose, it generally needs a separate legal basis, which in practice often means first obtaining the individual’s consent. As a result, data in data lakes can often only be made available for use after processes have been implemented to notify, and request permission from, every subject for every use case. This is impractical and unreasonable. Not only will it result in a mass of requests for data erasure, but it will also slow and limit the benefits of data lakes.

Don’t get us wrong. We think protecting consumer privacy is important. We just think this is the wrong way to go about it.

Instead, businesses should anonymize the data in their data lakes to take it out of the scope of privacy regulations. (Pseudonymization alone reduces risk but does not achieve this under the GDPR, because pseudonymized data is still personal data.) This will unlock data lakes and protect privacy, regaining the business advantage of customer insights while protecting individuals. The best of both worlds.


Manage the privacy governance strategy within the organization

Across an organization, stakeholders often operate in isolation, pursuing their own objectives with their own processes and tools. This has led to fragmentation between legal, risk and compliance, IT security, data science, and business teams. As a consequence, misaligned values have created dysfunction between privacy protection and analytics priorities.

The solution is to implement an enterprise-wide privacy control system that generates quantifiable assessments of the re-identification risk and information loss. This enables businesses to set predetermined risk thresholds and optimize their compliance strategies for minimal information loss. By allowing companies to measure the balance of risk and loss, privacy stakeholder silos can be broken, and a balance can be found that ensures data lakes are privacy-compliant and valuable.
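Setting a predetermined risk threshold and then minimizing information loss can be sketched as a search over generalization levels. This minimal example assumes a single quasi-identifier (age), a hypothetical hierarchy of age bands, the prosecutor risk measure (1 / size of the smallest group), and a deliberately simple information-loss measure (the fraction of the observed age range hidden inside one band). Production tools use richer measures across many attributes.

```python
from collections import Counter

# Hypothetical generalization hierarchy for age: band width in years (1 = exact age).
AGE_BAND_WIDTHS = [1, 5, 10, 20]


def generalize(age, width):
    """Map an age to the (low, high) band of the given width that contains it."""
    low = (age // width) * width
    return (low, low + width - 1)


def max_risk(values):
    """Prosecutor risk: 1 / size of the smallest group of identical values."""
    sizes = Counter(values)
    return max(1 / sizes[v] for v in values)


def information_loss(width, ages):
    """Assumed measure: share of the observed age range hidden by one band."""
    span = max(ages) - min(ages) + 1
    return min(1.0, (width - 1) / span)


def choose_band(ages, risk_threshold):
    """Return (width, risk, loss) for the least lossy level meeting the threshold."""
    candidates = []
    for width in AGE_BAND_WIDTHS:
        risk = max_risk([generalize(a, width) for a in ages])
        if risk <= risk_threshold:
            candidates.append((information_loss(width, ages), width, risk))
    if not candidates:
        return None  # no level is safe enough: suppress records or add coarser levels
    loss, width, risk = min(candidates)
    return width, risk, loss


ages = [23, 24, 27, 31, 33, 36, 38, 44, 45, 47]
result = choose_band(ages, risk_threshold=0.5)
```

With these ages and a threshold of 0.5 (every band must hold at least two people), exact ages and 5-year bands fail, and the function picks 10-year bands, the least lossy level that meets the threshold. Lowering the threshold to 0.2 returns `None`, signalling that stronger protection is needed.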


Apply privacy protection at scale to unlock data lakes

Anonymization is not as simple as removing direct personal identifiers such as names. Nor is manual deidentification a viable approach to ensuring privacy compliance in data lakes. In fact, the volume and velocity at which data is accumulated in data lakes make traditional methods of anonymization impossible. What’s more, without a quantifiable risk score, businesses can never be certain that their data is truly anonymized.

But applying blanket solutions like masking and tokenization strips the data of its analytical value. Most businesses struggle with this dilemma, but they need not. Through privacy automation, companies can apply defensible anonymization at scale.

Modern privacy automation solutions assess, quantify, and assure privacy protection by measuring the risk of re-identification. Then they apply advanced techniques such as differential privacy to the dataset to optimize for privacy-protection and preservation of analytical value.
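Differential privacy is named above without detail. As a rough illustration, here is the textbook Laplace mechanism applied to a count query; it is not a description of how any particular product applies differential privacy. The balances and the epsilon value are made up, and the standard-library `random` generator is used for simplicity, not as a production-grade noise source.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(values, predicate, epsilon, rng):
    """Epsilon-differentially private count via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so adding Laplace(0, 1 / epsilon) noise makes this single query epsilon-DP.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(7)  # seeded only so the example is repeatable
balances = [1200, 560, 9100, 300, 15000, 7200, 480, 2600]
noisy = dp_count(balances, lambda b: b > 5000, epsilon=0.5, rng=rng)
```

The true count here is 3; each call returns 3 plus random noise. Smaller epsilon means more noise and stronger privacy, which is the privacy/analytical-value trade-off described above. Repeated queries consume privacy budget, so a real system must track cumulative epsilon.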

The law provides clear guidance about using anonymization to meet privacy compliance, demanding the implementation of organizational and technical controls. Data-driven businesses should de-identify their data lakes by integrating privacy automation solutions into their governance framework and data pipelines. Such action will enable organizations to regain the value of their data lakes and remove the threat of regulatory fines and reputational damage.

The privacy authorities are calling. Is your call centre data GDPR and CCPA compliant?

Every time someone calls your call centre, the conversation is recorded and transcribed into free-text data. This provides your business with a wealth of valuable data to derive insights from. The problem is that the way you use this data today may violate privacy regulations and put you at risk of nine-figure fines and reputational damage.

Call centres often record and manage extremely sensitive data. For example, at a bank, a customer will provide their name, account number, and the answer to a security question (such as their mother’s maiden name). At a wealth management office, someone may call in and talk about their divorce proceedings. This information is not only incredibly personal, but using it for additional purposes without consent is against the law.

Data is transcribed for training purposes. However, the data is often repurposed. Businesses rely on this data for everything from upselling to avoiding customer churn – not to mention the revenue some earn from selling data. 

But under the GDPR, data generally cannot be used for additional purposes without a separate legal basis, which for this kind of repurposing typically means the consent of the data subject. To comply with privacy regulations, before data science and analytics are performed on the transcripts, a business must first inform the data subjects and ask their permission for each and every use.

Every time a business asks for permission, they risk requests for data deletion and denials of use that render the transcripts useless. This is because people do not want their data to be exposed, let alone be used to monitor their behaviour.

However, this does not mean all your transcript data is null and void. Why? Because by anonymizing data, you can protect customer privacy and take data out-of-scope from privacy regulations.

In other words, if you anonymize your call centre data, you can use the transcripts for any purpose.

However, anonymizing this kind of data is more complicated than applying traditional methods of privacy protection, like masking and tokenization. Audio transcripts are unstructured, so traditional anonymization methods render the data unusable.
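To illustrate why pattern-based masking falls short on transcripts, here is a minimal regex redaction pass. The patterns and the sample call are hypothetical assumptions; the point is what they miss. Structured identifiers are replaced with typed placeholders, but sensitive facts expressed in free speech pass straight through.

```python
import re

# Hypothetical patterns for structured identifiers only.
PATTERNS = {
    "ACCOUNT_NUMBER": re.compile(r"\b\d{8,12}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def redact(transcript):
    """Replace each pattern match with a typed placeholder such as [PHONE]."""
    for label, pattern in PATTERNS.items():
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript


call = ("Caller: My account number is 12345678 and you can reach me on 416-555-0199. "
        "My mother's maiden name is Kowalski, and I'm calling about my divorce settlement.")
redacted = redact(call)
```

The account number and phone number are replaced, yet the security answer (the maiden name) and the divorce proceedings remain in the redacted text. Catching those requires understanding the conversation, and measuring the re-identification risk of what is left, rather than matching patterns.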

If you use improperly anonymized transcript data for additional purposes without consent, you will be found in violation of the GDPR. This means your business can be fined up to 4% of its annual global revenue. Mistaking partially protected data for anonymized data, or hoping manual approaches to de-identification will work, is not legally acceptable. Just ask Google how that turned out for them.

To avoid this, businesses must use systematic privacy assessments that quantify the re-identification risk score of their data, and establish automated privacy protection based on a predetermined risk threshold. With this, businesses can be confident that their transcripts are anonymized and can use them for secondary purposes without risking GDPR non-compliance.

State-of-the-art technologies will also enable businesses to measure and reduce the impact of privacy protection on the analytical value of data.

Call centre transcripts are a rich source of customer data that can generate valuable business insights. But blindly using this information can cost your business millions. Use an advanced privacy protection solution to anonymize your transcripts while retaining the analytical value. 

Is your data toxic or clean? How to prepare for CCPA

The CCPA is only a few months away from coming into effect. But businesses are not prepared. Currently, petabytes of consumer data rest in businesses’ data science and analytics environments. In many cases, this data is being used for purposes beyond that for which it was initially collected. 

All of this data will be governed by the incoming CCPA, which will make it challenging for enterprises to derive consumer insights and expensive to operate. Worse, if your business makes a misstep, it will be at risk of class action lawsuits and reputational damage. As a result, most of the data sitting in data lakes and warehouses should be considered highly toxic for CCPA compliance.


Toxic data will harm your business:

The CCPA defines disclosure obligations and information governance requirements. It will require most companies to overhaul their data systems to improve data discovery and access to information. While a leap forward for consumer privacy, the CCPA places a weighty burden on data-driven businesses. Not only does it require them to justify and disclose each and every purpose for which consumer data is used, but it also prohibits using data for additional purposes without first notifying consumers, and it gives consumers the right to opt out of the sale of their personal information.

Under the CCPA, a business that fails to cure a violation within 30 days of being notified faces civil penalties of up to $2,500 per violation, or up to $7,500 per intentional violation. In addition, consumers whose personal information is exposed in a data breach can sue for statutory damages of between $100 and $750 per consumer per incident. This means that a business with 10,000 customers could face up to $7,500,000 in statutory damages from a single breach, a genuine possibility that could severely harm the bottom line.
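The exposure figure is simple multiplication: affected consumers times incidents times the per-consumer amount. The CCPA sets statutory damages at between $100 and $750 per consumer per incident (or actual damages, if greater). The helper below is a hypothetical illustration of that arithmetic only; it ignores how courts actually assess damages.

```python
def statutory_damages_range(consumers, incidents=1, per_consumer_min=100, per_consumer_max=750):
    """Return (low, high) CCPA statutory damages exposure in USD."""
    claims = consumers * incidents
    return claims * per_consumer_min, claims * per_consumer_max


low, high = statutory_damages_range(10_000)
```

For 10,000 customers and one incident this gives a range of $1,000,000 to $7,500,000; a second incident doubles both ends.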

Because errors are so costly, personal data in the CCPA era, especially data that has been used for additional purposes, should be considered toxic data. This is because it carries significant business, operational, security, and compliance overheads. The good news is that there is a way to clean the data and take it out of scope of CCPA governance: defensibly deidentify it.


Cleaning consumer data:

Under CCPA, consumer data used for additional purposes such as data science and analytics that has been correctly deidentified can be considered out of scope for CCPA compliance. To prepare for the CCPA, businesses should prioritize taking data from in-scope to out-of-scope through an automated and defensible deidentification system that can be implemented at an enterprise-level and architectural point of control.

Under the CCPA, defensibly deidentified personal data is not subject to CCPA regulations. This clean data:

  • Is not governed by IT and security controls;
  • Does not need to follow segregation of duties;
  • Is not party to breach notification protocols;
  • Is not required in verifiable consumer requests;
  • Can be used for any purposes without notifying consumers and offering the opportunity to opt-out;
  • Does not give the consumer the option to opt-out.

Keeping identifiable personal information, or toxic data, will cost businesses millions every year to maintain. When an automated, defensible deidentification strategy is just a click away, there is no excuse not to act.

Businesses essentially have two choices: (a) retain toxic data and spend millions ensuring CCPA-compliance, or (b) deidentify their data using privacy automation to take it out-of-scope for CCPA. One option will save your brand and bottom-line, the other is a mass of expensive regulatory complications and litigation exposures.

    All organizations need to be moving toward Privacy by Design

    Organizations should think about privacy the same way they think about innovation, R&D, and other major organizational processes. Privacy isn’t a one-time compliance check; it’s an integral element of an organization’s functioning.


    What is Privacy By Design? 

    Privacy by Design (PbD) was developed in the 1990s to address the growing need for privacy assurance. PbD is a proactive approach to managing and preventing invasive events by making privacy an organization’s default operating system. This is achieved through privacy operations management, where IT systems, business practices, and networked data systems are built with privacy in mind from step one.


    Why Should Organizations Implement PbD?

    Automatically embedding privacy into your organization’s processes provides many benefits: strengthening customer trust, reducing the likelihood of future breaches, and cost savings.


    Strengthening Customer Trust

    • The seventh foundational principle of PbD emphasizes respect for user privacy. This translates into a privacy system that is completely customer-centric. Communicating to stakeholders that privacy is taken seriously, treating personal information with the utmost care, and committing to the Fair Information Practices (FIP) principles all increase customer trust in an organization. PbD makes it easy to demonstrate and prove how customers’ personal data is automatically safeguarded from privacy and security threats. This approach signals organizational maturity, providing a competitive edge.

    Reducing Future Breaches 

    • Neglecting privacy and categorizing it as a function that should be managed only when new or amended data privacy laws are enforced or when a data breach occurs is detrimental to an organization’s growth and increases risk. There will always be an element of organizational privacy risk, but that risk can be tremendously reduced by implementing a default privacy system. Such a system provides several benefits such as preventing privacy invasions before they happen, and allowing for seamless delivery of data privacy.

    Cost Reduction

    • The average cost of a data breach is $8.9 million USD. That’s a lump sum that could have been allocated to more critical organizational needs rather than spent on a breach that could have been prevented. PbD can eliminate unnecessary incident response costs while helping organizations avoid penalties for noncompliance with data privacy laws. PbD is scalable and applicable to a wide variety of privacy frameworks (FIP, GAPP, APEC) and global privacy laws (GDPR, CCPA). By embedding PbD into an organization’s IT and networked data systems, privacy and compliance teams can be confident that the risk of a data breach is minimized, privacy laws are adhered to, and expenses are reduced.

    PbD is a necessity that is critical to an organization’s future success. Privacy risk prevention should therefore be a top goal for every organization, and PbD is a proactive way to achieve it.
