The Three Greatest Regulatory Threats to Your Data Lakes

Emerging privacy laws restrict the use of data lakes for analytics. But organizations that invest in privacy automation can keep these valuable business resources available for strategic operations and innovation.


Over the past five years, as businesses have increased their dependence on customer insights to make informed business decisions, the amount of data stored and processed in data lakes has risen to unprecedented levels. In parallel, privacy regulations have emerged across the globe. This has limited the functionality of data lakes and turned the analytical process from a corporate asset into a business nightmare.

Under GDPR and CCPA, data cannot be used for purposes beyond those initially specified, which in turn shuts off the flow of insights from data lakes. As a consequence, most data science and analytics activities fail to meet the standards of privacy regulations. Under GDPR, this can result in fines of up to 4% of a business’s annual global revenue.

However, businesses don’t need to choose between compliance and insights. Instead, a new mindset and approach should be adopted to meet both needs. To continue to thrive in the current regulatory climate, enterprises need to do three things:

  1. Anonymize data to preserve its use for analytics
  2. Manage the privacy governance strategy within the organization
  3. Apply privacy protection at scale to unlock data lakes


Anonymize data to preserve its use for analytics

While the restrictions vary slightly, privacy regulations worldwide establish that customer data should only be used for instances that the subject is aware of and has given permission for. GDPR, for example, determined that if a business intends to use customer data for an additional purpose, then it must first obtain consent from the individual. As a result, all data in data lakes can only be made available for use after processes have been implemented to notify and request permission from every subject for every use case. This is impractical and unreasonable. Not only will it result in a mass of requests for data erasure, but it will slow and limit the benefits of data lakes.

Don’t get us wrong. We think protecting consumer privacy is important. We just think this is the wrong way to go about it.

Instead, businesses should anonymize or pseudonymize the data in their data lakes to take data out of the scope of privacy regulations. This will unlock data lakes and protect privacy, regaining the business advantage of customer insights while protecting individuals. The best of both worlds. 
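As a minimal sketch of what pseudonymization of direct identifiers can look like (the keyed-hash approach, field names, and secret-management details here are illustrative, not any specific product’s method):

```python
import hmac
import hashlib

# Secret key held outside the data lake (e.g. in a vault); rotating it
# breaks linkability between old and new tokens. Illustrative value only.
SECRET_KEY = b"replace-with-a-vaulted-secret"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed, non-reversible token."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

# The same identifier always maps to the same token, so records can still
# be joined for analytics without exposing the raw value.
record = {"email": "jane@example.com", "purchase_total": 42.50}
record["email"] = pseudonymize(record["email"])
```

Because the keyed hash is deterministic, analysts can still count, join, and segment by customer; they just can’t see who the customer is.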


Manage the privacy governance strategy within the organization

Across an organization, stakeholders operate in isolation, pursuing their own objectives with individualized processes and tools. This has led to fragmentation between legal, risk and compliance, IT security, data science, and business teams, and the resulting mismatch in priorities has created friction between privacy protection and analytics.

The solution is to implement an enterprise-wide privacy control system that generates quantifiable assessments of the re-identification risk and information loss. This enables businesses to set predetermined risk thresholds and optimize their compliance strategies for minimal information loss. By allowing companies to measure the balance of risk and loss, privacy stakeholder silos can be broken, and a balance can be found that ensures data lakes are privacy-compliant and valuable.
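One way such a quantifiable assessment can be sketched is a k-anonymity-style risk score: group records by their quasi-identifiers and take the worst-case chance of singling someone out (the records, field names, and threshold below are invented for illustration):

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Worst-case re-identification risk: 1 / size of the smallest
    group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1.0 / min(groups.values())

records = [
    {"zip": "10001", "age_band": "30-39"},
    {"zip": "10001", "age_band": "30-39"},
    {"zip": "94105", "age_band": "40-49"},
]

risk = reidentification_risk(records, ["zip", "age_band"])
# The lone 94105 record is unique in its group, so worst-case risk is 1.0,
# which would exceed a predetermined threshold such as 0.2 (i.e. k >= 5).
```

A score like this is what lets legal and data science teams argue from the same number: generalize or suppress quasi-identifiers until the risk falls below the agreed threshold, with the least possible information loss.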


Apply privacy protection at scale to unlock data lakes

Anonymization is not as simple as removing direct personal identifiers such as names. Nor is manual deidentification a viable approach to ensuring privacy compliance in data lakes. In fact, the volume and velocity at which data is accumulated in data lakes make traditional methods of anonymization impossible. What’s more, without a quantifiable risk score, businesses can never be certain that their data is truly anonymized.

But applying blanket solutions like masking and tokenization strips the data of its analytical value. This dilemma is something most businesses struggle with. However, there is no need. Through privacy automation, companies can ensure defensible anonymization is applied at scale. 

Modern privacy automation solutions assess, quantify, and assure privacy protection by measuring the risk of re-identification. Then they apply advanced techniques such as differential privacy to the dataset, optimizing for both privacy protection and preservation of analytical value.

The law provides clear guidance about using anonymization to meet privacy compliance, demanding the implementation of organizational and technical controls. Data-driven businesses should de-identify their data lakes by integrating privacy automation solutions into their governance framework and data pipelines. Such action will enable organizations to regain the value of their data lakes and remove the threat of regulatory fines and reputational damage.

What do Trump, Google, and Facebook Have in Common?

This year, the Trump Administration declared the need for a national privacy law to supersede a patchwork of state laws. But, as the year comes to a close, and amidst the impeachment inquiry, time is running out. Meanwhile, Google plans to roll out encrypted web addresses, and Facebook stalls research into social media’s effect on democracy. Do these three seek privacy or power?
The Trump Administration, Google, and Facebook claim that privacy is a priority, and… well… we’re still waiting for the proof. Over the last year, the news has been awash with privacy scandals and data breaches. Every day we hear promises that privacy is a priority and that a national privacy law is coming, but so far, the evidence of action is lacking. This raises the question: are politicians and businesses using the guise of “privacy” to manipulate people? Let’s take a closer look.

Congress and the Trump Administration: National Privacy Law

Earlier this year, Congress and the Trump Administration agreed they wanted a new federal privacy law to protect individuals online. This rare occurrence was even supported and campaigned for by major tech firms (read our blog “What is your data worth” to learn more). However, despite months of talks, “a national privacy law is nowhere in sight [and] [t]he window to pass a law this year is now quickly closing.” (Source)

Disagreements over enforcement and state-level authority are said to be holding back progress. Thus, while senators, including Roger Wicker, who chairs the Senate Commerce Committee, insist they are working hard, there are no public results; and with the impeachment inquiry, it is possible we will not see any for some time (Source). This means that the White House will likely miss its self-imposed deadline of January 2020, when the CCPA goes into effect.

Originally, this plan was designed to avoid a patchwork of state-level legislation that can make it challenging for businesses to comply and can weaken privacy care. It is not a simple process, and since “Congress has never set an overarching national standard for how most companies gather and use data,” much work is needed to develop a framework to govern privacy on a national level (Source). However, GDPR has shown in Europe that a large governing structure can successfully hold organizations accountable to privacy standards. But how much longer will US residents need to wait?

Google Encryption: Privacy or Power

Google has been trying to gain an edge over the competition for years by leveraging the vast troves of user data it acquires. Undoubtedly, their work has led to innovation that has redefined the way our world works, but our privacy has paid the price. Like never before, our data has become the new global currency, and Google has had a central part to play in the matter.

Google has famously made privacy a priority and is currently working to enhance user privacy and security with encrypted web addresses.

Unencrypted web addresses are a major security risk, as they make it simple for malicious persons to intercept web traffic and use fake sites to gather data. However, in denying hackers this ability, power is handed to companies like Google, who will be able to collect more user data than ever before. The risk is “that control of encrypted systems sits with Google and its competitors.” (Source)

This is because encryption cuts out the middle layer of ISPs, and can change the mechanisms through which we access specific web pages. This could enable Google to become the centralized encryption DNS provider (Source).

Thus, while DoH is certainly a privacy and security upgrade over the current DNS system, shifting from local middle layers to major browser enterprises centralizes user data, raising anti-competitive and child-protection concerns. Further, it diminishes law enforcement’s ability to blacklist dangerous sites and monitor those who visit them. It also opens new opportunities for hackers, because it reduces defenders’ ability to gather cybersecurity intelligence from malware activity, which is integral to fulfilling government-mandated regulation (Source).

Nonetheless, this feature will roll out in a few weeks as the new default, despite calls from those with DoH concerns to wait until more is known about the potential fallout.

Facebook and the Disinformation Fact Checkers

Over the last few years, Facebook has developed a terrible reputation as one of the least privacy-centric companies in the world. But is it accurate? After the Cambridge Analytica scandal, followed by an endless string of data privacy debacles, Facebook has stalled its “disinformation fact-checkers” program on the grounds of privacy problems.

In April of 2018, Mark Zuckerberg announced that the company would develop machine learning to detect and manage misinformation on Facebook (Source). It then promised to share this information with non-profit researchers who would flag disinformation campaigns as part of an academic study on how social media is influencing democracies (Source).

To ensure that the data being shared could not be traced back to individuals, Facebook applied differential privacy techniques.

However, upon receiving this information, researchers complained that the data did not include enough detail about the disinformation campaigns to allow them to derive meaningful results. Some even insisted that Facebook was going against the original agreement (Source). As a result, some of the people funding this initiative are considering backing out.

Initially, Facebook was given a deadline of September 30 to provide the full data sets, or the entire research grants program would be shut down. While they have begun offering more data in response, the full data sets have not been provided.

A Facebook spokesperson said, “This is one of the largest sets of links ever to be created for academic research on this topic. We are working hard to deliver on additional demographic fields while safeguarding individual people’s privacy.” (Source)

While Facebook may be limiting academic research on democracies, perhaps it is finally prioritizing privacy. And with an ethical framework in place, the impact of social media on democracy can still be measured, through technological advancement and academic research, without compromising privacy.

In the end, it is clear that privacy promises hold the potential to manipulate people into action. While the US government may not have a national privacy law anywhere in sight, the motives behind Google’s encrypted links may be questionable, and Facebook’s sudden prioritization of privacy may cut out democratic research, at least privacy is becoming a hot topic, and that holds promise for a privacy-centric future for the public.

How can I Maximally Monetize my Data without Exchanging & Exposing Sensitive Information?

In 2017, The Economist published a report titled “The world’s most valuable resource is no longer oil, but data.” The report explained how digitally-native organizations like Facebook, Google, and Amazon were leveraging data to achieve high growth and market disruption.

In the two years since The Economist’s article was published, organizations of all sizes have jumped aboard the digitalization wagon, investing millions of dollars in technology and processes to gather and analyze data. The main goal of going digital is data monetization.

What is data monetization?

Data monetization is the process of extracting insights from data in order to create value for the customer and the business. In a survey by McKinsey & Company, organizations mentioned that they monetize data by adding new services, developing new business models, and joining with similar companies to create a data utility.

In a recent survey, Gartner went deeper into how organizations are monetizing their data, exploring both their present and future approaches.

Data monetization is no longer a nice-to-have strategy. It is a requirement for any enterprise that wants to compete. In the same McKinsey & Company survey, respondents described how incumbents are being disrupted by new entrants who are better at monetizing their data, or by traditional competitors who use data to improve their businesses. 

Unfortunately, data misuse and data breaches have made governments and customers react negatively to the way organizations use their data. Facebook – whose unethical practices were revealed during the Cambridge Analytica scandal – is the best example of a company with enormous data power that has abused customer trust. In response, governments around the world are pushing privacy regulations that control how organizations use customer data. Data is no longer flowing inside and among organizations. Instead, to reduce risk, restrictions and silos are being created. As a result, large digitalization investments are lagging and ROIs are shrinking.

Data Monetization 2.0

Recent advancements in cryptography and privacy techniques offer organizations new tools for monetizing data – tools that don’t risk exposing or exchanging sensitive information. Three of the most promising new technologies are Differential Privacy, Secure Multi-Party computation, and Fully Homomorphic Encryption.

Differential Privacy

Differential privacy, as The Conversation outlines, “makes it possible for tech companies to collect and share aggregate information about user habits while maintaining the privacy of individual users.”

Differential privacy is a technique that injects “noise” into a data set or its query results, with the objective of reducing the risk of re-identification of an individual. With differential privacy, the result of an analysis is almost equally likely whether or not any single individual’s data is included in the data set, so the output reveals little about any one subject. After applying differential privacy, organizations can monetize their data with a high degree of certainty that no sensitive information will be exposed or exchanged.
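A minimal sketch of the idea is the Laplace mechanism, the textbook way to make a numeric query differentially private (the clamping bounds and the epsilon privacy budget below are illustrative parameters):

```python
import random

def dp_mean(values, epsilon, lower, upper):
    """Differentially private mean via the Laplace mechanism (textbook sketch)."""
    n = len(values)
    # Clamp each value so one individual can shift the mean by at most
    # (upper - lower) / n: this bound is the query's sensitivity.
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / n
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials is Laplace-distributed
    # with the same scale, calibrated to the privacy budget epsilon.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_mean + noise
```

Smaller epsilon means more noise and stronger privacy; larger epsilon means more accurate answers. Production systems also have to track the cumulative budget spent across queries.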

Differential privacy has been adopted by large tech companies like Google and Apple as well as by the US Census Bureau. 

Secure Multi-Party Computation (MPC)

As described by Boston University researchers, “Secure Multi-Party Computation allows us to collaboratively analyze data for the public good without revealing private information.” 

MPC is a cryptographic technique that allows various parties to compute a function using everyone’s inputs without disclosing what those inputs are. Since the inputs are never disclosed, the sensitive information is always protected.

MPC gives organizations two major data monetization benefits:

  • Better Insights: Organizations can augment their internal data with data from partners without anyone’s sensitive data ever being exchanged, and everyone derives more and better insights from the combined data sets. 
  • Renting Data: Data assets are extremely valuable, but that value is lost once they are sold. With MPC, organizations can let partners benefit from the data without having to sell it, increasing the value of the data asset.
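The collaborative-analysis idea can be sketched with additive secret sharing, one of the simplest MPC building blocks (the revenue figures, three-party split, and modulus choice below are invented for illustration):

```python
import random

P = 2**61 - 1  # public prime modulus; all shares live in this field

def share(secret, n_parties=3):
    """Split a secret into n additive shares; any n-1 shares reveal nothing."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two companies jointly compute total revenue without exchanging raw figures.
revenue_a, revenue_b = 1_200, 3_400
shares_a, shares_b = share(revenue_a), share(revenue_b)

# Each party locally adds the shares it holds; only the final sum is revealed.
sum_shares = [(a + b) % P for a, b in zip(shares_a, shares_b)]
total = reconstruct(sum_shares)  # 4600
```

Each individual share is a uniformly random field element, so no single party learns anything about the other’s input; only the agreed-upon output (the total) is ever reconstructed.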

Fully Homomorphic Encryption (FHE)

Fully Homomorphic Encryption (FHE) is a cryptographic technique that allows computations to be performed directly on encrypted data. Because the data remains encrypted throughout the computation, sensitive information is never exposed in the clear.

In theory, FHE is the best way to monetize data without exposing or sharing sensitive information. In practice, however, the technique is still extremely slow and computationally expensive, which keeps it out of reach for most real-world commercial applications.
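FHE itself requires specialized libraries, but the underlying homomorphic property can be illustrated with textbook RSA, which is homomorphic under multiplication only (the tiny key below is insecure and purely illustrative; real FHE schemes support arbitrary computations):

```python
# Textbook RSA with toy parameters: multiplying two ciphertexts yields
# a ciphertext of the product of the plaintexts, without any decryption.
p, q = 61, 53
n = p * q                           # modulus (3233)
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (modular inverse)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

c1, c2 = encrypt(6), encrypt(7)
# Multiply the ciphertexts; the result decrypts to 6 * 7 = 42,
# computed without ever decrypting c1 or c2.
product = decrypt((c1 * c2) % n)
```

A fully homomorphic scheme extends this property to both addition and multiplication, which is what makes arbitrary computation on encrypted data possible, and also what makes it so expensive.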
