All organizations need to be moving toward Privacy by Design

Organizations should think about privacy the same way they think about innovation, R&D, and other major organizational processes. Privacy isn’t a one-time compliance check; it’s integral to how an organization functions.


What is Privacy By Design? 

Privacy by Design (PbD) was developed in the 1990s to meet the growing need for privacy assurance. PbD is a proactive approach to managing and preventing privacy-invasive events by making privacy an organization’s default operating system. This is achieved through privacy operations management, where IT systems, business practices, and networked data systems are built with privacy in mind from step one.


Why Should Organizations Implement PbD?

Automatically embedding privacy into your organization’s processes provides many benefits: strengthening customer trust, reducing the likelihood of future breaches, and lowering costs.


Strengthening Customer Trust

  • The seventh foundational principle of PbD emphasizes respect for user privacy. This translates into a privacy system that is completely customer-centric. Communicating to stakeholders that privacy is taken seriously, treating personal information with the utmost care, and committing to alignment with the Fair Information Practices (FIP) principles all increase customer trust in an organization. PbD makes it easy to demonstrate how customers’ personal data is automatically safeguarded from privacy- and security-related threats. This approach signals organizational maturity and provides a competitive edge.

Reducing Future Breaches 

  • Treating privacy as a function to be managed only when new or amended data privacy laws come into force, or when a data breach occurs, is detrimental to an organization’s growth and increases risk. There will always be an element of organizational privacy risk, but that risk can be greatly reduced by implementing a default privacy system. Such a system prevents privacy invasions before they happen and allows for seamless delivery of data privacy.

Cost Reduction

  • The average cost of a data breach is $8.9 million USD. That is money that could have been allocated to more critical organizational needs rather than to a preventable breach. PbD can help avoid unnecessary incident response costs while also avoiding penalties for noncompliance with data privacy laws. PbD is scalable and applicable to a wide variety of privacy frameworks (FIP, GAPP, APEC) and global privacy laws (GDPR, CCPA). By embedding PbD into an organization’s IT and networked data systems, privacy and compliance teams can rest assured that the risk of data breach is minimized, privacy laws are adhered to, and expenses are reduced.

PbD is a dire necessity that is critical to the future success of an organization. Understanding this, privacy risk prevention should be a top goal of all organizations and PbD is a proactive way to achieve it.

    Why privacy automation is the only route to CCPA de-identification compliance

    The volume and variety of big data is surpassing the functionality of traditional privacy management. With the California Consumer Privacy Act (CCPA) coming into effect on January 1, 2020, it is more critical than ever for every organization operating in California to make real changes in how they manage their data. The only viable solution is privacy automation.

    Traditional data privacy management approaches are slow, unscalable, and imperfect

    Across organizations, data drives results. Yet the velocity at which data is growing threatens to turn this “new oil” from a profit driver into a fine magnifier.

    Organizations are continuously collecting data in massive volumes, while data consumers utilize that information to perform their day to day jobs. This ceaseless cycle of data acquisition and analysis makes it almost impossible for organizations to monitor and manage all their data.

    Yet today, data privacy management is often performed manually, with a survey-based approach. These processes do not scale. Not only are they unreliable, but manual implementation slows down data analysis and has made it impossible to stay current with privacy regulations. On top of this, first-generation techniques such as encryption, masking and hashing no longer cut it. In consequence, privacy and compliance teams are seen to be preventing companies from unlocking their most valuable resource. 

    In reality, compliance is impossible with manual human review. It would be like cutting your lawn with a pair of scissors. 

    Privacy compliance requires a unified effort from the various departments and privacy-related stakeholders within an organization. This requires the right tools and processes.

    Now, with the CCPA coming into effect on January 1, 2020, organizations are being put to the test. For the first time, enterprises with operations in California will be held accountable to strict privacy regulations. There is an urgent need to build a manageable and effective data privacy strategy.

    Under the CCPA, personal data cannot be used for secondary purposes unless each user has been given explicit notice and the opportunity to opt out. These secondary purposes, like data science and monetization, are what make data so valuable – why risk opt-outs?

    If data has been de-identified or aggregated, it is no longer restricted. However, the standards for data classification as “de-identified or aggregated” are extremely high, and traditional methods of anonymization, like tokenization and hashing, will not cut it. It is only when advanced privacy techniques (differential privacy, k-anonymization) are applied correctly that data science and monetization can continue.
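To illustrate why hashing alone may fall short, here is a minimal Python sketch. It assumes a hypothetical column of 4-digit member IDs that has been "anonymized" with unsalted SHA-256; the ID format, the values, and the function names are illustrative assumptions, and the example says nothing about what the CCPA legally requires. Because the space of possible IDs is small, anyone who knows the format can hash every candidate and match the results back to the originals.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the unsalted SHA-256 hex digest of a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def reverse_hashed_ids(hashed_ids, candidate_ids):
    """Recover original IDs by hashing every candidate and matching digests."""
    lookup = {sha256_hex(c): c for c in candidate_ids}
    return {h: lookup[h] for h in hashed_ids if h in lookup}


# Hypothetical "de-identified" column: member IDs replaced by their hashes.
original_ids = ["0042", "1337", "9001"]
hashed_column = [sha256_hex(i) for i in original_ids]

# An attacker who knows the ID format enumerates all 10,000 possibilities.
candidates = [f"{n:04d}" for n in range(10_000)]
recovered = reverse_hashed_ids(hashed_column, candidates)
recovered_ids = sorted(recovered.values())
```

In this toy setup every hashed ID is recovered. The same enumeration idea applies to other low-entropy identifiers, which is one reason hashed or tokenized values can still carry re-identification risk.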

    As a result, the complex structures of the average organization require a single enterprise-wide, end-to-end, automated solution to meet data and privacy compliance regulations: Privacy Automation.


    Privacy automation: the only tool that can ensure CCPA compliance

    Privacy automation assesses, quantifies and assures privacy by measuring the risk of identification, applying privacy-protection techniques, and providing audit reports throughout the whole process. With AI and a combination of the most advanced privacy techniques, this solution will simplify the compliance process and allow for privacy rules definition, risk assessments, application of privacy actions, and compliance reporting to happen within a single application. This process is part of what is known as Privacy by Design and Privacy by Default.

    With Privacy Automation, metadata classification becomes possible. This lets you generate an automated and easy-to-understand privacy risk score.

    Automation extends enterprise-wide, harmonizing the needs of Risk and Compliance teams and data science teams, and ensuring regulations are followed. This allows companies to unlock data in a way that protects consumers and adds value for them, more safely than manual privacy protection.

    With privacy automation, enterprises can leverage state-of-the-art solutions to innovate without limitation or fear. In consequence, it is the only tool that will realistically enable enterprises to become CCPA-compliant by January 2020.

    For more information, read our blog, The Business Incentives to Automate Privacy Compliance Under CCPA.

    Toxic Data is contaminating data lakes and data warehouses. How can you clean it up before it’s too late?

    Data is the new oil. Understandably, over the past few years, organizations have been gathering larger and larger quantities of it. However, a reckoning is on the way. New regulations such as the CCPA mean that most of this data carries an inherent risk that could disrupt organizations if not dealt with.

    Toxic data lakes and data warehouses

    Websites, apps, social media – they all form part of how organizations use the digital space to gather consumer information, and then use that information to generate better solutions and services. All of this information is being stored in data lakes and data warehouses. 

    The big problem with storing all of this data is that the majority of it is personal information. And under the new privacy regulations, personal information has to be handled with special care. Mismanagement of this information opens the door to fines that could go up to nine digits, as well as to the loss of customer trust and revenue. 

    In light of the new era of privacy regulations, most of the data sitting in data lakes and data warehouses is highly toxic.

    Unfortunately, organizations are having a hard time measuring their privacy exposure and adopting processes and technologies to control and reduce risk. The toxicity of data lakes and warehouses keeps going up and is a ticking bomb waiting to explode.   


    Decontaminating before it is too late

    Data governance has been the traditional way in which organizations have tried to control the risk exposure of their data assets. However, traditional data governance needs to evolve to cover the rise of privacy risk. 

    Modern-day data governance must contain the following elements to be able to clean the data lakes and warehouses:

    • Provide a comprehensive privacy risk measure: Reducing privacy risk without being able to measure the risk is like flying a plane without instruments. Organizations need to be able to measure their privacy risk exposure as well as understand how each data consumer impacts this risk.


    • Privacy enhanced data discovery and classification: In order to measure and reduce privacy risk, organizations need to know what data they have. This discovery and classification need to incorporate privacy terminology to be effective in measuring privacy risk.


    • Variety of privacy-preserving techniques: Reducing privacy risk requires an understanding of how the data’s analytical value gets degraded. Utilising a variety of privacy techniques, like differential privacy and k-anonymity, allows organizations to reduce privacy risk while preserving analytical value.


    • Automatic policy enforcement: Making sure that the data coming in and out of the data lakes and warehouses complies with policy is a huge endeavour that can’t be done manually. Organizations need systems that support and automate policy enforcement.


    • Data governance reports: Knowing exactly who accessed what data is a must for any data governance process.
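As one concrete instance of the privacy-preserving techniques listed above, the following is a minimal Python sketch of differential privacy applied to a counting query using the Laplace mechanism. The `ages` data, the `dp_count` function, and the choice of epsilon are illustrative assumptions rather than a description of any particular product. A count changes by at most 1 when one person is added or removed, so the noise scale is 1/epsilon. The noise is drawn as the difference of two exponential samples, which follows a Laplace distribution.

```python
import random


def dp_count(records, predicate, epsilon, rng):
    """Count records matching predicate, plus Laplace noise of scale 1/epsilon.

    A counting query has sensitivity 1, so scale = 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two Exp(rate=epsilon) draws is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical data: ages of individuals in a data lake table.
ages = [23, 35, 41, 47, 52, 58, 61, 29, 44, 39]
rng = random.Random(2020)  # seeded only so the sketch is repeatable

# Smaller epsilon means more noise: stronger privacy, less analytical value.
noisy_strict = dp_count(ages, lambda a: a >= 40, epsilon=0.1, rng=rng)
noisy_loose = dp_count(ages, lambda a: a >= 40, epsilon=2.0, rng=rng)
```

The epsilon parameter makes the trade-off between privacy risk and analytical value explicit: averaged over many queries the noisy count centres on the true count, but any single answer reveals little about one individual.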


    Cleaning your data lake and warehouse from toxic data is possible as long as you implement data governance tools that are suited for understanding and managing the privacy risk inherent in your data assets. 

    How Safely Opening Data Silos Facilitates Cutting-edge Data Science

    In their day-to-day work, data scientists face a range of challenges. (We’ve covered the big ten challenges on the blog before.) One of the biggest of all? Siloed data; data inaccessible to anyone other than the owner.

    The problem with data silos

    Typically, the problem of data silos presents itself like this. 

    A data scientist notices that their model is not performing as well as they hoped. The data scientist has a hypothesis as to why this might be the case, and wants to test their hypothesis. The data currently available does not adequately measure the potentially explanatory variable. The data scientist begins a long expedition, searching for a dataset to test the hypothesis. The hunt is slow and arduous, including many red herrings and wild geese. 

    At long last, the data is found, and lo and behold, it’s been inside the organization this entire time! 

    The scientist sends off an email requesting access, and heads home, content the search is over. They arrive at work the next day to an email from the data owner rebuffing their request. Back to square one, foiled by siloed data. 

    While this problem may be solved with a simple email between managers, the cost is already apparent. Time was spent seeking out data that was internally available. More time passes waiting for clearance of the data. Even the time spent hypothesizing about model performance likely could have been reduced had the data been accessible from the outset. 

    Further, these problems don’t always get solved. Sometimes siloed data is never found. Sometimes it’s never cleared. In these cases, the data scientist is unable to test their hypothesis. At best, siloed data inhibits productivity. At worst, it limits fundamental understanding of the problem by obfuscating relationships between data.

    Why do data silos appear?

    Siloed data can crop up within an organization for a wide variety of reasons, ranging from the malicious (teams wanting to maintain a competitive advantage) to the innocuous (too many layers of hierarchy/bureaucracy to traverse). As data privacy concerns and a more nuanced understanding of identifying information emerge, limiting access to sensitive data is an increasingly pressing motivation for the creation of data silos. 

    Unfortunately, limiting data access also limits data utility. Luckily, there are a couple of techniques available to gain data utility while maintaining acceptable privacy standards.

    How to break open data silos

    One technique is to anonymize siloed data. The goal of anonymization is to limit the risk of any individual in the dataset being identified. Simple anonymization, such as removal of direct identifiers like name and ID, has long been commonplace. However, this approach is insufficient. Indirect identifiers remain, leaving the data susceptible to inference attacks.

    Luckily, there are more effective ways to anonymize data. By utilizing concepts such as k-anonymity and t-closeness, data owners can possess a clear understanding of their data’s risk of reidentification. Applying advanced practical privacy-preserving protection to indirect identifiers to reach a desired reidentification risk is one way to open data silos.
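To make k-anonymity concrete, here is a minimal Python sketch. The table, the choice of `age` and `zip` as indirect identifiers, and the generalization rules are all hypothetical, and t-closeness is not covered. A dataset is k-anonymous when every combination of indirect-identifier values is shared by at least k rows, so the worst-case reidentification risk from those columns is 1/k.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same identifier values."""
    if not rows:
        raise ValueError("rows must not be empty")
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize(row):
    """Coarsen indirect identifiers: age to a 10-year band, ZIP to its first 3 digits."""
    low = (row["age"] // 10) * 10
    return {**row, "age": f"{low}-{low + 9}", "zip": row["zip"][:3] + "**"}


# Hypothetical siloed table with direct identifiers already removed.
rows = [
    {"age": 34, "zip": "94107", "diagnosis": "flu"},
    {"age": 36, "zip": "94110", "diagnosis": "asthma"},
    {"age": 38, "zip": "94103", "diagnosis": "flu"},
    {"age": 52, "zip": "94301", "diagnosis": "diabetes"},
    {"age": 57, "zip": "94306", "diagnosis": "flu"},
]
indirect = ["age", "zip"]

k_before = k_anonymity(rows, indirect)        # 1: every row is unique
generalized = [generalize(r) for r in rows]
k_after = k_anonymity(generalized, indirect)  # 2: smallest group has two rows
worst_case_risk = 1 / k_after                 # 0.5
```

Generalizing further (for example, wider age bands) would raise k and lower risk, at the cost of analytical detail. Choosing a target k is how a data owner can set a desired reidentification risk before opening the silo.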

    Another solution is to implement Secure Multi-Party Computation (SMC). SMC enables a number of parties to jointly compute a function over a set of inputs that they wish to keep private (head here for a deeper explanation). This allows training a machine learning model across datasets held by multiple parties as if they were a single dataset, but without actually moving, centralizing, or disclosing the data between the parties. This approach increases data utility without actually opening the silo.
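One simple building block behind SMC is additive secret sharing. The Python sketch below simulates, inside a single process, three parties computing the sum of their private values without any party revealing its own input. The modulus, function names, and input values are illustrative assumptions; a real deployment would need network communication, protection against misbehaving parties, and far more than a sum to train a model.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this (assumed) public value


def share(value, n_parties):
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum: party i holds private_values[i] (non-negative ints)."""
    n = len(private_values)
    # Each party splits its input and sends one share to every party.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Combining the published partial sums yields the total, not the inputs.
    return sum(partial_sums) % MODULUS


# Hypothetical private counts held in three separate silos.
total = secure_sum([120, 75, 310])
```

Each individual share is uniformly random, so a party that sees only the shares sent to it learns nothing about another party's input; only the final total is revealed. The result is correct as long as the true sum is smaller than the modulus.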

    Data privacy concerns are likely to only increase moving forward. Because of this, data silos are likely to continue to be created. Being able to safely open or connect these silos will be key to unlocking the analytical value of the data within.

    For more about CryptoNumerics’ privacy automation solutions, read our blog here.
