Facebook collecting healthcare data

As many of our previous blogs have highlighted, COVID-19 is severely impacting the tech world. Privacy regulations have been a hot topic of debate among governments, Big Tech, and users.

Facebook has joined the top companies taking advantage of user data in COVID-19 research. Meanwhile, Brazil's LGPD is seeing pushback on its enforcement because of COVID-19. In contrast, US senators are introducing a new privacy bill to ensure Americans' data privacy remains protected.

 

Facebook collecting symptom data

In the current pandemic climate, tech companies of all sizes have stepped up to provide solutions and aid to governments and citizens struggling to cope with COVID-19. As we've highlighted in our previous blog posts, Google and Apple have been at the frontlines, introducing systems that protect user privacy while changing how communities track the virus.

Following closely behind, Facebook has introduced its own effort to put user data to work for the greater good of COVID-19 research.

Facebook announced partnerships with several American universities to begin collecting symptom data in different countries. Facebook's CEO and founder told The Verge that the information could help highlight COVID hotspots across the globe, especially in places where governments have neglected to address the virus's severity.

Facebook has been working throughout this pandemic to demonstrate how aggregated and anonymized data can be used for good.

However, not everyone is convinced by Facebook's sudden embrace of user data control. One article highlighted how the company is still being investigated by the FTC over privacy issues.

Facebook's long list of privacy invasions is raising concerns not over how the data is currently being used, but over how it will be handled after the pandemic has subsided.

 

Brazil pushes back privacy legislation

At the beginning of this year, we wrote an article outlining Brazil's first data protection act, the LGPD. This privacy legislation closely follows the EU's GDPR and will unify the country's 40 existing privacy laws.

Before COVID-19's impact on countries like Brazil, many tech companies were already pressuring the Brazilian government to change the LGPD's effective date.

On April 29th, the Brazilian president delayed the applicability date of the LGPD to May 3rd, 2021. With this provisional measure, the Brazilian Congress has been given 16 days to approve the new LGPD implementation date.

If Congress does not approve this new date by May 15th, it must vote on a new LGPD date. If it does not, the LGPD will come into effect on August 14th, 2020.

Brazil's senate has now voted to move the law's introduction to January 2021, with sanctions coming into force in August 2021. This means lawsuits and complaints can be filed as of January 1st, and enforcement actions can be taken as of August 1st (source).

 

America introduces new privacy law

Much like Brazil's privacy legislation being affected by COVID-19, some US senators have stepped up to ensure the privacy of American citizens' data.

The few senators proposing this bill have said they are working to “hold businesses accountable to consumers if they use personal data to fight the COVID-19 pandemic.”

This bill does not target contact tracing apps like those proposed by Apple and Google. However, it does aim to ensure that the companies behind them use and protect data responsibly.

The bill requires companies to obtain consent from users before collecting any health or location data. It also forces companies to ensure that the information they collect is properly anonymized and cannot be re-identified. Finally, it requires tech companies to delete all identifiable information once COVID-19 has subsided and tracking apps are no longer necessary.

The bill has wide acceptance across the congressional floor and will be enforced by state attorneys general. It is being considered a big win for Americans' privacy rights, especially given past privacy trust issues between big tech companies and their users.

Location data and your privacy

As technology grows to surround the entirety of our lives, it comes as no surprise that our every move is tracked and stored by the very apps we trust with our information. With the current COVID-19 pandemic, the consequences of inviting Big Tech into our every movement are being revealed.

At this point, most technology users understand what information they give to companies, such as their birthdays, access to pictures, or other sensitive information. However, many may be unaware of how much location data companies collect and how that affects their data privacy.

Location data volume expected to grow

We have created over 90% of the world's data since 2017. As wearable technology continues to grow in popularity, the amount of data a person creates each day is on a steady incline.

One study reported that by 2025, the number of IoT-enabled devices installed worldwide is expected to hit 75 billion. This astronomical number highlights how intertwined technology is with our lives, and how welcoming we are of technology whose data collection practices many people are unaware of.

Marketers, companies, and advertisers will increasingly look to location-based information as its volume grows. A recent study found that more than 84% of marketers use location data in their marketing efforts.

The last few years have seen a boost in big tech companies giving their users more control over how their data is used. One example came in 2019, when Apple introduced pop-ups to remind users when apps are using their location data.

Location data is saved and stored so that companies can easily serve you personalized ads and products. Understanding what your devices collect from you, and how to limit data sharing on your devices, is crucial as we move forward in the technological age.

Click here to read our past article on location data in the form of wearable devices. 

COVID-19 threatens location privacy

Risking the privacy of thousands of people or saving thousands of lives seems to be the question throughout this pandemic, and a question that is running out of time for debate. Some of the world's largest companies, including SAS, Google, and Apple, have stepped up to volunteer their anonymized data.

One of the largest concerns is not how this data is being used in this pandemic, but how it could be abused in the future. 

One Forbes article drew a comparison to the regret many have faced after sharing DNA with sites like 23andMe, which has led to health insurance issues or entanglement in criminal investigations.

As companies like Google, Apple, and Facebook step up to the COVID-19 technology race, many are expressing concern, as these companies have not proven reliable at anonymizing user data.

In addition to the data-collection concern, governments and big tech companies are looking into contact-tracing applications. Using civilian location data for surveillance purposes, even when framed as being for the greater good of health and safety, raises multiple red flags about how our phones can be used to monitor our every movement. To read more about this involvement in contact tracing apps, read our latest article.

Each company has stated that it anonymizes the data it collects. However, in this pandemic age, anonymized information can still be exploited, especially at the hands of government intervention.

With all this said, big tech holds power over our information and is playing a vital role in the COVID-19 response. Paying close attention to how user data is managed post-pandemic will be valuable in exposing how these companies handle user information.

 

4 techniques for data science

With growing tension between privacy and analytics, the job of data scientists and data architects has become more complicated. The responsibility of data professionals is not just to maximize the value of data, but to find ways in which data can be privacy-protected while preserving its analytical value.

The reality today is that regulations like GDPR and CCPA have disrupted the way in which data flows through organizations. Now data is being siloed and protected using techniques that are not suited for the data-driven enterprise. Data professionals are left with long processes to access the information they need and, in many cases, the data they receive has no analytical value after it has been protected. 

This emphasizes the importance of using adequate privacy protection tactics to ensure that personally identifiable information (PII) is accessible in a privacy-protected manner and that it can be used for analytics.

To satisfy GDPR and CCPA, organizations can choose among three options: pseudonymization, anonymization, and consent.

Pseudonymization replaces direct identifiers, like names or emails, with pseudonyms to protect the privacy of the individual. However, pseudonymized data is still within the scope of the privacy regulations, and the risk of re-identification remains very high.
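
As a minimal illustration, the sketch below pseudonymizes a toy record set in Python by replacing the email (a direct identifier) with a salted hash; the records and salt are invented for the example:

```python
import hashlib

# Hypothetical secret salt. The salt/mapping must be stored separately from
# the data; anyone holding both can link pseudonyms back to individuals,
# which is why pseudonymized data remains in scope for GDPR and CCPA.
SALT = b"keep-this-secret"

def pseudonymize(email: str) -> str:
    """Replace a direct identifier with a stable pseudonym."""
    return hashlib.sha256(SALT + email.encode()).hexdigest()[:12]

records = [
    {"email": "ana@example.com", "age": 34},
    {"email": "bob@example.com", "age": 51},
]
for record in records:
    record["email"] = pseudonymize(record["email"])

print(records)  # ages intact; emails replaced by pseudonyms
```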

Anonymization, on the other hand, looks at both direct identifiers and quasi-identifiers and transforms the data so that it falls out of scope for privacy regulations while remaining usable for analytics.

Consent requires organizations to ask customers for permission to use their data, which opens up the opportunity for opt-outs. If the usage of the data changes, as it often does in an analytics environment, then consent may very well be required each time.

There are four main techniques that can help data professionals with privacy protection. All of them have different impacts on both privacy protection and data quality. These are: 

Masking: A de-identification technique that focuses on the redaction or transformation of information within a dataset to prevent exposure. 
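
A rough sketch of what rule-based masking might look like in Python, with invented field formats:

```python
import re

def mask_email(email: str) -> str:
    """Redact the local part of an email, keeping only the domain."""
    return re.sub(r"^[^@]+", "****", email)

def mask_ssn(ssn: str) -> str:
    """Redact an SSN down to its last four digits."""
    return "***-**-" + ssn[-4:]

print(mask_email("jane.doe@example.com"))  # ****@example.com
print(mask_ssn("123-45-6789"))             # ***-**-6789
```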

K-anonymity: This privacy model ensures that each individual is indistinguishable from at least k-1 other individuals based on their attributes in a dataset.
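
A toy sketch of achieving k-anonymity through generalization, assuming zip code and age are the quasi-identifiers:

```python
from collections import Counter

# Toy records: (zip code, age) are the quasi-identifiers.
rows = [("90210", 34), ("90211", 36), ("90213", 31),
        ("10001", 52), ("10002", 55), ("10003", 58)]

def generalize(zipcode: str, age: int) -> tuple:
    """Coarsen quasi-identifiers: 3-digit zip prefix, 10-year age band."""
    decade = age // 10 * 10
    return (zipcode[:3] + "**", f"{decade}-{decade + 9}")

generalized = [generalize(z, a) for z, a in rows]
k = min(Counter(generalized).values())
print(f"dataset is {k}-anonymous")  # here k = 3: each row matches 2 others
```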

Differential Privacy: A technique applied to an algorithm that mathematically guarantees the algorithm's output is statistically indistinguishable whether or not any one individual is in the dataset. It is achieved through the addition of noise to the algorithm.
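
A minimal sketch of the Laplace mechanism for a count query, which has sensitivity 1 (the values and epsilon are illustrative):

```python
import numpy as np

def dp_count(values, threshold, epsilon=1.0):
    """Differentially private count via the Laplace mechanism.

    A count query has sensitivity 1 (adding or removing one person changes
    the true count by at most 1), so Laplace noise with scale 1/epsilon
    gives epsilon-differential privacy for this query.
    """
    true_count = sum(1 for v in values if v > threshold)
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

salaries = [48_000, 52_000, 61_000, 75_000, 90_000]
print(dp_count(salaries, threshold=60_000))  # true count is 3, plus noise
```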

Secure Multi-Party Computation: This is a cryptographic technique where a group of parties can compute a function over their inputs while keeping their inputs private.
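
A toy sketch of additive secret sharing, one common building block of secure multi-party computation:

```python
import random

P = 2**61 - 1  # work in a large prime field

def share(secret: int, n: int = 3) -> list:
    """Split a secret into n random shares that sum to it modulo P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

# Two parties secret-share their private inputs...
a_shares = share(42)
b_shares = share(100)
# ...each share-holder adds the shares it holds, never seeing the inputs...
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
# ...and recombining the result shares reveals only the sum.
print(sum(sum_shares) % P)  # 142
```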

Keep your eyes peeled in the next few weeks for our whitepaper, which will explore these four techniques in further detail.

Key terms to know to navigate data privacy

As the data privacy discourse continues to grow, it’s crucial that the terms used to explain data science, data privacy and data protection are accessible to everyone. That’s why we at CryptoNumerics have compiled a continuously growing Privacy Glossary, to help people learn and better understand what’s happening to their data. 

Below are 25 terms covering privacy legislation, personal data, and other privacy and data science terminology, to help you better understand what our company does, what other privacy companies do, and what is being done with your data.

Privacy regulations

    • General Data Protection Regulation (GDPR) is a privacy regulation implemented in May 2018 that has inspired more regulations worldwide. The law requires data controllers to establish a specific legal basis for each and every purpose for which personal data is used. If a business intends to use customer data for an additional purpose, it must first obtain explicit consent from the individual. As a result, all data in data lakes can only be made available for use after processes have been implemented to notify and request permission from every subject for every use case.
    • California Consumer Privacy Act (CCPA) is a sweeping piece of legislation aimed at protecting the personal information of California residents. It gives consumers the right to learn about the personal information that businesses collect, sell, or disclose about them, and to prevent the sale or disclosure of their personal information. It includes the Right to Know, Right of Access, Right to Portability, Right to Deletion, Right to be Informed, Right to Opt-Out, and Non-Discrimination Based on Exercise of Rights. This means that if consumers do not like the way businesses are using their data, they can request that it be deleted, a risk for business insights.
    • Health Insurance Portability and Accountability Act (HIPAA) is a health protection regulation passed in 1996 and signed by President Clinton. This act gives patients the right to privacy and covers 18 personal identifiers that must be de-identified. The Act applies not only in hospitals but also in workplaces, schools, and other settings.

Legislative Definitions of Personal Information

  • Personal Data (GDPR): “Any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person” (source)
  • Personal Information (PI) (CCPA): “information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” (source)
  • Personal Health Information (PHI) (HIPAA): any identifiable health information that is used, maintained, stored, or transmitted by a HIPAA-covered entity – a healthcare provider, health plan or health insurer, or a healthcare clearinghouse – or a business associate of a HIPAA-covered entity, in relation to the provision of healthcare or payment for healthcare services. PHI is made up of 18 identifiers, including names, social security numbers, and medical record numbers (source)

Privacy terms

 

  • Anonymization is a process where personally identifiable information (whether direct or indirect) from data sets is removed or manipulated to prevent re-identification. This process must be made irreversible. 
  • Data controller is a person, an authority or a body that determines the purposes for which and the means by which personal data is collected.
  • Data lake is a collection point for the data a business collects. 
  • Data processor is a person, an authority or a body that processes personal data on behalf of the controller. 
  • De-identified data is the result of removing or manipulating direct and indirect identifiers to break any links so that re-identification is impossible. 
  • Differential privacy is a privacy framework that characterizes a data analysis or transformation algorithm rather than a dataset. It specifies a property that the algorithm must satisfy to protect the privacy of its inputs, whereby the outputs of the algorithm are statistically indistinguishable when any one particular record is removed from the input dataset.
  • Direct identifiers are pieces of data that identify an individual without the need for more data, e.g., name, SSN, etc.
  • Homomorphic encryption is a method of performing a calculation on encrypted information (ciphertext) without decrypting it (to plaintext) first.
  • Identifier: Unique information that identifies a specific individual in a dataset. Examples of identifiers are names, social security numbers, and bank account numbers. Also, any field that is unique for each row. 
  • Indirect identifiers are pieces of data that can be used to identify an individual indirectly, or in combination with other pieces of information, e.g., date of birth, gender, etc.
  • Insensitive: Information that is not identifying or quasi-identifying and that you do not want to be transformed.
  • k-anonymity is where the identifiable attributes of any record in a particular database are indistinguishable from those of at least k-1 other records.
  • Perturbation: Data can be perturbed by using additive noise, multiplicative noise, data swapping (changing the order of the data to prevent linkage) or generating synthetic data.
  • Pseudonymization is the processing of personal data in a way that the personal data can no longer be attributed to a specific data subject without the use of additional information. This is provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the data is not attributed to an identified or identifiable person.
  • Quasi-identifiers (also known as indirect identifiers) are pieces of information that on their own are not sufficient to identify a specific individual, but that when combined with other quasi-identifiers make it possible to re-identify an individual. Examples of quasi-identifiers are zip code, age, nationality, and gender.
  • Re-identification, or de-anonymization, is when anonymized (de-identified) data is matched with publicly available information, or auxiliary data, in order to discover the individual to whom the data belongs.
  • Secure multi-party computation (SMC), or Multi-Party Computation (MPC), is an approach to jointly compute a function over inputs held by multiple parties while keeping those inputs private. MPC is used across a network of computers while ensuring that no data leaks during computation. Each computer in the network only sees bits of secret shares — but never anything meaningful.
  • Sensitive: Information that is more general among the population, making it difficult to identify an individual with it. However, when combined with quasi-identifiers, sensitive information can be used for attribute disclosure. Examples of sensitive information are salary and medical data. Let’s say we have a set of quasi-identifiers that form a group of women aged 40-50, a sensitive attribute could be “diagnosed with breast cancer.” Without the quasi-identifiers, the probability of identifying who has breast cancer is low, but once combined with the quasi-identifiers, the probability is high.
  • Siloed data is data stored away in silos with limited access, to protect it against the risk of exposing private information. While these silos protect the data to a certain extent, they also lock away its value.
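
To make a few of these terms concrete, the sketch below shows a toy linkage attack: quasi-identifiers in a “de-identified” dataset are matched against public auxiliary data to re-identify an individual (all records are invented):

```python
# "Anonymized" medical records: direct identifiers removed, quasi-identifiers kept.
medical = [
    {"zip": "60629", "age": 42, "sex": "F", "diagnosis": "breast cancer"},
    {"zip": "60644", "age": 29, "sex": "M", "diagnosis": "asthma"},
]

# Public auxiliary data (e.g., a voter roll) with names and the same quasi-identifiers.
public = [{"name": "Alice Jones", "zip": "60629", "age": 42, "sex": "F"}]

for person in public:
    for record in medical:
        if all(person[k] == record[k] for k in ("zip", "age", "sex")):
            print(f"{person['name']} -> {record['diagnosis']}")  # re-identified
```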

Masking is killing data science

When it comes to data science, the trade-off between protecting data and keeping its value appears nearly impossible to manage. And the introduction of privacy legislation like the California Consumer Privacy Act (CCPA) makes the job even harder.

Methods such as data masking appear to be the standard option, with privacy risk landing at almost 0%. However, with information loss potentially exceeding 50%, the opportunity for data analytics vanishes.

Data Masking is a lost battle

Data masking is a de-identification technique that focuses on the redaction or transformation of information within a dataset to prevent exposure. The information in the resulting dataset is of low quality, and this technique is not enough to move a company forward in innovation.

Companies need to privacy-protect their consumer data. However, they also need to preserve the value of the data for analytical uses.

Masking fails to address how data works today and how a business benefits from it. Consumer data is beneficial to all aspects of an organization and creates a better experience for the customer. Failing to both utilize and protect these datasets leaves your company behind in innovation and consumer satisfaction.

Privacy-protection that preserves analytical value

Data scientists need to be able to control the trade-off, and the only way to do it is by using “smart” optimization solutions.

A “smart” optimization solution is one that can modify the data in different ways, using privacy risk and analytical value as its optimization functions. With a solution like this, a data scientist gets a dataset that is optimized for analytics and privacy compliant: the best of both worlds.

Smart Optimization vs Masking

Let’s look at the impact that both privacy-protection solutions have on a machine learning algorithm.

For this example, we want to predict loan default risk using a random forest model. The model is going to be run on three datasets:

  • In the clear: The original dataset without any privacy transformations.
  • Masked dataset: Transformation of the original dataset using standard rule-based masking techniques.
  • Optimized dataset: Transformation of the original dataset using a smart optimization solution.

 

The dataset has 11 variables:

  • Age
  • Sex
  • Job
  • Housing
  • Saving Account Balance
  • Checking Account Balance
  • Credit Account Balance
  • Duration
  • Purpose
  • Zipcode
  • Risk
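
A minimal sketch of how such a comparison might be run with scikit-learn, assuming each version of the dataset is saved as a CSV with a Risk target column (file and column names are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file names for the three versions of the loan dataset.
for name in ["clear.csv", "masked.csv", "optimized.csv"]:
    df = pd.read_csv(name)
    X = pd.get_dummies(df.drop(columns=["Risk"]))  # one-hot encode categoricals
    y = df["Risk"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```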

Let’s compare the results.

Running the model on the original dataset gave us an accuracy of 93%; however, the risk of re-identification is 100%. When we used the masked data, the model accuracy dropped to 28%; since there were 5 risk levels, this is barely better than random. On the positive side, the risk of re-identification is 0%. Lastly, the accuracy with the optimized dataset was 87%, a drop of only 6 points versus the original data. Additionally, the risk of re-identification was only 3%.

While having a 0% privacy risk is appealing, the loss in accuracy makes masking worthless for analytic purposes.

This example highlights why masking is killing data science, and why organizations need to implement smart optimization solutions, like CryptoNumerics' CN-Protect, that reduce the risk of re-identification while preserving the analytical value of the data.

Gaining a competitive edge in your industry means utilizing consumer data. By adequately protecting that data without massive data loss, its high value can take your company far.

 

 



Looking Ahead to LGPD, Brazil's GDPR

Since the implementation and success of the General Data Protection Regulation (GDPR), privacy has emerged globally as a legislative hot topic, influencing governments and consumers to take control of their privacy. Taking inspiration from the EU's regulation, Brazil has created its own privacy legislation: the Brazilian General Data Protection Law (LGPD).

LGPD looks to unify Brazil’s 40 current privacy laws for online security. Through this implementation, Brazil seeks to consolidate and control how companies collect, use, disclose, and process personal data. 

LGPD is set to come into effect in August 2020, leaving companies with less than 7 months to prepare for and comply with the new legislation. However, companies that are already compliant with GDPR will find most of the preparation already in place.

GDPR has influenced countries around the world to follow its lead on privacy protection, and LGPD is no different: its original conception in 2018 bore almost identical provisions, before editing and vetoing. Much like with GDPR, complying with LGPD is necessary for any organization to maintain not only customer trust but also its analytics and monetization.

Just like GDPR, this legislation defines its applicability in Article 3, covering any data processing operation carried out in, or processing personal data within, Brazilian territory. This means that companies not located in Brazil, but that deal with data processed within Brazil, are required to follow LGPD.

Just like GDPR, this legislation applies to any processing operation carried out by a natural person or a legal entity, of public or private law, irrespective of the means used for processing or the country where the organization's headquarters or the data is located, provided that:

  • The processing operation is carried out in Brazil
  • The purpose of the processing activity is to offer or provide goods or services, or the processing of data of individuals located in Brazil
  • The personal data was collected in Brazil

(Source)

Article 7 lists a limited number of situations in which the processing of personal data is allowed. These include:

  • Consent of the data subject (consent must be given in writing or otherwise provable, and the data subject retains the ability to revoke it at any time)
  • Compliance with a legal or regulatory obligation by the controller
  • Anonymized data, which is not considered personal data provided it cannot be re-identified

How is personal data defined? 

Unlike GDPR, LGPD takes a broad approach to its definition of personal data. By doing so, LGPD can apply not only to data that explicitly identifies a person, but also to data from which an identity can be inferred.

For both GDPR and CCPA, anonymized data remains viable for companies to use for monetization or analytics. LGPD instead states that anonymized data is still considered personal when it is used for tasks such as behavioral tracking.

Article 18 of LGPD defines 9 personal data subjects’ rights. These include: 

  • Confirmation of the existence of processing of their data
  • Access to the data
  • Correction of any incomplete, inaccurate or out-of-date data
  • Anonymization, blocking or deletion of unnecessary or noncompliant data
  • Portability of their data, i.e., an express request to transfer it to another service or processor
  • Deletion of personal data
  • Information about the public and private entities with which the controller has shared data
  • Information about the possibility of denying consent and the consequences of doing so
  • Revocation of consent

Who is the ANPD? 

The National Data Protection Authority (ANPD) is mentioned numerous times throughout the legislation as the body responsible for overseeing the enforcement of privacy and data protection laws in Brazil. The ANPD is therefore responsible for monitoring compliance, issuing guidelines, and enforcing data protection laws throughout the country.

Central powers of the ANPD include:

  • Issuing guidelines for the implementation of LGPD, data protection, and privacy
  • Examining complaints
  • Investigating and applying sanctions
  • Preparing studies and educating society
  • Encouraging the adoption of standards for services and products that facilitate data subjects' control over their data
  • Promoting cooperative actions with data protection authorities from other countries

This role is similar to that of France's National Commission for Information Technology and Liberties (CNIL), put in place with the introduction of GDPR. Like the ANPD, the CNIL ensures compliance, informs data controllers of their responsibilities, rights, and obligations, and handles oversight and sanctions.

How is LGPD different from GDPR? 

One of the main successes of GDPR has been its hefty fines, such as Google's penalty of 50 million euros. Significant fines put companies at risk of losing tens of millions of dollars. LGPD, however, has put forward considerably lighter fines and warnings for violations:

Sanction: Warning

  • LGPD: No specific time frame for a response, except for “an appropriate time period”
  • GDPR: A reply must be issued within 72 hours of receiving the notification

Sanction: Fines

  • LGPD: Up to 2% of revenue in Brazil or R$50,000,000 (whichever is higher), equivalent to about US$12.9 million
  • GDPR: Up to 4% of annual revenue or €20 million (whichever is higher), equivalent to about US$22 million

As this comparison shows, LGPD fines and warnings are significantly smaller than those of GDPR.

Both GDPR and LGPD require data protection officers (DPOs), who are responsible for confirming that organizations comply with personal data protection requirements. However, under GDPR the position is appointed by both data controllers and processors, and only companies meeting specific requirements need a DPO. LGPD instead requires every company to have an assigned DPO, appointed by the data processor.

GDPR has made clear the importance of compliance, and the introduction of LGPD is no different. Protecting consumer privacy is important not only for complying with LGPD but also for maintaining data analytics and monetization.

LGPD is a significant step in privacy regulation. Success in Brazil's data privacy regime could influence more countries to adopt similar rules, just as GDPR has influenced many countries. As these privacy regulations emerge across the globe, acting now is more important than ever in order to comply and succeed in the new privacy era.

To read more privacy blogs, click here.

 
