Location data and your privacy


As technology grows to surround the entirety of our lives, it comes as no surprise that our every move is tracked and stored by the very apps we trust with our information. With the current COVID-19 pandemic, the consequences of inviting big tech companies into our every movement are being revealed. 

At this point, most technology users understand what information they give to companies, such as their birthdays, access to pictures, or other sensitive information. However, some may be unaware of the amount of location data that companies collect and how it affects their data privacy. 

Location data volume expected to grow

We have created over 90% of the world’s data since 2017. As wearable technology continues to grow in popularity, the amount of data a person creates each day is steadily increasing. 

One study reported that by 2025, the number of IoT-enabled devices installed worldwide is expected to reach 75 billion. This astronomical number highlights how intertwined technology is with our lives, but also how readily we welcome technology whose data-collection practices we may not fully understand. 

Marketers, companies and advertisers will increasingly look to use location-based information as its volume grows. A recent study found that more than 84% of marketers use location data in their marketing efforts. 

The last few years have seen a boost in big tech companies giving their users more control over how their data is used. For example, in 2019 Apple introduced pop-ups to remind users when apps are using their location data.

Location data is saved and stored so that companies can easily serve personalized ads and products to you. Understanding what your devices collect from you, and how to eliminate data sharing on your devices, is crucial as we move forward in the technological age. 

Click here to read our past article on location data in the form of wearable devices. 

COVID-19 threatens location privacy

Risk the privacy of thousands of people, or save thousands of lives? That has been the question throughout this pandemic, and the time to debate it is running out. Companies across the big 100, including SAS, Google and Apple, have stepped up to volunteer their anonymized data. 

One of the largest concerns is not how this data is being used in this pandemic, but how it could be abused in the future. 

One Forbes article drew a comparison to the regret many have faced after sharing DNA with sites like 23andMe, which has led to health insurance issues or run-ins with criminal investigations. 

As companies like Google, Apple and Facebook step up to the COVID-19 technology race, many are expressing concern, as these companies have not proven reliable at anonymizing user data. 

In addition to the data-collection concern, governments and big tech companies are looking into contact-tracing applications. Using civilian location data for surveillance purposes, even when framed as being for the greater good of health and safety, raises multiple red flags about how our phones can be used to track our every movement. To read more about this involvement in contact-tracing apps, read our latest article.

Each company has stated that it anonymizes its collected data. However, in this pandemic age, anonymized information can still be exploited, especially at the hands of government intervention. 

With all this said, big tech companies hold power over our information and are playing a vital role in the COVID-19 response. Paying close attention to how user data is managed post-pandemic will be valuable in exposing how these companies handle user information.

 

4 techniques for data science


With growing tension between privacy and analytics, the job of data scientists and data architects has become more complicated. The responsibility of data professionals is not just to maximize the value of the data, but to find ways in which data can be privacy protected while preserving its analytical value.

The reality today is that regulations like GDPR and CCPA have disrupted the way in which data flows through organizations. Now data is being siloed and protected using techniques that are not suited for the data-driven enterprise. Data professionals are left with long processes to access the information they need and, in many cases, the data they receive has no analytical value after it has been protected. 

This emphasizes the importance of using adequate privacy protection tactics to ensure that personally identifiable information (PII) is accessible in a privacy-protected manner and that it can be used for analytics.

To satisfy GDPR and CCPA, organizations can choose from three options: pseudonymization, anonymization, and consent. 

Pseudonymization is replacing direct identifiers, like names or emails, with pseudonyms to protect the privacy of the individual. However, this process is still within the scope of the privacy regulations, and the risk of re-identification remains very high.
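
As a rough sketch of how pseudonymization works (the field names and key are invented for illustration), direct identifiers can be replaced with keyed pseudonyms:

```python
import hmac
import hashlib

# A hypothetical secret key. Under GDPR, this "additional information"
# must be kept separately from the pseudonymized data.
SECRET_KEY = b"rotate-me-regularly"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed pseudonym.

    Anyone holding the key can consistently re-link records,
    which is exactly why pseudonymized data remains in scope
    for privacy regulations.
    """
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:12]

record = {"name": "Alice Smith", "email": "alice@example.com", "age": 34}
pseudonymized = {
    "name": pseudonymize(record["name"]),
    "email": pseudonymize(record["email"]),
    "age": record["age"],  # quasi-identifiers are left untouched here
}
```

Note that the untouched `age` field illustrates the residual risk: quasi-identifiers survive pseudonymization, leaving the record linkable to outside data.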

Anonymization, on the other hand, looks at direct identifiers and quasi-identifiers and transforms the data in a way that’s now out-of-scope for privacy regulations and can be used for analytics. 

Consent requires organizations to ask customers for their consent to the use of their data, which opens the door to opt-outs. If the use of the data changes, as it often does in an analytics environment, then consent may well need to be obtained again each time.

There are four main techniques that can help data professionals with privacy protection. All of them have different impacts on both privacy protection and data quality. These are: 

Masking: A de-identification technique that focuses on the redaction or transformation of information within a dataset to prevent exposure. 
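
A minimal sketch of masking, assuming illustrative email and SSN fields:

```python
def mask_email(email: str) -> str:
    """Redact the local part of an email but keep the domain,
    which may still be useful for aggregate analysis."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain

def mask_ssn(ssn: str) -> str:
    """Redact all but the last four digits of a social security number."""
    return "***-**-" + ssn[-4:]

print(mask_email("alice@example.com"))  # *****@example.com
print(mask_ssn("123-45-6789"))          # ***-**-6789
```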

K-anonymity: This privacy model ensures that each individual is indistinguishable from at least k-1 other individuals based on their attributes in a dataset.
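
A simple check for this property can be sketched as follows (the dataset and choice of quasi-identifiers are invented):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values
    appears at least k times, i.e. each individual is hidden
    in a group of at least k records."""
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in groups.values())

rows = [
    {"zip": "902**", "age": "40-50", "diagnosis": "flu"},
    {"zip": "902**", "age": "40-50", "diagnosis": "cancer"},
    {"zip": "100**", "age": "20-30", "diagnosis": "flu"},
]
print(is_k_anonymous(rows, ["zip", "age"], k=2))  # False: one group has 1 row
```

In practice, datasets are generalized (e.g. truncating zip codes, bucketing ages, as in the rows above) until the check passes for the chosen k.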

Differential Privacy: A technique applied to an algorithm that mathematically bounds how much the algorithm’s output can change when any single individual is added to or removed from the dataset. It is achieved by adding calibrated noise to the algorithm. 
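
For intuition, here is a minimal sketch of the standard Laplace mechanism applied to a counting query (the epsilon values are illustrative):

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float) -> float:
    """Laplace mechanism for a counting query.

    A count has sensitivity 1 (adding or removing one individual
    changes it by at most 1), so noise drawn from Laplace(1/epsilon)
    yields epsilon-differential privacy. Smaller epsilon means
    more noise and stronger privacy.
    """
    return true_count + laplace_noise(1.0 / epsilon)

# A noisy answer to "how many patients have condition X?"
noisy = private_count(true_count=100, epsilon=0.5)
```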

Secure Multi-Party Computation: This is a cryptographic technique where a group of parties can compute a function over their inputs while keeping their inputs private.
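
One common building block for this is additive secret sharing; the following toy sketch (the hospital scenario is invented) shows three parties computing a sum without any party seeing the others' inputs:

```python
import random

PRIME = 2**61 - 1  # a large field modulus

def share(secret: int, n_parties: int):
    """Split a secret into n additive shares that sum to it mod PRIME.
    Any n-1 shares together reveal nothing about the secret."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Three hospitals jointly compute a total patient count without
# revealing their individual counts.
inputs = [120, 340, 95]
all_shares = [share(x, 3) for x in inputs]
# Each party locally sums the one share it receives from every input...
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
# ...and combining the partial sums reveals only the total.
total = reconstruct(partial_sums)
print(total)  # 555
```

Real MPC protocols add secure channels and support for multiplication, but the principle is the same: each computer only ever sees meaningless shares.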

Keep your eyes peeled in the next few weeks for our whitepaper, which will explore these four techniques in further detail.

Key terms to know to navigate data privacy


As the data privacy discourse continues to grow, it’s crucial that the terms used to explain data science, data privacy and data protection are accessible to everyone. That’s why we at CryptoNumerics have compiled a continuously growing Privacy Glossary, to help people learn and better understand what’s happening to their data. 

Below are 25 terms surrounding privacy legislations, personal data, and other privacy or data science terminology to help you better understand what our company does, what other privacy companies do, and what is being done for your data.

Privacy regulations

    • General Data Protection Regulation (GDPR) is a privacy regulation implemented in May 2018 that has inspired more regulations worldwide. The law determined data controllers must establish a specific legal basis for each and every purpose where personal data is used. If a business intends to use customer data for an additional purpose, then it must first obtain explicit consent from the individual. As a result, all data in data lakes can only be made available for use after processes have been implemented to notify and request permission from every subject for every use case.
    • California Consumer Privacy Act (CCPA) is a sweeping piece of legislation that is aimed at protecting the personal information of California residents. It will give consumers the right to learn about the personal information that businesses collect, sell, or disclose about them, and prevent the sale or disclosure of their personal information. It includes the Right to Know, Right of Access, Right to Portability, Right to Deletion, Right to be Informed, Right to Opt-Out, and Non-Discrimination Based on Exercise of Rights. This means that if consumers do not like the way businesses are using their data, they can request that it be deleted, a risk for business insights. 
    • Health Insurance Portability and Accountability Act (HIPAA) is a health protection regulation passed in 1996 and signed by President Clinton. This act gives patients the right to privacy and covers 18 personal identifiers that are required to be de-identified. This Act is applicable not only in hospitals but also in workplaces, schools, etc.

Legislative Definitions of Personal Information

  • Personal Data (GDPR): Any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person. (source)
  • Personal Information (PI) (CCPA): “information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” (source)
  • Personal Health Information (PHI) (HIPAA): considered to be any identifiable health information that is used, maintained, stored, or transmitted by a HIPAA-covered entity – A healthcare provider, health plan or health insurer, or a healthcare clearinghouse – or a business associate of a HIPAA-covered entity, in relation to the provision of healthcare or payment for healthcare services. PHI is made up of 18 identifiers, including names, social security number, and medical record numbers (source)

Privacy terms

 

  • Anonymization is a process where personally identifiable information (whether direct or indirect) from data sets is removed or manipulated to prevent re-identification. This process must be made irreversible. 
  • Data controller is a person, an authority or a body that determines the purposes for which and the means by which personal data is collected.
  • Data lake is a collection point for the data a business collects. 
  • Data processor is a person, an authority or a body that processes personal data on behalf of the controller. 
  • De-identified data is the result of removing or manipulating direct and indirect identifiers to break any links so that re-identification is impossible. 
  • Differential privacy is a privacy framework that characterizes a data analysis or transformation algorithm rather than a dataset. It specifies a property that the algorithm must satisfy to protect the privacy of its inputs, whereby the outputs of the algorithm are statistically indistinguishable when any one particular record is removed in the input dataset.
  • Direct identifiers are pieces of data that identify an individual without the need for more data, ex. name, SSN, etc.
  • Homomorphic encryption is a method of performing a calculation on encrypted information (ciphertext) without decrypting it (to plaintext) first.
  • Identifier: Unique information that identifies a specific individual in a dataset. Examples of identifiers are names, social security numbers, and bank account numbers. Also, any field that is unique for each row. 
  • Indirect identifiers are pieces of data that can be used to identify an individual indirectly, or with the combination of other pieces of information, ex. date of birth, gender, etc.
  • Insensitive: Information that is not identifying or quasi-identifying and that you do not want to be transformed.
  • k-anonymity is a privacy model in which the identifiable attributes of any record in a particular database are indistinguishable from those of at least k-1 other records.
  • Perturbation: Data can be perturbed by using additive noise, multiplicative noise, data swapping (changing the order of the data to prevent linkage) or generating synthetic data.
  • Pseudonymization is the processing of personal data in a way that the personal data can no longer be attributed to a specific data subject without the use of additional information. This is provided that such additional information is kept separately and is subject to technical and organizational measures that ensure the personal data are not attributed to an identified or identifiable natural person.
  • Quasi-identifiers (also known as indirect identifiers) are pieces of information that on their own are not sufficient to identify a specific individual, but when combined with other quasi-identifiers make it possible to re-identify an individual. Examples of quasi-identifiers are zip code, age, nationality, and gender.
  • Re-identification, or de-anonymization, is when anonymized data (de-identified data) is matched with publicly available information, or auxiliary data, in order to discover the individual to whom the data belongs.
  • Secure multi-party computation (SMC), or Multi-Party Computation (MPC), is an approach to jointly compute a function over inputs held by multiple parties while keeping those inputs private. MPC is used across a network of computers while ensuring that no data leaks during computation. Each computer in the network only sees bits of secret shares — but never anything meaningful.
  • Sensitive: Information that is more general among the population, making it difficult to identify an individual with it. However, when combined with quasi-identifiers, sensitive information can be used for attribute disclosure. Examples of sensitive information are salary and medical data. Let’s say we have a set of quasi-identifiers that form a group of women aged 40-50, a sensitive attribute could be “diagnosed with breast cancer.” Without the quasi-identifiers, the probability of identifying who has breast cancer is low, but once combined with the quasi-identifiers, the probability is high.
  • Siloed data is data stored away in silos with limited access, to protect it against the risk of exposing private information. While these silos protect the data to a certain extent, they also lock the value of the data.
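
To make the quasi-identifier and re-identification definitions above concrete, here is a toy linkage attack (both datasets are invented for illustration):

```python
# A "de-identified" medical dataset: names removed, quasi-identifiers kept.
medical = [
    {"zip": "90210", "birth_year": 1975, "gender": "F", "diagnosis": "diabetes"},
    {"zip": "10001", "birth_year": 1990, "gender": "M", "diagnosis": "asthma"},
]

# A public auxiliary dataset (e.g. a voter roll) with the same quasi-identifiers.
voter_roll = [
    {"name": "Carol Jones", "zip": "90210", "birth_year": 1975, "gender": "F"},
    {"name": "Bob Lee", "zip": "10001", "birth_year": 1990, "gender": "M"},
]

QUASI = ("zip", "birth_year", "gender")

def link(anonymous_rows, public_rows):
    """Re-identify rows whose quasi-identifier combination matches
    exactly one public record: the classic linkage attack."""
    reidentified = []
    for row in anonymous_rows:
        key = tuple(row[q] for q in QUASI)
        matches = [p for p in public_rows
                   if tuple(p[q] for q in QUASI) == key]
        if len(matches) == 1:
            reidentified.append((matches[0]["name"], row["diagnosis"]))
    return reidentified

print(link(medical, voter_roll))
# [('Carol Jones', 'diabetes'), ('Bob Lee', 'asthma')]
```

This is why removing direct identifiers alone is not anonymization: the quasi-identifiers remain a fingerprint that auxiliary data can match.
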
CCPA 1 month in review


The California Consumer Privacy Act (CCPA) is privacy legislation that regulates companies that collect and process data of California residents, even if the company is based elsewhere. The law requires that consumers are given the option to opt-out of data collection/selling, and/or have their data completely removed from those datasets. 

As well, any data that is collected still has to be protected. Not only does this protect consumers, but it makes it easier for companies to comply with data deletion requests. 

While CCPA came into effect on January 1st, it has yet to create the waves in privacy that many were hoping for. 

What is happening to my data privacy? 

As of right now, not too much. Many large companies, such as Facebook, have made changes to their privacy policies in order to be compliant; however, many others have been slow to do so. Rules of compliance continue to be a work in progress, generating confusion and slowing some companies’ efforts to satisfy the changing laws. 

Mary Stone Ross, associate director of the Electronic Privacy Information Center, says that enforcement of CCPA will likely not start for months, and that the program will likely be underfunded. Not only this, it appears the number of CCPA cases prosecuted will be limited to just three per year. 

Because of this, enforcement of CCPA will not begin until July, despite the law already being in effect. 

Part of the legislation includes the opportunity to request my data. Is this something companies have started abiding by? 

While many companies are complying with CCPA and returning user data, others are making the interaction more complicated than necessary. Some companies are redirecting their customers to multiple outside organizations while others are offering to send data and then never following through. 

One writer at the Guardian requested her data from Instagram, and while she received 3.92GB of data, there was plenty of information that the photo-sharing giant left out of her report. 

Despite containing 8,000 photos, direct messages, and search history, there was not much that couldn’t already be found in the app. The company failed to send the metadata that its data policy states it stores. This metadata could include information about where photos were taken. 

Instagram is not the only application to send incomplete information when requested. Spotify, a leading music streaming platform, complies with CCPA in sharing data. However, after denying one user’s original request, the platform responded with a mere 4.7-megabyte file, despite the person having a nine-year-old account. 

Another social media company, Twitter, sent users their files as JavaScript, making it impossible for users without coding knowledge to understand the contents of their Twitter history.

Such companies are getting away with bare-minimum compliance, and they are allowed to do so. Companies like Instagram can send snippets of data when requested, and users cannot prove that they did not receive all of it. 

Because CCPA is not yet fully enforced, companies are misleading users into thinking they are abiding by the law without adequately protecting their data.

Is my data still being sold? 

CCPA requires that companies provide users with the opportunity to opt out of data sharing and selling. In many cases, however, this information is buried in small print and difficult for a user to find. 

Data aggregators have partnered with companies participating in data sharing and are the go-to when users want to opt-out of data sharing. 

Acxiom is an example of a company easing the burden for consumers who want their data back. When a user submits their information to the Acxiom site, the authorized agent scours sites requesting the deletion or viewing of that user’s data. 

The issue with sites such as Acxiom is that the majority of internet users are unfamiliar with these types of applications. Thus, finding ways to view and delete your data becomes exhausting. 

The average Internet user spends over six hours on the Internet per day. With the human attention span decreasing, the number of websites one person visits per day could be well over 50. Users visiting a webpage for only one article, or for only a few minutes, will most likely not spend the extra time searching for a Do Not Sell link. 

Because of this, companies remain incentivized to hide the opportunity for users to take control of their data. And while CCPA should be effective for the average user’s data, the impact it will have is still unclear.



CCPA is here. Are you compliant?


As of January 1, 2020, the California Consumer Privacy Act (CCPA) came into effect and has already altered the ways companies can make use of user data. 

Before the CCPA came into effect, Big Data companies could harvest user data and use it for data science, analytics, AI, and ML projects. Through this process, consumer data was monetized without privacy protection. With the official introduction of the CCPA, companies now have no choice but to comply or pay the price, begging the question: is your company compliant?

CCPA Is Proving That Privacy Is Not a Commodity, It’s a Right

This legislation enforces that consumers are safe from companies selling their data for secondary purposes. Without explicit permission to use data, companies are unable to utilize said data.

User data is highly valuable for companies’ analytics or monetization initiatives. Thus, risking user opt-outs can be detrimental to a company’s progressing success. By de-identifying consumer data, companies can follow CCPA guidelines while maintaining high data quality. 

The CCPA comes with a highly standardized ruleset for companies to satisfy de-identification, complete with specific definitions and detailed explanations of how to achieve its ideals. Despite these guidelines being in place, and the legislation only just having come into effect, studies have found that only 8% of US businesses are CCPA compliant.  

For companies that are not CCPA compliant as of yet, the time to act is now. By thoroughly understanding the regulations put out by the CCPA, companies can protect their users while still benefiting from their data. 

To do so, companies must understand the significance of maintaining analytical value and the importance of adequately de-identified data. By not complying with CCPA, an organization is vulnerable to fines up to $7500 per incident, per violation, as well as individual consumer damages up to $750 per occurrence.

For perspective, since GDPR came into effect in 2018, companies have faced fines of up to 4% of their annual revenue.

To ensure a CCPA fine is not coming your way, assess your current data privacy protection efforts to ensure that consumers:

  • are asked for direct consent to use their data
  • can opt out of, or have their data removed from, analytical use
  • cannot be re-identified from the data

In essence, CCPA is not impeding a company’s ability to use, analyze, or monetize data. CCPA is enforcing that data is de-identified or aggregated, and done so to the standards that its legislation requires.

Our research found that 60% of datasets companies believed to be de-identified had a high re-identification risk. There are three methods to reduce the possibility of re-identification: 

  • Use state-of-the-art de-identification methods
  • Assess for the likelihood of re-identification
  • Implement controls, so data required for secondary purposes is CCPA compliant

Read more about these effective privacy automation methods in our blog, The Business Incentives to Automate Privacy Compliance under CCPA.

Manual Methods of De-Identification Are Tools of The Past

A standard of compliance within CCPA legislation involves identifying which methods of de-identification leave consumer data susceptible to re-identification. The manual approach, which is extremely common, can leave room for re-identification, making companies vulnerable to CCPA penalties.

Protecting data to the best of a company’s abilities is achievable through techniques such as k-anonymity and differential privacy. However, applying these methods manually is impractical, both for meeting the 30-day grace period CCPA provides and for achieving high-quality data protection.

Understanding CCPA ensures that data is adequately de-identified and that risk is removed, all while meeting every legal specification.

Achieving CCPA compliance means ditching first-generation approaches to de-identification; adopting privacy automation reduces the possibility of re-identification. Using privacy automation to protect and utilize consumers’ data is necessary for successfully maneuvering the new CCPA era. 

The solution of privacy automation ensures not only that user data is correctly de-identified, but that it maintains a high data quality. 

CryptoNumerics as the Privacy Automation Solution

Despite CCPA’s strict guidelines, the benefits of using analytics for data science and monetization are incredibly high. Therefore, reducing efforts to utilize data is a disservice to a company’s success.

Complying with CCPA legislation means determining which methods of de-identification leave consumer data susceptible to re-identification. Manual methods of de-identification, such as masking or tokenization, leave room for improper anonymization. 

Here, Privacy Automation becomes necessary for an organization’s analytical tactics. 

Privacy automation abides by CCPA while supporting the tools of data science and analytics. If a user’s data is de-identified to CCPA’s standards, conducting data analysis remains possible. 

Privacy automation revolves around the assessment, quantification, and assurance of data. A privacy automation tool measures the risk of re-identification, applies data privacy protection techniques, and provides audit reports. 

A study by PossibleNow indicated that 45% of companies were in the process of preparing but did not expect to be compliant by CCPA’s implementation date. Adopting a privacy automation tool to better process data and prepare for the new legislation is critical to a company’s success under CCPA. Privacy automation products such as CN-Protect allow companies to succeed in data protection while benefiting from the data’s analytics. (Learn more about CN-Protect)



Big data privacy regulations can only be met with privacy automation


GDPR demands that businesses obtain explicit consent from data subjects before collecting or using data. CCPA affords consumers the right to request that their data is deleted if they don’t like how a business is using it. PIPEDA requires consumers to provide meaningful consent before their information is collected, used, and disclosed. New privacy laws are coming to India (PDPB), Brazil (LGPD), and over 100 other countries. In the US alone, over 25 state privacy laws have been proposed, with a national one in the works. Big data privacy laws are expansive, restrictive, and they are emerging worldwide faster than you can say, “what about analytics?”.

This has made it challenging for businesses to (1) keep up, (2) get compliant, and (3) continue performing analytics. Not only are these regulations inhibitive, but a failure to meet their standards can result in astronomical fines, like British Airways’ 204.6M euros. As such, much distress and confusion has ensued in the big data community.

 

Businesses are struggling to adapt to the rapid increase in privacy regulations

Stakeholders cannot agree whose responsibility it is to ensure compliance, they are struggling with consent management, and they are under the interpretation that removing direct identifiers renders data anonymous.

Major misconceptions can cost businesses hundreds of millions. So let’s break them down.

1. “Consent management is the only way to keep performing analytics.”

While consent is essential at the point of collection, the odds are that, down the road, businesses will want to repurpose data. Obtaining permission in these cases, due to the sheer volume of data repositories, is an unruly and unmanageable process. A better approach is to anonymize the data. Once this has occurred, data is no longer personal, and it goes from consumer information to business IP.

2. “I removed the direct identifiers, so my data is anonymized”

If this were the case, anonymization would be an easy process. Sadly, it is not so. In fact, it has been widely acknowledged that simply redacting directly identifying information, like names, is nowhere near sufficient. In almost all cases, this leaves most of the dataset re-identifiable.

3. “Synthetic data is the best way to manage emerging regulations.”

False! Synthetic data is a great alternative for testing, but when it comes to achieving insights, it is not the way to go. Since this process attempts to replicate trends, important outlier information can be missed. As a result, the data is unlikely to mirror real-world consumer information, compromising the decision-making process.

What’s evident from our conversations with data-driven organizations is that businesses need a better solution. Consent management is slowing them down, legacy approaches to anonymization are ineffective, and current workarounds skew insights or wipe data value.

 

Privacy automation: A better approach to big data privacy laws

The only manageable and effective solution to big data privacy regulations is privacy automation. This process measures the risk of re-identification, applies privacy-protection techniques, and provides audit reports throughout the anonymization process. It is embedded in an organization’s data pipeline, spreading the solution enterprise-wide and harmonizing the needs of stakeholders by optimizing for anonymization and preservation of data value.

This solution will simplify the compliance process by enabling privacy rules definition, risk assessments, application of privacy actions, and compliance reporting to happen within a single application. In turn, privacy automation allows companies to unlock data in a manner that protects and adds value to consumers.

Privacy automation is the best method for businesses to handle emerging laws and regain the mission-critical insights they have come to rely on. Through this approach, privacy unlocks insights.
