Select Page
Key terms to know to navigate data privacy

Key terms to know to navigate data privacy

As the data privacy discourse continues to grow, it’s crucial that the terms used to explain data science, data privacy and data protection are accessible to everyone. That’s why we at CryptoNumerics have compiled a continuously growing Privacy Glossary, to help people learn and better understand what’s happening to their data. 

Below are 25 terms surrounding privacy legislations, personal data, and other privacy or data science terminology to help you better understand what our company does, what other privacy companies do, and what is being done for your data.

Privacy regulations

    • General Data Protection Regulation (GDPR) is a privacy regulation implemented in May 2018 that has inspired more regulations worldwide. The law determined data controllers must establish a specific legal basis for each and every purpose where personal data is used. If a business intends to use customer data for an additional purpose, then it must first obtain explicit consent from the individual. As a result, all data in data lakes can only be made available for use after processes have been implemented to notify and request permission from every subject for every use case.
    • California Consumer Privacy Act (CCPA) is a sweeping piece of legislation that is aimed at protecting the personal information of California residents. It will give consumers the right to learn about the personal information that businesses collect, sell, or disclose about them, and prevent the sale or disclosure of their personal information. It includes the Right to Know, Right of Access, Right to Portability, Right to Deletion, Right to be Informed, Right to Opt-Out, and Non-Discrimination Based on Exercise of Rights. This means that if consumers do not like the way businesses are using their data, they request for it to be deleted -a risk for business insights 
    • Health Insurance Portability and Accountability Act (HIPAA) is a health protection regulation passed in 1998 by President Clinton. This act gives patients the right to privacy and covers 18 personal identifiers that are required to be de-identified. This Act is applicable not only in hospitals but in places of work, schooling, etc.

Legislative Definitions of Personal Information

  • Personal Data (GDPR): Any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person’ (source)
  • Personal Information (PI) (CCPA): “information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” (source)
  • Personal Health Information (PHI) (HIPAA): considered to be any identifiable health information that is used, maintained, stored, or transmitted by a HIPAA-covered entity – A healthcare provider, health plan or health insurer, or a healthcare clearinghouse – or a business associate of a HIPAA-covered entity, in relation to the provision of healthcare or payment for healthcare services. PHI is made up of 18 identifiers, including names, social security number, and medical record numbers (source)

Privacy terms


  • Anonymization is a process where personally identifiable information (whether direct or indirect) from data sets is removed or manipulated to prevent re-identification. This process must be made irreversible. 
  • Data controller is a person, an authority or a body that determines the purposes for which and the means by which personal data is collected.
  • Data lake is a collection point for the data a business collects. 
  • Data processor is a person, an authority or a body that processes personal data on behalf of the controller. 
  • De-identified data is the result of removing or manipulating direct and indirect identifiers to break any links so that re-identification is impossible. 
  • Differential privacy is a privacy framework that characterizes a data analysis or transformation algorithm rather than a dataset. It specifies a property that the algorithm must satisfy to protect the privacy of its inputs, whereby the outputs of the algorithm are statistically indistinguishable when any one particular record is removed in the input dataset.
  • Direct identifiers are pieces of data that identify an individual without the need for more data, ex. name, SSN, etc.
  • Homomorphic encryption is a method of performing a calculation on encrypted information (ciphertext) without decrypting it (to plaintext) first.
  • Identifier: Unique information that identifies a specific individual in a dataset. Examples of identifiers are names, social security numbers, and bank account numbers. Also, any field that is unique for each row. 
  • Indirect identifiers are pieces of data that can be used to identify an individual indirectly, or with the combination of other pieces of information, ex. date of birth, gender, etc.
  • Insensitive: Information that is not identifying or quasi-identifying and that you do not want to be transformed.
  • k-anonymity is where identifiable attributes of any record in a particular database are indistinguishable from at least one other record.
  • Perturbation: Data can be perturbed by using additive noise, multiplicative noise, data swapping (changing the order of the data to prevent linkage) or generating synthetic data.
  • Pseudonymization is the processing of personal data in a way that the personal data can no longer be attributed to a specific data subject without the use of additional information. This is provided that such additional information is kept separately and is subject to technical and organizational
  • Quasi-identifiers (also known as Indirect identifiers) are pieces of information that on its own are not sufficient to identify a specific individual but when combined with other quasi-identifiers is possible to re-identify an individual. Examples of quasi-identifiers are zip code, age, nationality, and gender.
  • Re-identification, or de-anonymization, is when anonymized data (de-identified data) is matched with publicly available information, or auxiliary data, in order to discover the individual to which the data belong to.
  • Secure multi-party computation (SMC), or Multi-Party Computation (MPC), is an approach to jointly compute a function over inputs held by multiple parties while keeping those inputs private. MPC is used across a network of computers while ensuring that no data leaks during computation. Each computer in the network only sees bits of secret shares — but never anything meaningful.
  • Sensitive: Information that is more general among the population, making it difficult to identify an individual with it. However, when combined with quasi-identifiers, sensitive information can be used for attribute disclosure. Examples of sensitive information are salary and medical data. Let’s say we have a set of quasi-identifiers that form a group of women aged 40-50, a sensitive attribute could be “diagnosed with breast cancer.” Without the quasi-identifiers, the probability of identifying who has breast cancer is low, but once combined with the quasi-identifiers, the probability is high.
  • Siloed data is data stored away in silos with limited access, to protect it against the risk of exposing private information. While these silos protect the data to a certain extent, they also lock the value of the data.
How can working from home affect your data privacy?

How can working from home affect your data privacy?

On March 11, the World Health Organization declared the Coronavirus (COVID-19) a global pandemic, sending the world into a mass frenzy. Since that declaration, countries around the world have shut borders, closed schools, requested citizens to stay indoors, and sent workers home. 

While the world may appear to be at a standstill, some jobs still need to get done. Like us at CryptoNumerics, companies have sent their workers home with the tools they need to complete their regularly scheduled tasks from the comfort of their own homes. 

However, with a new influx of people working from home, insecure networks, websites or AI tools can lead company information vulnerable. In this article, we’ll go over where your privacy may be at risk during this work-from-home season.

Zoom’s influx of new users raises privacy concerns.

Zoom is a video-conferencing company used to host meetings, online-charts and online collaboration. Since people across the world are required to work or participate in online schooling, Zoom has seen a substantial increase in users. In February, Zoom shares raised 40%, and in 3 months, it has doubled its monthly active users from the entire year of 2019 (Source). 

While this influx and global exposure are significant for any company, this unprecedented level of usage can expose holes in their privacy protection efforts, a concern that many are starting to raise

Zoom’s growing demand makes them a big target for third-parties, such as hackers, looking to gain access to sensitive or personal data. Zoom is being used by companies large and small, as well as students across university campus. This means there is a grand scale of important, sensitive data could very well be vulnerable. 

Some university professors have decided against Zoom telecommuting, saying the Zoom privacy policy, which states that they may collect information about recorded meetings that take place in video conferences, raises too many concerns of personal privacy. 

On a personal privacy level, Zoom gives the administrator of the conference call the ability to see when a caller has moved to another webpage for over 30 seconds. Many are calling this option a violation of employee privacy. 

Internet-rights advocates have begun urging Zoom to begin publishing transparent reports detailing how they manage data privacy and data security.  

Is your Alexa listening to your work conversations?

Both Google Home and Amazon’s Alexa have previously made headlines for listening to homes without being called upon and saving conversation logs.  

Last April, Bloomberg released a report highlighting Amazon workings listening to and transcribing conversations heard through Alexa’s in people’s homes. Bloomberg reported that most voice assistant technologies rely on human help to help improve the product. They reported that not only were the Amazon employees listening to Alexa’s without the Alexa’s being called on by users but also sharing the things they heard with their co-workers. 

Amazon claims the recordings sent to the “Alexa reviewers” are only provided with an account number, not an address or full name to identify a user with. However, the entire notion of hearing full, personal conversations is uncomfortable.

As the world is sent to work from home, and over 100 million Alexa devices are in American homes, there should be some concern over to what degree these speaker systems are listening in to your work conversations.   

Our advice during this work-from-home-long-haul? Review your online application privacy settings, and be cautious of what devices may be listening when you have important meetings or calls. 

Banking and fraud detection; what is the solution?

Banking and fraud detection; what is the solution?

As the year comes to a close, we must reflect on the most historic events in the world of privacy and data science, so that we can learn from the challenges, and improve moving forward.

In the past year, General Data Protection Regulation (GDPR) has had the most significant impact on data-driven businesses. The privacy law has transformed data analytics capacities and inspired a series of sweeping legislation worldwide: CCPA in the United States, LGPD in Brazil, and PDPB in India. Not only has this regulation moved the needle on privacy management and prioritization, but it has knocked major companies to the ground with harsh fines. 

Since its implementation in 2018, €405,871,210 in fines have been actioned against violators, signalling that the DPA supervisory authority has no mercy in its fervent search for the unethical and illegal actions of businesses. This is only the beginning, as the deeper we get into the data privacy law, the more strict regulatory authorities will become. With the next wave of laws hitting the world on January 1, 2020, businesses can expect to feel pressure from all locations, not just the European Union.


The two most breached GDPR requirements are Article 5 and Article 32.

These articles place importance on maintaining data for only as long as is necessary and seek to ensure that businesses implement advanced measures to secure data. They also signal the business value of anonymization and pseudonymization. After all, once data has been anonymized (de-identified), it is no longer considered personal, and GDPR no longer applies.

Article 5 affirms that data shall be “kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed.”

Article 32 references the importance of “the pseudonymization and encryption of personal data.”

The frequency of a failure to comply with these articles signals the need for risk-aware anonymization to ensure compliance. Businesses urgently need to implement a data anonymization solution that optimizes privacy risk reduction and data value preservation. This will allow businesses to measure the risk of their datasets, apply advanced anonymization techniques, and minimize the analytical value lost throughout the process.

If this is implemented, data collection on EU citizens will remain possible in the GDPR era, and businesses can continue to obtain business insights without risking their reputation and revenue. However, these actions can now be done in a way that respects privacy.

Sadly, not everyone has gotten the message, as nearly 130 fines have been actioned so far.

The top five regulatory fines

GDPR carries a weighty fine:  4% of a business’s annual global turnover, or €20M, whichever is greater. A fine of this size could significantly derail a business, and if paired alongside brand and reputational damage, it is evident that GDPR penalties should encourage businesses to rethink the way they handle data

1. €204.6M: British Airways

Article 32: Insufficient technical and organizational measures to ensure information security

User traffic was directed to a fraudulent site because of improper security measures, compromising 500,000 customers’ personal data. 

 2. €110.3M: Marriott International

Article 32: Insufficient technical and organizational measures to ensure information security

The guest records of 339 million guests were exposed in a data breach due to insufficient due diligence and a lack of adequate security measures.

3. €50M: Google

Article 13, 14, 6, 5: Insufficient legal basis for data processing

Google was found to have breached articles 13, 14, 6, and 5 because it created user accounts during the configuration stage of Android phones without obtaining meaningful consent. They then processed this information without a legal basis while lacking transparency and providing insufficient information.

4. €18M: Austrian Post

Article 5, 6: Insufficient legal basis for data processing

Austrian Post created more than three million profiles on Austrians and resold their personal information to third-parties, like political parties. The data included home addresses, personal preferences, habits, and party-affinity.

5. €14.5M: Deutsche Wohnen SE

Article 5, 25: Non-compliance with general data processing principles

Deutsche Wohnen stored tenant data in an archive system that was not equipped to delete information that was no longer necessary. This made it possible to have unauthorized access to years-old sensitive information, like tax records and health insurance, for purposes beyond those described at the original point of collection.

Privacy laws like GDPR seek to restrict data controllers from gaining access to personally identifiable information without consent and prevent data from being handled in manners that a subject is unaware of. If these fines teach us anything, it is that investing in technical and organizational measures is a must today. Many of these fines could have been avoided had businesses implemented Privacy by Design. Privacy must be considered throughout the business cycle, from conception to consumer use. 

Businesses cannot risk violations for the sake of it. With a risk-aware privacy software, they can continue to analyze data while protecting privacy -with the guarantee of a privacy risk score.

Resolution idea for next year: Avoid ending up on this list in 2020 by adopting risk-aware anonymization.

The data access bottleneck

The data access bottleneck

We create an influx of information each day, minute, and second. In the United States alone, 4, 416, 720 gigabytes of data were used every minute in 2019. This number is reported to have risen 41% since its 2018 report. 

As we continue entering the fast-paced era of technology, the world has been bombarded with hoards of user information without the resources ready to manage it. The role of Data Scientist, a career that didn’t exist ten years ago, has topped Glassdoor’s list of the best roles in America for the last five years. 

The responsibility of a data scientist includes collecting and cleaning data, performing analysis, applying data science techniques and measuring analytic results. This vital process helps businesses by providing customer insights to help manage innovation. However, the process of receiving and analyzing data loses precedence as cleaning and organizing the data takes time.

Data scientists search out the data needed, through other departments or data lakes, creating hours of waiting to receive the information they need. When finally provided with the information necessary, it may contain severe data quality issues. This takes a considerable amount of time away from being able to provide an actual analysis of the data. 

There is a typical time division for this very scenario, known as the 80/20 rule. 80% of data scientists’ work time is spent finding data and cleaning it, while 20% of their time is spent providing analysis on the data. 

This bottleneck of information leads to an increase in potential error and dries up analytical resources. 

One survey conducted by TMMData and the digital analytics association created insight into the difficulty a data scientist faces before getting the opportunity to implement analytic techniques. 56.9% of the 800 surveyed said it takes a few days to a few weeks before they are granted access to all the data they need.

The study also said that only ⅓ are able to immediately access all the data they need or receive the required data in less than one day. 

On top of this, 43 respondents to the survey mentioned that gaining data access to be one of their top two analytics challenges. 

On top of the difficulty of gaining access to the data, this influx of information stored in data lakes is of poor quality. 48% of data scientists questioned the accuracy of the data they received. This incomplete or bad data can lead a data scientist in the wrong direction of their analytic process.

In 2017, IBM released that the two previous years had created 90% of the world’s data. As technology grows, the ability to consume and organize data must expand as well. To reverse the 80/20 time statistic for data science, companies’ abilities to harness and manage data as its collected must improve. 

Flipping 80/20 for data science

Based on the statistics presented, the most significant issue for data scientists involved access wait time and cleaning the data once received. 

It’s understandable why data is so disorganized right now. No one could predict the pace the internet and technology took just ten years ago. Knowing how and when to prepare and store data is still a relatively new issue. 

To improve this issue, the efficiency of data prep must be increased, and the number of people involved with the data should expand. By expanding the data over the organization, and limiting the prep time using less manual methods, companies will see a faster turnover of data. 

Now is the time to play catch-up, and organize the incoming data so that analytics can be prepped and ready to move your company forward as fast as possible. 

What does COVID-19 mean for patient privacy?

What does COVID-19 mean for patient privacy?

The rapid spread of the Coronavirus (COVID-19) has sent the world into mass shock, halting the movement in the economy, companies, schools and regular life. 

In situations of mass panic such as this, maintaining privacy and legislation compliance is the last thing on the publics’ minds. However, for companies and hospitals, this should not be the case. In this weekly news, we will go through how proper data sharing is beneficial, how governments are reacting to privacy concerns, and how employers should be handling their employees’ information.

Data Sharing and COVID-19

According to one Wired article released last week, Genomic data and data marketplaces across countries are being utilized for better understanding the virus and its unique spreading. 

NextStrain, an open-source application tracking bacteria evolution, is helping researchers release and share bacteria strains as close to 48hours after the bacteria is located.  

The article explains that NextStrain is an open-source application, and therefore allows research facilities to create their versions or use the application as a starting ground for other models of open research. 

By participating in this cross-platform data sharing, researchers “creates new opportunities to bridge the gap between public health and academia, and to enable novice users to explore the data as well.”

While this data sharing is proving helpful in moving quickly to understand and stop the growth of this virus, there are issues presented with sharing data. 

An issue with open-source data sharing, as one researcher shared with Wired, is that non-professionals can misinterpret the information, as one Twitter user published false information last week. This twitter thread not only stresses the importance of incorrect information but also how data can spread across platforms—thus emphasizing the importance of anonymizing the influx of COVID-19 patient data.

Last month, we released a short article involving genomic data and marketplaces, as well as the process of de-identifying its information. Click here to read more about what that entails. 

Crisis Communication 

Last week, we released an article about the lack of privacy in South Korea, as every detail of patients’ lives are disclosed to the public, in fear that regular people made contact with the infected individual.

As the virus moves toward Western countries, this handling of privacy must be prevented. However in unprecedented situations such as this, the “every-man-for-himself” mindset takes over for much of the public, as the concern of connection with an infected person spreads. 

One senior risk manager told Modern HealthCare, “It’s a slippery slope—if you let people know where the cases are, they may be more cautious and stay away from certain events,” she said. “If you say nothing, they get a false sense of security.” 

When looking to release information to the public or between researchers, hospitals need to ensure their data is de-identified and compliant with legislation like the Health Insurance Portability and Accountability Act (HIPAA). Not doing so leaves organizations liable to penalties ranging from $100 to $50,000 per violation.

In a newly released Advis survey, only 39% of surveyed U.S hospitals reported that they were prepared for an outbreak like COVID-19. This level of unpreparedness is where cracks in patient privacy can open up, and sensitive data is put at risk of the general public.  

COVID-19 and personal privacy 

Last month, the U.S Department of Health and Human Services released a bulletin outlining HIPAA and privacy factors in response to the outbreak. 

Highlighted in this bulletin is the minimum required disclosures of employers and workplaces as well as the implications versus necessary action of sharing patient data. This bulletin serves as a reminder to the general public of understanding the importance of privacy protection, especially in scenarios as drastic as the current situation.

Because of the panic this virus causes, the mass fear that is created has to be dealt with by authority positions properly. Employers and companies must ensure they are approaching the handling of this pandemic with consideration of patient privacy and legislation compliance. 

One U.S law firm, Sidley, created and released an elaborate list of questions companies should be reflecting on while dealing with the COVID-19 virus. In terms of privacy, some items include; 

  • What information can companies collect from third parties and open sources about employees’ and others’ health and risk of exposure?
  • Are there statutory, regulatory or contractual restrictions on any data collection, processing or dissemination contemplated to address COVID-19 risks? What are the risks of these activities?
  • Are existing privacy disclosures and international data transfer mechanisms adequate to address any new data collection and analyses?
  • Is a privacy impact assessment, or a security risk assessment, required or advisable for any new data-related activities?


The main struggle for companies right now is ensuring that their employee information is dealt with in compliance with privacy legislation, while still keeping in mind the safety of the other workers.

Join our newsletter