As the data privacy discourse continues to grow, the terms used to explain data science, data privacy and data protection must be accessible to everyone. That’s why we at CryptoNumerics have compiled a continuously growing Privacy Glossary, to help people learn and better understand what’s happening to their data.
Below are 25 terms surrounding privacy legislations, personal data, and other privacy terms to help you become familiar with them.
- General Data Protection Regulation (GDPR) is a privacy regulation implemented in May 2018 that has inspired more regulations worldwide. The law determined data controllers must establish a specific legal basis for each and every purpose where personal data is used. If a business intends to use customer data for an additional purpose, then it must first obtain explicit consent from the individual. As a result, all data in data lakes can only be made available for use after processes have been implemented to notify and request permission from every subject for every use case.
- California Consumer Privacy Act (CCPA) is a sweeping piece of legislation that is aimed at protecting the personal information of California residents. It will give consumers the right to learn about the personal information that businesses collect, sell, or disclose about them, and prevent the sale or disclosure of their personal information. It includes the Right to Know, Right of Access, Right to Portability, Right to Deletion, Right to be Informed, Right to Opt-Out, and Non-Discrimination Based on Exercise of Rights. This means that if consumers do not like the way businesses are using their data, they request for it to be deleted -a risk for business insights
- Health Insurance Portability and Accountability Act (HIPAA) is a health protection regulation passed in 1998 by President Clinton. This act gives patients the right to privacy and covers 18 personal identifiers that are required to be de-identified. This Act is applicable not only in hospitals but in places of work, schooling, etc.
Legislative Definitions of Personal Information
- Personal Data (GDPR): Any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person’ (source)
- Personal Information (PI) (CCPA): “information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” (source)
- Personal Health Information (PHI) (HIPAA): considered to be any identifiable health information that is used, maintained, stored, or transmitted by a HIPAA-covered entity – A healthcare provider, health plan or health insurer, or a healthcare clearinghouse – or a business associate of a HIPAA-covered entity, in relation to the provision of healthcare or payment for healthcare services. PHI is made up of 18 identifiers, including names, social security number, and medical record numbers (source)
- Anonymization is a process where personally identifiable information (whether direct or indirect) from data sets is removed or manipulated to prevent re-identification. This process must be made irreversible.
- Data controller is a person, an authority or a body that determines the purposes for which and the means by which personal data is collected.
- Data lake is a collection point for the data a business collects.
- Data processor is a person, an authority or a body that processes personal data on behalf of the controller.
- De-identified data is the result of removing or manipulating direct and indirect identifiers to break any links so that re-identification is impossible.
- Differential privacy is a privacy framework that characterizes a data analysis or transformation algorithm rather than a dataset. It specifies a property that the algorithm must satisfy to protect the privacy of its inputs, whereby the outputs of the algorithm are statistically indistinguishable when any one particular record is removed in the input dataset.
- Direct identifiers are pieces of data that identify an individual without the need for more data, ex. name, SSN, etc.
- Homomorphic encryption is a method of performing a calculation on encrypted information (ciphertext) without decrypting it (to plaintext) first.
- Identifier: Unique information that identifies a specific individual in a dataset. Examples of identifiers are names, social security numbers, and bank account numbers. Also, any field that is unique for each row.
- Indirect identifiers are pieces of data that can be used to identify an individual indirectly, or with the combination of other pieces of information, ex. date of birth, gender, etc.
- Insensitive: Information that is not identifying or quasi-identifying and that you do not want to be transformed.
- k-anonymity is where identifiable attributes of any record in a particular database are indistinguishable from at least one other record.
- Perturbation: Data can be perturbed by using additive noise, multiplicative noise, data swapping (changing the order of the data to prevent linkage) or generating synthetic data.
- Pseudonymization is the processing of personal data in a way that the personal data can no longer be attributed to a specific data subject without the use of additional information. This is provided that such additional information is kept separately and is subject to technical and organizational
- Quasi-identifiers (also known as Indirect identifiers) are pieces of information that on its own are not sufficient to identify a specific individual but when combined with other quasi-identifiers is possible to re-identify an individual. Examples of quasi-identifiers are zip code, age, nationality, and gender.
- Re-identification, or de-anonymization, is when anonymized data (de-identified data) is matched with publicly available information, or auxiliary data, in order to discover the individual to which the data belong to.
- Secure multi-party computation (SMC), or Multi-Party Computation (MPC), is an approach to jointly compute a function over inputs held by multiple parties while keeping those inputs private. MPC is used across a network of computers while ensuring that no data leaks during computation. Each computer in the network only sees bits of secret shares — but never anything meaningful.
- Sensitive: Information that is more general among the population, making it difficult to identify an individual with it. However, when combined with quasi-identifiers, sensitive information can be used for attribute disclosure. Examples of sensitive information are salary and medical data. Let’s say we have a set of quasi-identifiers that form a group of women aged 40-50, a sensitive attribute could be “diagnosed with breast cancer.” Without the quasi-identifiers, the probability of identifying who has breast cancer is low, but once combined with the quasi-identifiers, the probability is high.
- Siloed data is data stored away in silos with limited access, to protect it against the risk of exposing private information. While these silos protect the data to a certain extent, they also lock the value of the data.