Searches for data masking are one of the main reasons people land on our website. Our answer is always the same: whether you’re looking for “data masking,” “data masking tools,” or “data masking security solutions,” you’re looking for the wrong thing.
Data masking is the redaction of identifiers within a dataset to prevent exposure. Have someone’s name or social security number? Mask the column to blank out those identifiers, so that no one can figure out who they are.
The problem with this approach is that it does not meet the needs businesses have and misunderstands how data works today. Even if direct identifiers like someone’s name are masked, indirect identifiers such as postal code, birth date, or gender remain exposed. These can be used in combination to re-identify individuals in a dataset. Google and the University of Chicago, for example, are facing a multi-million-dollar problem because they did not account for this risk.
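To make the re-identification point concrete, here is a minimal Python sketch. The records, column names, and the choice of zip, birth year, and sex as indirect identifiers are all made up for illustration; it is not how any particular product works. It masks the name column, then counts how many rows can still be singled out by their remaining attributes.

```python
from collections import Counter

# Hypothetical toy records; every value here is invented for illustration.
records = [
    {"name": "Alice Smith", "zip": "60601", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"name": "Bob Jones", "zip": "60601", "birth_year": 1984, "sex": "M", "diagnosis": "diabetes"},
    {"name": "Carol White", "zip": "60614", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
    {"name": "Dan Brown", "zip": "60614", "birth_year": 1990, "sex": "F", "diagnosis": "asthma"},
]


def mask(rows, columns):
    """Return copies of the rows with the given columns blanked out."""
    return [{k: ("***" if k in columns else v) for k, v in row.items()} for row in rows]


def unique_rows(rows, quasi_identifiers):
    """Return rows whose combination of indirect identifiers appears exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# Mask the direct identifier, then check what the indirect ones still reveal.
masked = mask(records, {"name"})
exposed = unique_rows(masked, ["zip", "birth_year", "sex"])
print(len(exposed), "of", len(masked), "masked rows are still unique")
```

In this toy dataset, two of the four rows remain unique after masking. Anyone who knows a person’s zip code, birth year, and sex could pick out that person’s diagnosis even though the name is gone.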
Companies need to privacy-protect all of the relevant identifiers in their dataset while preserving the analytical value of the data, so they can still perform data science. Protecting every one of those identifiers with masking alone would mean masking nearly the entire dataset, leaving it useless for analysis. There’s no point in collecting data if you don’t use it. And if you don’t use data, your company will fall behind in innovation and risk being disrupted.
To privacy-protect datasets for enterprise use in data science environments, businesses must find a solution that addresses four significant needs:
1) Systematic metadata classification
Enterprises need software that can autonomously classify each attribute in a dataset as a direct identifier, an indirect identifier, sensitive, or insensitive, so that it is clear what must be protected. It must do this systematically and consistently across the entire organization.
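As a rough illustration of what systematic classification means, the sketch below labels columns using a shared rule table. The keyword lists and the function names are hypothetical and deliberately simplistic; a real classifier would inspect the values as well as the column names.

```python
# Hypothetical keyword rules, checked in order; invented for illustration only.
RULES = {
    "direct": {"name", "ssn", "email", "phone"},
    "indirect": {"zip", "postal", "birth", "dob", "age", "sex", "gender"},
    "sensitive": {"diagnosis", "salary", "income"},
}


def classify_column(column_name):
    """Label one column by matching its underscore-separated tokens against RULES."""
    tokens = set(column_name.lower().split("_"))
    for label, keywords in RULES.items():
        if tokens & keywords:
            return label
    return "insensitive"


def classify_schema(columns):
    """Apply the same rules to every column so results are consistent across datasets."""
    return {column: classify_column(column) for column in columns}


schema = classify_schema(["full_name", "zip_code", "birth_year", "diagnosis", "page_views"])
for column, label in schema.items():
    print(f"{column}: {label}")
```

Because every dataset passes through the same rule table, two teams classifying the same column will always get the same answer, which is the consistency this need is about.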
2) Autonomous risk-scoring, assessment, and precision trade-off controls
Some datasets can be superficially compliant, but still hold the potential for re-identification (the Google and University of Chicago case again springs to mind). Without a demonstrable method for estimating risk, companies will fail most compliance tests. Enterprises must be able to assess the level of risk exposure present within a dataset, evaluate the loss of analytical value for data science purposes, and precisely control the trade-off between the two.
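One simple way to picture the trade-off is sketched below, under stated assumptions. Risk is taken as one divided by the size of the smallest group of rows sharing the same indirect identifiers, and analytical loss is taken as the fraction of zip code digits removed. Both measures, the data, and the function names are illustrative choices, not a standard or any product’s actual scoring method.

```python
from collections import Counter

# Hypothetical (zip, birth_year) pairs; values are invented for illustration.
rows = [
    ("60601", 1984), ("60602", 1984), ("60605", 1984),
    ("60614", 1990), ("60614", 1990), ("60618", 1990),
]


def generalize(row, zip_digits):
    """Keep only the first zip_digits digits of the five-digit zip code."""
    zip_code, birth_year = row
    return (zip_code[:zip_digits] + "*" * (5 - zip_digits), birth_year)


def max_risk(rows):
    """Worst-case re-identification risk: 1 / size of the smallest matching group."""
    counts = Counter(rows)
    return 1 / min(counts.values())


def trade_off(rows, zip_digits):
    """Return (risk, precision lost) for a given level of zip generalization."""
    generalized = [generalize(row, zip_digits) for row in rows]
    return max_risk(generalized), (5 - zip_digits) / 5


for digits in (5, 4, 3):
    risk, loss = trade_off(rows, digits)
    print(f"zip digits kept={digits}  max risk={risk:.2f}  precision lost={loss:.0%}")
```

With full zip codes the worst-case risk is 1.00, because some rows are unique. Keeping four digits drops it to about 0.33 at a cost of 20% of the zip precision, and removing a further digit costs more precision without lowering the risk any further in this toy data. That is the kind of dial an enterprise needs to be able to see and control.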
3) Advanced mathematical protection techniques
Data masking redacts identifiers that carry essential analytical utility for data science. Cutting-edge privacy techniques that protect data while preserving its value exist in the academic world. Chief among them is differential privacy, which adds carefully calibrated statistical noise to results so that no single individual’s presence in the data can be inferred. Though most solution providers on the commercial market today do not use differential privacy, effective enterprise-grade software cannot afford to omit it.
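For readers who have not seen differential privacy in practice, here is a minimal sketch of one textbook mechanism, the Laplace mechanism applied to a counting query. The data and function names are hypothetical, and this simple sampler is for illustration only; it is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of values matching predicate.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the Laplace mechanism uses noise with scale 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical ages; invented for illustration.
ages = [23, 35, 41, 47, 52, 58, 61, 29, 44, 38]

# Smaller epsilon means more noise: stronger privacy, less precise answers.
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda age: age >= 40, epsilon)
    print(f"epsilon={epsilon}: noisy count of people aged 40+ = {noisy:.2f}")
```

Unlike masking, nothing is blanked out here. The analyst still gets a usable answer, and the epsilon parameter sets how much accuracy is given up in exchange for privacy.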
4) An easy-to-use and intuitive interface
Large enterprises have teams and stakeholders across the organization with contrasting needs, different functions, and varying levels of technical ability. To implement an effective solution that solves real challenges, companies must keep ease of use as one of their core priorities when evaluating privacy solutions.
These four needs are the core components of an effective privacy solution that will lead your company to success by protecting your data and ensuring you can pursue meaningful data science outcomes.
It was to satisfy these four needs that we built CN-Protect. If you’re interested in learning more, check out the free version of our app, and begin unlocking insights on privacy-protected data right away.