Privacy-Preserving Technology for Data Science

Secure Multi-Party Computation

Secure Multi-party Computation (SMC), or Multi-Party Computation (MPC), is an approach to jointly compute a function over inputs held by multiple parties while keeping those inputs private.

MPC runs across a network of computers while ensuring that no data leaks during computation. Each computer in the network sees only secret shares, never anything meaningful on its own. Secret shares are derived from the data using correlated randomness, so that at the end of the computation each computer holds a share of the solution. The only way to reconstruct the complete solution is to add together the shares from all the computers involved.

This allows a machine learning model to be trained across datasets held by multiple parties as if they were a single dataset, without moving, centralizing, or disclosing the data between the parties. The secret shares exchanged between parties cannot be used to reverse engineer any input data, and no single party can unilaterally reconstruct the resulting model. MPC can therefore help satisfy privacy, confidentiality, and data residency requirements.
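As a rough illustration of the share-and-add idea described above, the following sketch uses additive secret sharing to let three parties compute the sum of their private inputs. The modulus `PRIME` and the helper names `share` and `reconstruct` are assumptions made for this example; the text does not specify a particular scheme or parameters.

```python
import secrets

PRIME = 2**61 - 1  # illustrative modulus (an assumption, not from the text)

def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Adding all shares together is the only way to recover the value."""
    return sum(shares) % PRIME

inputs = [42, 17, 99]  # one private input per party
n = len(inputs)

# distributed[i][j] is the share of party i's input that party j holds.
distributed = [share(x, n) for x in inputs]

# Each party j adds up the shares it holds; it never sees another party's input.
local_sums = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]

# Only combining every party's local result reveals the total.
total = reconstruct(local_sums)
```

Each individual share is a uniformly random number, so a party holding fewer than all the shares of a value learns nothing about it.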

      Private Set Intersection

        Private Set Intersection (PSI) identifies the elements common to datasets held by different parties without revealing anything to either party except the intersection. It replaces simplistic approaches such as one-way hash functions, which are susceptible to dictionary attacks. Applications for PSI include assessing the overlap with potential data partners (e.g. “Is there a large enough client base in common to make working together worthwhile?”) and aligning datasets with data partners in preparation for using MPC to train a machine learning model.
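The text does not say which PSI protocol is used. One well-known construction, sketched below purely as an illustration, is based on commutative blinding in the style of Diffie-Hellman: each party raises hashed items to its own secret exponent, and doubly blinded values match exactly when the underlying items match. The modulus `P`, the function names, and the two-party `psi` driver are all assumptions for this example, and the toy group parameters are not suitable for production use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy parameter for illustration only

def hash_to_group(item):
    """Map an item to a nonzero integer modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P or 1

def random_exponent():
    """Pick a secret exponent coprime to P - 1 so that blinding is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def blind(items, key):
    return [pow(hash_to_group(x), key, P) for x in items]

def reblind(values, key):
    return [pow(v, key, P) for v in values]

def psi(alice_items, bob_items):
    """Simulate both parties in one process; returns what Alice learns."""
    a, b = random_exponent(), random_exponent()
    alice_blinded = blind(alice_items, a)       # Alice sends these to Bob
    bob_blinded = blind(bob_items, b)           # Bob sends these to Alice
    alice_double = reblind(alice_blinded, b)    # Bob returns these in order
    bob_double = set(reblind(bob_blinded, a))   # Alice computes these herself
    # H(x)^(a*b) is equal on both sides only for items the parties share.
    return {x for x, v in zip(alice_items, alice_double) if v in bob_double}

common = psi(
    ["ann@example.com", "bob@example.com", "cat@example.com"],
    ["bob@example.com", "dan@example.com", "cat@example.com"],
)
```

Unlike exchanging plain hashes, a party cannot test guesses against the blinded values it receives without knowing the other party's secret exponent.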

          Building and Training Models

          CN-Insight uses secure multiparty computation to build statistical and ML models without exposing or relocating the data. Each party that wishes to contribute to the collaborative analytics has an input: either data or a model. The parties do not want to expose or move their inputs. Secure multiparty computation solves this by allowing the parties to jointly compute a function while keeping their inputs private. The function is either model training, when all parties have data as inputs, or model inference, when one party has a model and the others have data as inputs.

          The first step is to install CN-Insight in each party’s desired location: on premises, in the cloud, or in a hybrid cloud. The second step is to create encrypted connections with the other parties for CN-Insight to communicate over. Third, CN-Insight must be told where the raw data is so that it can ingest the data and secret-split it into shares. All shares are necessary to retrieve the original data; it is impossible to obtain the original data without every share. Secret splitting is based on one-time pad encryption, which gives the individual shares information-theoretic security: an incomplete set of shares reveals nothing about the original data.
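A minimal sketch of one-time-pad style secret splitting follows, assuming an XOR-based scheme over raw bytes. The function names `split_secret` and `combine` and the sample record are hypothetical; the text does not describe CN-Insight's actual encoding.

```python
import secrets

def split_secret(data, n_shares):
    """Split `data` (bytes) into n shares whose XOR equals `data`."""
    # The first n-1 shares are fresh random pads, as in a one-time pad.
    pads = [secrets.token_bytes(len(data)) for _ in range(n_shares - 1)]
    last = data
    for pad in pads:
        last = bytes(x ^ y for x, y in zip(last, pad))
    # The final share is the data masked by every pad.
    return pads + [last]

def combine(shares):
    """XOR all shares together; any missing share leaves only random noise."""
    result = bytes(len(shares[0]))
    for s in shares:
        result = bytes(x ^ y for x, y in zip(result, s))
    return result

record = b"customer=1234;balance=5600"  # made-up sample data
shares = split_secret(record, 3)
recovered = combine(shares)
```

XOR shares are convenient for storage and reconstruction. Arithmetic on shares typically uses the additive analogue of the same idea, with random numbers modulo a fixed value in place of random bytes.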

          Once the secret shares are created, CN-Insight sends one share to each of the other parties’ instances and keeps one locally. In this manner, all parties are needed to reveal the original data. When CN-Insight needs to evaluate a specific function, each party’s instance operates on the shares that it holds, and each instance ends up with a share of the result. To reveal the result to all instances, every instance must receive all of the result shares; to reveal the result to a subset, the result shares are sent only to that subset.
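The flow above can be sketched as a small simulation: inputs are split and distributed, each party evaluates a function on the shares it holds, and the result shares are sent only to the agreed recipients. The `Party` class, the ring size `MODULUS`, and the choice of a linear function with public weights are assumptions for illustration, not a description of CN-Insight internals.

```python
import secrets

MODULUS = 2**64  # ring size; an illustrative choice

def share(value, n):
    """Split `value` into n additive shares modulo MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts

class Party:
    def __init__(self, name):
        self.name = name
        self.held = {}       # variable label -> this party's share of it
        self.received = []   # result shares sent to this party

    def evaluate_linear(self, weights):
        """Locally compute this party's share of sum(weight * variable)."""
        return sum(w * self.held[k] for k, w in weights.items()) % MODULUS

def distribute(label, value, parties):
    """The owner splits its input and sends one share to every party."""
    for party, s in zip(parties, share(value, len(parties))):
        party.held[label] = s

def reveal(result_shares, recipients):
    """Send all result shares to the recipients only; they can reconstruct."""
    for r in recipients:
        r.received = list(result_shares)
    return {r.name: sum(r.received) % MODULUS for r in recipients}

parties = [Party("A"), Party("B"), Party("C")]
distribute("x", 10, parties)  # input owned by A
distribute("y", 20, parties)  # input owned by B
distribute("z", 30, parties)  # input owned by C

weights = {"x": 2, "y": 3, "z": 1}  # public coefficients of the function
result_shares = [p.evaluate_linear(weights) for p in parties]

# Only A and B are agreed recipients; C contributes a share but learns nothing.
revealed = reveal(result_shares, recipients=parties[:2])
```

Linear functions need no communication during evaluation because the sharing itself is linear: a weighted sum of shares is a share of the weighted sum.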

          Training a model involves a sequence of mathematical operations on data. Those operations are evaluated in CN-Insight using the approach above. The data is protected at all times, and the result is revealed only once the algorithm is complete and only to the agreed-upon recipients.
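Additions on shares are local, but training also requires multiplications (for example, a weight times a feature value). The text does not say how CN-Insight handles these. One common technique, shown here only as an assumed illustration, is multiplication with Beaver triples, a form of correlated randomness. The sketch assumes a trusted dealer that hands out the triples; the modulus `P` and all function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # prime modulus (illustrative)

def share(value, n=2):
    """Split `value` into n additive shares modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts

def reconstruct(parts):
    return sum(parts) % P

def beaver_triple(n=2):
    """Assumed trusted dealer: shares of random a, b and of c = a * b."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a, n), share(b, n), share(a * b % P, n)

def multiply(x_sh, y_sh):
    """Return shares of x * y without any party learning x or y."""
    n = len(x_sh)
    a_sh, b_sh, c_sh = beaver_triple(n)
    # The parties open d = x - a and e = y - b. Both are masked by the
    # random values a and b, so opening them reveals nothing about x or y.
    d = reconstruct([(x - a) % P for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % P for y, b in zip(y_sh, b_sh)])
    # Since x*y = c + d*b + e*a + d*e, each party computes its share locally.
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # the public term d*e is added once
    return z_sh

weight_shares = share(6)
feature_shares = share(7)
product = reconstruct(multiply(weight_shares, feature_shares))
```

Each multiplication consumes one fresh triple, so the triples are typically generated ahead of time, before the private inputs are known.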

          MPC-enabled Feature Engineering

            Feature engineering is the process of selecting features (or columns) from input datasets, or constructing new features from existing data, for use in training machine learning models. It is an essential part of the machine learning pipeline because it determines which features contain information that leads to a model with high prediction accuracy. It is also one of the major remaining “human elements” in the workflow, since it relies on the data scientists’ domain expertise to extract valuable information from the available data.

            In preparation for using MPC-based machine learning across multiple parties’ datasets as if they were a single dataset, CryptoNumerics enables MPC-based feature engineering in a similar manner, keeping each party’s dataset private and confidential. Metadata that guides feature engineering can be securely shared without revealing any sensitive data. This facilitates a deeper understanding of the relationships between features and of their relative importance. Additionally, through MPC, users can combine relevant features and create new ones, which can improve model accuracy.
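As one hypothetical example of metadata that can guide feature selection, the sketch below computes the pooled Pearson correlation between a feature and a target across parties that each hold different rows. Only securely summed totals are revealed, never individual rows or per-party statistics. The fixed-point `SCALE`, the `MODULUS`, and every function name are assumptions for this example, not CryptoNumerics APIs.

```python
import math
import secrets

MODULUS = 2**64
SCALE = 10**6  # fixed-point scale for encoding real numbers

def share(value, n):
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts

def encode(x):
    return round(x * SCALE) % MODULUS

def decode(v):
    if v >= MODULUS // 2:  # upper half of the ring represents negatives
        v -= MODULUS
    return v / SCALE

def local_stats(xs, ys):
    """Sufficient statistics one party computes on its own rows."""
    return [len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(y * y for y in ys),
            sum(x * y for x, y in zip(xs, ys))]

def secure_totals(per_party_stats):
    """Sum each statistic across parties using additive secret sharing."""
    n = len(per_party_stats)
    k = len(per_party_stats[0])
    # shares[i][s][j]: share of party i's statistic s held by party j
    shares = [[share(encode(v), n) for v in stats] for stats in per_party_stats]
    totals = []
    for s in range(k):
        partial = [sum(shares[i][s][j] for i in range(n)) % MODULUS
                   for j in range(n)]
        totals.append(decode(sum(partial) % MODULUS))
    return totals

def pooled_correlation(per_party_data):
    stats = [local_stats(xs, ys) for xs, ys in per_party_data]
    n, sx, sy, sxx, syy, sxy = secure_totals(stats)
    cov = sxy - sx * sy / n
    return cov / math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))

# Two parties, each holding different rows of (feature, target) pairs.
r = pooled_correlation([([1, 2], [2, 4]), ([3, 4], [6, 8])])
```

A feature whose pooled correlation with the target is near zero might be a candidate for removal, a decision the parties can reach without exchanging any raw records.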
