Enabling Data Collaboration Without Sharing Data
Published on Thu Oct 12 2023 by Dustin Van Tate Testa
Image: Data Security Breach | Blogtrepreneur on Flickr
New research has proposed a solution to a common problem in data collaboration: how to ensure that collaborating parties will benefit from sharing their datasets without revealing sensitive information. The paper, titled "Practical, Private Assurance of the Value of Collaboration," addresses this issue from the perspective of machine learning and neural networks.
The researchers propose an interactive protocol that combines fully homomorphic encryption over the Torus (TFHE) with label differential privacy. Combining the two lets the parties evaluate a collaboration without running every computation in the encrypted domain, which is computationally expensive.
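To make "label differential privacy" more concrete, here is a minimal sketch of randomized response, one common way to privatize labels. It is an illustration only: this article does not say which mechanism the paper uses, and the function names, the `epsilon` parameter, and the `seed` argument are assumptions made for the example.

```python
import math
import random


def randomized_response(label, num_classes, epsilon, rng):
    """Keep the true label with probability e^eps / (e^eps + K - 1);
    otherwise return one of the other K - 1 classes uniformly at random."""
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    others = [c for c in range(num_classes) if c != label]
    return rng.choice(others)


def privatize_labels(labels, num_classes, epsilon, seed=0):
    """Apply randomized response independently to every label.
    Only the labels are perturbed; features are left untouched."""
    rng = random.Random(seed)
    return [randomized_response(y, num_classes, epsilon, rng) for y in labels]


if __name__ == "__main__":
    labels = [0, 1, 2, 1, 0, 2, 2, 1]
    # Smaller epsilon means more labels get flipped (more privacy, less utility).
    print(privatize_labels(labels, num_classes=3, epsilon=1.0))
```

A smaller `epsilon` flips more labels and gives stronger privacy at the cost of noisier training data; a very large `epsilon` leaves the labels essentially unchanged.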
To explain the value of this research, let's take an example. Consider two companies, P1 and P2, who develop antivirus products. P1 uses a machine learning model to label new malware programs, but the model's performance on a holdout dataset could be improved. P2 offers a solution: by combining their datasets, P1's model could be more accurate. However, before entering a formal collaborative agreement, P1 wants assurance that the collaboration will indeed be beneficial. P2, on the other hand, does not want to reveal its dataset before the agreement.
The proposed protocol provides a solution to this problem. P2 encrypts the labels of its dataset using homomorphic encryption and sends the encrypted dataset to P1. P1 combines this dataset with its own, trains the model, and tests its accuracy. The final output is then decrypted by P2. This approach improves efficiency by performing certain computations in cleartext, while ensuring privacy through differential privacy techniques.
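The question the protocol answers, "does P2's data actually help P1's model?", can be sketched in plain Python. This is a cleartext illustration under heavy assumptions: it omits the homomorphic encryption and the differential privacy entirely, and it swaps the paper's neural network for a toy nearest-centroid classifier. All function names are hypothetical.

```python
def train_centroids(points, labels):
    """Toy 'model': the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Predict the class whose centroid is closest to x (squared distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def accuracy(model, points, labels):
    hits = sum(predict(model, x) == y for x, y in zip(points, labels))
    return hits / len(labels)


def collaboration_helps(p1_data, p2_data, holdout):
    """Compare holdout accuracy before and after adding P2's data.
    Returns (improved, accuracy_before, accuracy_after)."""
    (x1, y1), (x2, y2), (xh, yh) = p1_data, p2_data, holdout
    before = accuracy(train_centroids(x1, y1), xh, yh)
    after = accuracy(train_centroids(x1 + x2, y1 + y2), xh, yh)
    return after > before, before, after


if __name__ == "__main__":
    p1 = ([[0.0], [0.0], [6.0]], [0, 0, 1])                # P1's small training set
    p2 = ([[4.0], [4.0], [10.0], [10.0]], [0, 0, 1, 1])    # P2's contribution
    holdout = ([[1.0], [4.0], [7.0], [9.0]], [0, 0, 1, 1]) # P1's holdout set
    print(collaboration_helps(p1, p2, holdout))
```

In the protocol described above, the comparison would be made without P1 ever seeing P2's labels in the clear; the sketch only shows the before-and-after accuracy check that motivates the collaboration decision.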
The paper goes into detail about the protocol and presents a security analysis. The researchers also evaluate the performance and accuracy of the protocol using multiple datasets. Overall, this research provides a practical and private solution to assure the value of collaboration between parties with sensitive datasets, making data collaboration more secure and mutually beneficial.
This research has implications for industries where data collaboration is common, such as healthcare, finance, and cybersecurity. It lets parties find out whether combining their datasets is worthwhile without compromising the privacy or security of sensitive information. By improving the accuracy of machine learning models through collaboration, companies can develop more effective products and services, benefiting both the organizations involved and end users. The approach could change how organizations decide to collaborate while still protecting sensitive data.