synthetic-data

IDEA Synthetic Data Project

This repository contains the code for the paper "Differentially Private Synthetic Health Data" by Joseph Pedersen and Kristin P. Bennett.

The repository contains the following files:

dp_wgan_gp.py is a script that implements the differentially private Wasserstein GAN with gradient penalty (WGAN-GP) described in the paper as Algorithm 1. It also generates samples of synthetic data from the trained generator.
np_wgan_bds.py and tf_wgan_bds.py are scripts containing functions that compute bounds on the gradient of a WGAN, and bounds on the norm of the gradient of the gradient penalty term of a WGAN-GP, based on the architecture, the current values of the weights and biases, and bounds on the input. They also contain functions that compute the gradients for an actual input (as opposed to bounds on the gradient), and functions that set up random WGAN-GP networks and test that the gradients are within the computed bounds. tf_wgan_bds.py uses TensorFlow, and np_wgan_bds.py uses NumPy.
np_wgan_bds_test.py and tf_wgan_bds_test.py are scripts that run max_iters tests of np_wgan_bds.py and tf_wgan_bds.py, respectively. In each iteration, a WGAN-GP is set up with a random number of nodes per layer, the weights and biases are initialized to random values, and the gradient computed by TensorFlow is checked against the computed bounds. Our tests have shown that the bounds hold when given an appropriate tolerance on the bounds of the inputs and on the bound of the gradient of the output function, as mentioned in the paper.
nnAA.py contains a function to compute the nearest neighbor adversarial accuracy defined in the paper.
