Merge pull request #3 from RensselaerIDEA/readme
Expand the readme
erickj4 committed Mar 10, 2022
2 parents c254fbd + a27904a commit 04b66842dfbaf6ed1dafdb5153fc891173d7c45a
Showing 1 changed file with 4 additions and 2 deletions.
@@ -3,11 +3,13 @@ The repository includes the code for metrics designed for evaluating fairness of

## Repository structure
* **data:** This folder contains data for two datasets. *ATUS* is the American Time Use Survey dataset, with both the derived real and synthetic data files. *MIMIC* is the MIMIC-III dataset, based on a past study identifying the impact of race on mortality, and includes only the synthetic dataset. Note that the synthetic datasets are generated using a Generative Adversarial Network (GAN) model called [HealthGAN](https://github.com/TheRensselaerIDEA/synthetic_data) and are intended not to release any private information from the real datasets.
- *ATUS:* The real and synthetic data are used based on the previously published paper [Medical Time-Series Data Generation Using Generative Adversarial Networks](https://link.springer.com/chapter/10.1007/978-3-030-59137-3_34).
- *MIMIC:* The synthetic data is used based on the previously published paper [Generation and evaluation of privacy preserving synthetic health data](https://www.sciencedirect.com/science/article/pii/S0925231220305117).
* **scripts:** The scripts contain code snippets that are used in multiple other files or notebooks and are therefore designed to be imported as functions.
* **notebooks:** The notebooks include code for plotting figures and calculating metrics on the datasets.
* **results:** The results for the log disparity metric on the synthetic datasets are compiled into CSV files included in this folder.

## Data files description
* **ATUS:** The ATUS dataset has both the real and synthetic files available in this repository. The real data file is called *atus_train.csv* and the synthetic file is called *atus_train_synthetic_1.csv*. Both files are used as in the previously published paper [Medical Time-Series Data Generation Using Generative Adversarial Networks](https://link.springer.com/chapter/10.1007/978-3-030-59137-3_34).
* **MIMIC:** The MIMIC dataset has only the synthetic file available. The synthetic file is called *mimic_3_synthetic.csv*. The synthetic data is used as in the previously published paper [Generation and evaluation of privacy preserving synthetic health data](https://www.sciencedirect.com/science/article/pii/S0925231220305117).
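
As a quick way to check that the files listed above are present, the following minimal sketch locates each one and prints its column count. It is not part of the repository: the helper names `find_data_file` and `read_header` are hypothetical, and the assumption that the files sit somewhere under a top-level `data` folder (searched recursively, since the exact subfolder layout is not stated here) may need adjusting.

```python
import csv
from pathlib import Path

# File names as stated in the data files description above.
DATA_FILES = {
    "atus_real": "atus_train.csv",
    "atus_synthetic": "atus_train_synthetic_1.csv",
    "mimic_synthetic": "mimic_3_synthetic.csv",
}


def find_data_file(data_root, file_name):
    """Return the first path under data_root named file_name, or None.

    Hypothetical helper: it searches recursively because the exact
    subfolder layout of the data folder is an assumption.
    """
    matches = sorted(Path(data_root).rglob(file_name))
    return matches[0] if matches else None


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for label, name in DATA_FILES.items():
        path = find_data_file("data", name)
        if path is None:
            print(f"{label}: {name} not found under ./data")
        else:
            print(f"{label}: {path} ({len(read_header(path))} columns)")
```

The sketch uses only the standard library so that it runs without extra dependencies; the notebooks in the repository may well load the same files differently.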

## Contacts
For questions, please reach out to Karan Bhanot (bhanok@rpi.edu or bhanotkaran22@gmail.com).
