diff --git a/README.md b/README.md
index 83eebf1..5ee1bd2 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,13 @@ The repository includes the code for metrics designed for evaluating fairness of
 
 ## Repository structure
 * **data:** The folder includes data for two datasets. *Atus* is the American Time Use Survey dataset, both the derived real and synthetic data files. *Mimic* is the MIMIC-III dataset based on a past study for identifying the impact of race on mortality and includes only the synthetic dataset. Note that the synthetic datasets are generated using a Generative Adversarial Network (GAN) model called [HealthGAN](https://github.com/TheRensselaerIDEA/synthetic_data) and are intended to not release any private information of the real datasets.
- - *ATUS:* The real and synthetic data are used based on the previously published paper [Medical Time-Series Data Generation Using Generative Adversarial Networks](https://link.springer.com/chapter/10.1007/978-3-030-59137-3_34).
- - *MIMIC:* The synthetic data is used based on the previously published paper [Generation and evaluation of privacy preserving synthetic health data](https://www.sciencedirect.com/science/article/pii/S0925231220305117).
 * **scripts:** The scripts include code snippets which are used in multiple other files or notebooks and hence, have been designed to be imported as functions.
 * **notebooks:** The notebooks include code for plotting figures and calculating metrics on the datatsets.
 * **results:** The results for the log disparity metric on synthetic datasets is compiled into CSV files included in this folder.
 
+## Data files description
+* **ATUS:** ATUS dataset has both the real and synthetic files available in this repository. The real data file is called *atus_train.csv* and the synthetic file is called *atus_train_synthetic_1.csv*. The real and synthetic data are used based on the previously published paper [Medical Time-Series Data Generation Using Generative Adversarial Networks](https://link.springer.com/chapter/10.1007/978-3-030-59137-3_34).
+* **MIMIC:** MIMIC dataset has only the synthetic file available. The synthetic file is called *mimic_3_synthetic.csv*. The synthetic data is used based on the previously published paper [Generation and evaluation of privacy preserving synthetic health data](https://www.sciencedirect.com/science/article/pii/S0925231220305117).
+
 ## Contacts
 For questions, please reach out to Karan Bhanot (bhanok@rpi.edu or bhanotkaran22@gmail.com).