Expand the readme #3

merged 1 commit into from Mar 10, 2022
@@ -3,11 +3,13 @@ The repository includes the code for metrics designed for evaluating fairness of

## Repository structure
* **data:** This folder contains data for two datasets. *Atus* is the American Time Use Survey dataset, with both the derived real data file and the synthetic data file. *Mimic* is the MIMIC-III dataset, based on a past study identifying the impact of race on mortality, and includes only the synthetic dataset. The synthetic datasets were generated with a Generative Adversarial Network (GAN) model called [HealthGAN](https://github.com/TheRensselaerIDEA/synthetic_data) and are intended not to release any private information from the real datasets.
- *ATUS:* The real and synthetic data are based on the previously published paper [Medical Time-Series Data Generation Using Generative Adversarial Networks](https://link.springer.com/chapter/10.1007/978-3-030-59137-3_34).
- *MIMIC:* The synthetic data is based on the previously published paper [Generation and evaluation of privacy preserving synthetic health data](https://www.sciencedirect.com/science/article/pii/S0925231220305117).
* **scripts:** Code snippets that are used across multiple files or notebooks and are therefore designed to be imported as functions.
* **notebooks:** Code for plotting figures and calculating metrics on the datasets.
* **results:** The results for the log disparity metric on the synthetic datasets, compiled into CSV files.

## Data files description
* **ATUS:** The ATUS dataset has both the real and synthetic files available in this repository. The real data file is *atus_train.csv* and the synthetic file is *atus_train_synthetic_1.csv*. The real and synthetic data are based on the previously published paper [Medical Time-Series Data Generation Using Generative Adversarial Networks](https://link.springer.com/chapter/10.1007/978-3-030-59137-3_34).
* **MIMIC:** The MIMIC dataset has only the synthetic file available, *mimic_3_synthetic.csv*. The synthetic data is based on the previously published paper [Generation and evaluation of privacy preserving synthetic health data](https://www.sciencedirect.com/science/article/pii/S0925231220305117).
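
As a minimal sketch of how the files named above might be read, the snippet below loads a synthetic file and, where one exists, its real counterpart. The helper names `read_csv_rows` and `load_dataset` are hypothetical and not part of this repository, and the directory passed in is an assumption, since the exact subfolder layout under *data* is not described here. It uses only the Python standard library; the repository's own scripts may load the data differently.

```python
import csv
from pathlib import Path


def read_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))


def load_dataset(data_dir, synthetic_name, real_name=None):
    """Return (real_rows, synthetic_rows) for one dataset.

    Hypothetical helper: data_dir is whatever folder holds the CSV files.
    real_name may be omitted for datasets that ship only synthetic data
    (MIMIC in this repository), in which case real_rows is None.
    """
    data_dir = Path(data_dir)
    synthetic_rows = read_csv_rows(data_dir / synthetic_name)
    real_rows = read_csv_rows(data_dir / real_name) if real_name else None
    return real_rows, synthetic_rows
```

For ATUS this could be called as `load_dataset(folder, "atus_train_synthetic_1.csv", "atus_train.csv")`, and for MIMIC as `load_dataset(folder, "mimic_3_synthetic.csv")`, where `folder` is the directory that actually contains the files.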

## Contacts
For questions, please reach out to Karan Bhanot (bhanok@rpi.edu or bhanotkaran22@gmail.com).