Datasets
- dnamite.datasets.fetch_mimic(data_path, return_benchmark_data=False)
Fetch the MIMIC-III dataset from the raw MIMIC-III csv files.
Data is prepared using the following steps:
1. For each patient, we define the index time as 48 hours after their first admission and construct features exclusively from data recorded before this index time.
2. Continuous and categorical features are collected from labevents and chartevents using all available data recorded before each patient’s index time.
3. Continuous features are averaged over the pre-index window, while categorical features are assigned the most recent value before the index time.
4. Features are deemed eligible if they are observed in at least 5% of patients.
The resulting dataset contains 880 features.
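The four preparation steps above can be illustrated on toy data with pandas. This is a minimal sketch, not dnamite's actual implementation: the helper names (`before_index`, `build_features`), the column names (`patient_id`, `admit_time`, `time`, `feature`, `value`), and the toy tables are all assumptions made for illustration, and the coverage threshold is raised from 5% to 50% so that the filter is visible with only three patients.

```python
import pandas as pd

def before_index(events, index_time):
    # Step 2: keep only events recorded before each patient's index time.
    ev = events.merge(index_time, left_on="patient_id", right_index=True)
    return ev[ev["time"] < ev["index_time"]].sort_values("time")

def build_features(admissions, numeric_events, categorical_events,
                   window_hours=48, min_coverage=0.05):
    # Step 1: index time = first admission + 48 hours.
    first_admit = admissions.groupby("patient_id")["admit_time"].min()
    index_time = (first_admit + pd.Timedelta(hours=window_hours)).rename("index_time")

    num = before_index(numeric_events, index_time)
    cat = before_index(categorical_events, index_time)

    # Step 3: mean for continuous features, most recent value for categorical ones.
    cont_wide = num.groupby(["patient_id", "feature"])["value"].mean().unstack("feature")
    cat_wide = cat.groupby(["patient_id", "feature"])["value"].last().unstack("feature")

    # One row per patient in the cohort, including patients with no events.
    features = cont_wide.join(cat_wide, how="outer").reindex(first_admit.index)

    # Step 4: keep features observed in at least `min_coverage` of patients.
    coverage = features.notna().mean()
    return features.loc[:, coverage >= min_coverage]

# Hypothetical toy tables (not the MIMIC-III schema).
admissions = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "admit_time": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-01-05", "2020-01-10"]),
})
numeric_events = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "feature": ["hr", "hr", "hr", "hr"],
    "time": pd.to_datetime(["2020-01-01 01:00", "2020-01-02 01:00",
                            "2020-01-04 00:00", "2020-01-05 02:00"]),
    "value": [80.0, 90.0, 200.0, 70.0],  # the 200.0 reading falls after the index time
})
categorical_events = pd.DataFrame({
    "patient_id": [1, 1, 2, 2],
    "feature": ["rhythm", "rhythm", "rhythm", "rare"],
    "time": pd.to_datetime(["2020-01-01 02:00", "2020-01-02 02:00",
                            "2020-01-05 03:00", "2020-01-05 04:00"]),
    "value": ["sinus", "afib", "sinus", "x"],
})

features = build_features(admissions, numeric_events, categorical_events, min_coverage=0.5)
print(features)
```

In this toy cohort, the reading taken after patient 1's index time is ignored, and the `rare` feature is dropped because it is observed in only one of three patients.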
- Parameters:
data_path (str) – The path to the directory containing the MIMIC-III csv files.
return_benchmark_data (bool, default=False) – If True, the function returns the full dataset together with a benchmark dataset that has one feature for each feature in the MIMIC-III benchmark data (see https://arxiv.org/pdf/1703.07771). If False, the function returns only the full dataset.
- Returns:
The cohort dataset if return_benchmark_data is False, else (cohort dataset, benchmark dataset).
- Return type:
pandas.DataFrame or tuple
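
Because the return type depends on `return_benchmark_data`, calling code may want to normalize the result into a fixed shape. The sketch below shows one way to do that; `split_fetch_result` is a hypothetical helper, not part of dnamite, and the data path in the comment is a placeholder. A small stand-in DataFrame is used so the example runs without the MIMIC-III files.

```python
import pandas as pd

def split_fetch_result(result):
    """Return (cohort, benchmark), with benchmark set to None if absent."""
    if isinstance(result, tuple):
        cohort, benchmark = result
        return cohort, benchmark
    return result, None

# With the real data this would look like (path is a placeholder):
#   from dnamite.datasets import fetch_mimic
#   cohort, benchmark = split_fetch_result(
#       fetch_mimic("/path/to/mimic-iii-csvs", return_benchmark_data=True)
#   )

# Stand-in objects that mimic the two documented return shapes.
stand_in = pd.DataFrame({"hr": [85.0, 70.0], "rhythm": ["afib", "sinus"]})

cohort_only, no_benchmark = split_fetch_result(stand_in)
cohort, benchmark = split_fetch_result((stand_in, stand_in[["hr"]]))
```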