MIMIC-III (Medical Information Mart for Intensive Care III) is a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital).

MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors:

  • it is freely available to researchers worldwide

  • it encompasses a diverse and very large population of ICU patients

  • it contains high temporal resolution data including lab results, electronic documentation, and bedside monitor trends and waveforms.


The Philips eICU program is a critical care telehealth program that delivers need-to-know information to caregivers, empowering them to care for their patients. It supplements, rather than replaces, the bedside team, and the data used by the remote caregivers is archived for research purposes.

Through this work, we have generated a large database which has potential for facilitating additional research initiatives on patient outcomes, trends, and other best practice protocols in use today at most healthcare facilities. The Philips eICU Research Institute (eRI), which maintains the data, has generously contributed the eICU Collaborative Research Database described here.


The eICU Collaborative Research Database is populated with data from a combination of many critical care units throughout the continental United States. The data in the collaborative database covers patients who were admitted to critical care units in 2014 and 2015.


Identifiers are used across the database to identify unique concepts such as patients, hospitals, ICU stays, and so on. These identifiers include:

  • hospitalid - uniquely identifies each hospital in the database

  • uniquepid - uniquely identifies patients (i.e. it is always the same value for the same person)

  • patienthealthsystemstayid - uniquely identifies hospital stays

  • patientunitstayid - uniquely identifies unit stays (usually the unit is an ICU within a hospital)

Almost all tables use patientunitstayid as the primary identifier.
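The nesting of these identifiers (a patient has one or more hospital stays, each with one or more unit stays) can be sketched as follows. This is a minimal illustration on made-up rows, assuming the four identifiers appear together as columns of a patient-level table; the helper name group_stays is hypothetical.

```python
from collections import defaultdict

# Toy rows shaped like a patient-level table; all values are made up.
rows = [
    {"hospitalid": 1, "uniquepid": "p-001", "patienthealthsystemstayid": 10, "patientunitstayid": 100},
    {"hospitalid": 1, "uniquepid": "p-001", "patienthealthsystemstayid": 10, "patientunitstayid": 101},
    {"hospitalid": 2, "uniquepid": "p-001", "patienthealthsystemstayid": 11, "patientunitstayid": 102},
    {"hospitalid": 2, "uniquepid": "p-002", "patienthealthsystemstayid": 12, "patientunitstayid": 103},
]


def group_stays(rows):
    """Nest unit stays under hospital stays under patients (hypothetical helper)."""
    tree = defaultdict(lambda: defaultdict(list))
    for r in rows:
        tree[r["uniquepid"]][r["patienthealthsystemstayid"]].append(r["patientunitstayid"])
    # Convert the nested defaultdicts to plain dicts for easy inspection.
    return {pid: dict(stays) for pid, stays in tree.items()}


hierarchy = group_stays(rows)
```

In this toy data, patient p-001 has two hospital stays, one of which contains two unit stays. Because most tables are keyed on patientunitstayid, per-patient analyses generally need a mapping of this kind to avoid counting the same person more than once.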


  • The timestamps of all events are stored as offsets from the time of ICU admission, in minutes. As a result, the hospital admission time will in general be negative.

  • It may help to add a pre-ICU admission “fuzz”, because laboratory measurements are sometimes taken before ICU admission; e.g., look at all the labs measured from (-6*60) minutes to (24*60) minutes relative to ICU admission, i.e. from 6 hours before to 24 hours after.
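Selecting events by offset could look like the sketch below. It assumes lab rows carry an offset column named labresultoffset, in minutes from ICU admission; the default bounds (6 hours before to 24 hours after admission) are an illustrative choice, and the function name is hypothetical.

```python
def labs_in_window(lab_rows, start_min=-6 * 60, end_min=24 * 60):
    """Keep lab rows whose offset (minutes from ICU admission) falls in the window.

    Negative offsets are measurements taken before ICU admission.
    Both bounds are inclusive.
    """
    return [r for r in lab_rows if start_min <= r["labresultoffset"] <= end_min]


# Made-up example rows: offsets in minutes relative to ICU admission.
labs = [
    {"labname": "lactate", "labresultoffset": -400},  # too early, dropped
    {"labname": "lactate", "labresultoffset": -100},  # pre-ICU, kept by the fuzz
    {"labname": "lactate", "labresultoffset": 0},
    {"labname": "lactate", "labresultoffset": 1440},  # exactly 24 hours, kept
    {"labname": "lactate", "labresultoffset": 1500},  # too late, dropped
]

first_day = labs_in_window(labs)
kept_offsets = [r["labresultoffset"] for r in first_day]
```

Passing start_min=0 would instead discard every pre-ICU measurement, which is the situation the fuzz is meant to avoid.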

Data interfaces

Data from each patient is collected into a common warehouse only if certain “interfaces” are available. Each interface is used to transform and load a certain type of data: vital sign interfaces incorporate vital signs, laboratory interfaces provide measurements on blood samples, and so on. It is important to be aware that different care units may have different interfaces in place, and that the lack of an interface will result in no data being available for a given patient, even if those measurements were made in reality.

Inputs and Outputs

  • The medication table is essentially an interface to pharmacy data - i.e. prescribed medications.

  • The intakeoutput and infusiondrug tables should be used for fluids and drugs, respectively. It can be difficult to determine whether a given hospital is actually collecting and archiving data in the infusiondrug table.
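One rough way to approach that question is to measure, per hospital, the fraction of unit stays that have at least one row in the table of interest. The sketch below is a heuristic under stated assumptions, not an established method: the function name is hypothetical, the data is made up, and a coverage near zero only suggests (it does not prove) that the hospital has no interface feeding that table.

```python
def table_coverage_by_hospital(patient_rows, table_stay_ids):
    """Fraction of each hospital's unit stays with at least one row in a table.

    patient_rows: dicts with "hospitalid" and "patientunitstayid".
    table_stay_ids: patientunitstayid values found in the table of interest
                    (e.g. infusiondrug); duplicates are fine.
    """
    seen = set(table_stay_ids)
    totals, covered = {}, {}
    for r in patient_rows:
        h = r["hospitalid"]
        totals[h] = totals.get(h, 0) + 1
        if r["patientunitstayid"] in seen:
            covered[h] = covered.get(h, 0) + 1
    return {h: covered.get(h, 0) / totals[h] for h in totals}


# Made-up data: hospital 1 has infusion rows for both stays, hospital 2 has none.
patients = [
    {"hospitalid": 1, "patientunitstayid": 100},
    {"hospitalid": 1, "patientunitstayid": 101},
    {"hospitalid": 2, "patientunitstayid": 102},
    {"hospitalid": 2, "patientunitstayid": 103},
]
infusion_stay_ids = [100, 100, 101]

coverage = table_coverage_by_hospital(patients, infusion_stay_ids)
```

Any threshold for excluding a hospital is a study-specific judgment call, since some patients genuinely receive no infusions.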

Laboratory tests

The lab table is populated with ~160 “standard” laboratory measurements. When a hospital first participates in the eICU program, it must map these values to its local system. As a result, most common labs are well harmonized in this table. However, it is possible for the lab interface to be down (e.g. during software upgrades) and for standard labs to be recorded in the customlab table instead. These downtimes are in principle rare, but an empirical study of how frequently they occur is yet to be undertaken.


The General Internal Medicine (GIM) dataset comprises de-identified health-related data associated with over 22,000 patient encounters for 14,000 unique patients who were admitted under the GIM service at St. Michael’s Hospital between 2011 and 2019. The dataset includes:

Numeric values

  • 9 vital signs

  • 100 labs

  • 7 shift assessment variables 

  • 7 intake-output variables

  • 1 ulcer variable, 1 alcohol scale, 1 diabetes variable

Clinical orders (165)

  • Imaging, telemetry, consults, cardio, diet, respiration, activities, codes, protocols, transfusions, wound care, neurology

Medication administrations

  • Medications are grouped into AHFS medication classes

The dataset will be provided in both a preprocessed format and as raw data tables. The following patient outcomes will also be available at the patient encounter level: 

  1. ICU transfer - Patient is transferred from GIM to the ICU.

  2. Death - Patient dies while on GIM.

  3. Palliative entry - Patient is transferred to palliative care.

  4. Palliative discharge - Patient is discharged to a palliative care unit.

  5. Discharged - Patient is discharged from hospital.

  6. ICD diagnosis

  7. Sepsis (defined based on SIRS criteria)

  8. Respiratory failure (defined based on a ventilator order)
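For readers unfamiliar with SIRS, the sketch below counts the four conventional SIRS criteria using their commonly cited thresholds. It is an illustration only: the exact rule used to derive the sepsis label in this dataset is not described here, and the function and parameter names are hypothetical.

```python
def sirs_count(temp_c, heart_rate, resp_rate, wbc_k_per_ul,
               paco2_mmhg=None, bands_pct=None):
    """Count how many of the four conventional SIRS criteria are met.

    Thresholds are the commonly cited ones; optional inputs that are
    None are simply not evaluated.
    """
    criteria = [
        # Temperature above 38 C or below 36 C
        temp_c > 38.0 or temp_c < 36.0,
        # Heart rate above 90 beats per minute
        heart_rate > 90,
        # Respiratory rate above 20 breaths per minute, or PaCO2 below 32 mmHg
        resp_rate > 20 or (paco2_mmhg is not None and paco2_mmhg < 32),
        # White cell count above 12 or below 4 (x10^3 per microlitre),
        # or more than 10% immature (band) forms
        wbc_k_per_ul > 12.0 or wbc_k_per_ul < 4.0
        or (bands_pct is not None and bands_pct > 10),
    ]
    return sum(criteria)


def meets_sirs(**measurements):
    """SIRS is conventionally considered present when two or more criteria are met."""
    return sirs_count(**measurements) >= 2


febrile_tachycardic = meets_sirs(temp_c=38.5, heart_rate=95, resp_rate=18, wbc_k_per_ul=8.0)
unremarkable = meets_sirs(temp_c=37.0, heart_rate=80, resp_rate=16, wbc_k_per_ul=7.0)
```

A label derived this way depends on which measurements are taken together and over what time window, so it should not be assumed to reproduce the outcome variable shipped with the dataset.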


The Medical Imaging in Cervical Spine Trauma (MICST) dataset comprises 4250 de-identified Computed Tomography (CT) scans of the cervical spine from patients who were assessed in the Emergency Department of St. Michael’s Hospital between May 1, 2004 and January 31, 2019 for cervical spine trauma.

Each entry in the dataset consists of a CT scan in the form of a series of axial PNG images with an associated series of masks containing voxel-level annotation of fractures. The dataset consists of 765 CT scans containing at least one fracture and 3485 CT scans without a fracture. The corresponding patient age and gender will be provided. 
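A scan-level fracture label can in principle be derived from the masks. The sketch below assumes each mask slice has already been decoded into a 2D grid of integers and that any non-zero value marks a fracture voxel; both the encoding and the function name are assumptions, since the mask format is not specified here.

```python
def scan_has_fracture(mask_slices):
    """Return True if any voxel in any axial mask slice is non-zero.

    mask_slices: a sequence of 2D grids (lists of rows of ints), one per
    axial image. The non-zero-means-fracture encoding is an assumption.
    """
    return any(value != 0 for grid in mask_slices for row in grid for value in row)


# Made-up 2x2 masks for two tiny "scans" of two slices each.
scan_without_fracture = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
scan_with_fracture = [[[0, 0], [0, 0]], [[0, 1], [0, 0]]]

labels = [scan_has_fracture(scan_without_fracture), scan_has_fracture(scan_with_fracture)]
```

Counting the scans labelled True and False in this way gives a quick consistency check against the stated totals of 765 and 3485.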

The dataset is notable for the following reasons:

  • It is the largest freely available annotated dataset of de-identified CT scans of the cervical spine

  • CT scans were independently reviewed and fractures annotated by three radiologists. The annotations generated by each radiologist were compiled to generate masks with precise voxel-level annotations of each fracture. These masks were then validated by a fellowship-trained neuroradiologist.

The MICST dataset supports the work of the Diagnostic Imaging and Learning Algorithms @ LKS-CHART (http://dila.ai) team in the development of advanced diagnostic and clinical decision-making tools.