Risk Stratification¶
Separating a patient population into groups at high and low risk of having an outcome
- Predicting something in the future
- Goal is different from diagnosis, with distinct performance metrics
- Fuzzy classification
- Diverse data
Risk stratification drives interventions that target high-risk patients
The goal is to reduce cost and improve patient outcomes
Applications¶
- Premature infant’s risk of severe morbidity
- Does this patient need to be admitted to the coronary-care unit?
- Likelihood of hospital readmission
Types¶
| | Traditional | AI |
|---|---|---|
| | | Use readily-available data and feed into model |
| Pros | Simple | Population-level; automated (fits more easily into workflow); higher accuracy; quicker to derive |
| Limitations | Manual; sample-specific; not used as much as required due to high friction | |
| Example | APGAR scoring system | AI |
APGAR Scoring system¶
Framing for Supervised ML¶
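Why are gaps important? A gap between the data used for prediction and the start of the outcome window helps avoid label leakage, where features recorded just before the outcome already reveal it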
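A minimal sketch of how such a gap could be enforced when building training examples. It assumes each patient has a list of timestamped event codes and an optional outcome date. The function name `make_example`, the event codes, and the window lengths (30-day gap, 365-day outcome window) are hypothetical choices for illustration, not values from these notes.

```python
from datetime import date, timedelta

def make_example(events, outcome_date, prediction_date,
                 gap_days=30, outcome_window_days=365):
    """Return (features, label) for one patient, or None if excluded.

    events: list of (date, code) pairs; outcome_date: a date, or None if the
    outcome was never recorded. All names and window lengths are illustrative.
    """
    window_start = prediction_date + timedelta(days=gap_days)
    window_end = window_start + timedelta(days=outcome_window_days)

    # Exclude patients whose outcome occurs before the outcome window opens
    # (they already have the outcome, or it falls inside the gap).
    if outcome_date is not None and outcome_date < window_start:
        return None

    # Features may only use data recorded up to the prediction date.
    features = sorted({code for day, code in events if day <= prediction_date})

    # Label is 1 only if the outcome falls inside the outcome window.
    label = int(outcome_date is not None and outcome_date < window_end)
    return features, label

events = [(date(2020, 1, 10), "A"), (date(2020, 6, 1), "B"), (date(2021, 1, 5), "C")]
prediction_date = date(2020, 12, 31)
print(make_example(events, date(2021, 6, 1), prediction_date))   # outcome inside the window
print(make_example(events, date(2021, 1, 15), prediction_date))  # outcome inside the gap
```

Event "C" is recorded after the prediction date, so it never becomes a feature. A patient whose outcome lands inside the gap is dropped rather than labelled, which is one possible policy among several.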
Sparsity-encouraging models (for example, L1-regularized models)
- Easier to interpret
- Helps deploy model to different clinics where they may not have access to all the data
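To illustrate how a sparsity-encouraging penalty drops features, here is a small sketch of L1-regularized logistic regression fitted by proximal gradient descent in plain Python. The toy data, the ±1 feature coding, and the hyperparameters (`lam`, `lr`, `n_iter`) are assumptions made for the example. In practice a library implementation would normally be used.

```python
import math

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logreg(X, y, lam=0.15, lr=0.5, n_iter=500):
    """Fit weights w and intercept b; the L1 penalty applies to w only."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted risk minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding yields exact zeros.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy cohort: feature 0 determines the outcome, feature 1 agrees with it
# only 60% of the time. Both features are coded as +1 / -1.
X = [[1, 1]] * 6 + [[1, -1]] * 4 + [[-1, 1]] * 4 + [[-1, -1]] * 6
y = [1] * 10 + [0] * 10

w, b = fit_l1_logreg(X, y)
print(w, b)  # the weight on the weak feature is exactly 0.0
```

With the weak feature's weight at exactly zero, the fitted model needs only feature 0 at prediction time, which is the property that makes such models easier to read and to deploy where some inputs are unavailable.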
How to get labels¶
- Manual
- Label patients’ data by “chart review”
- Visualization of individual patient data time series
- Automatic
- Rule-based
- Labels may get revised regularly based on standards
- For example (illustrative thresholds)
- 2020: 200 units of sugar = diabetes
- 2025: 100 units of sugar = diabetes
- Machine learning to predict if the patient is “currently” diabetic
Based on:
- medications: may not have record of
- purchase
- intake
- lab data
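The rule-based case above, where labels shift as the standard is revised, can be sketched as follows. The thresholds simply mirror the illustrative numbers in the example and are not clinical values. The names `THRESHOLD_BY_STANDARD_YEAR`, `threshold_in_effect`, and `is_diabetic`, and the use of `>=` as the comparison, are assumptions.

```python
# Illustrative thresholds taken from the example above; not clinical values.
THRESHOLD_BY_STANDARD_YEAR = {2020: 200, 2025: 100}

def threshold_in_effect(year):
    """Return the threshold of the most recent standard issued by `year`."""
    applicable = [y for y in THRESHOLD_BY_STANDARD_YEAR if y <= year]
    if not applicable:
        raise ValueError(f"no standard in effect in {year}")
    return THRESHOLD_BY_STANDARD_YEAR[max(applicable)]

def is_diabetic(sugar_level, year):
    """Rule-based label: at or above the threshold in effect that year."""
    return sugar_level >= threshold_in_effect(year)

patients = {"p1": 90, "p2": 150, "p3": 220}  # hypothetical sugar readings
labels_2020 = {pid: is_diabetic(v, 2020) for pid, v in patients.items()}
labels_2025 = {pid: is_diabetic(v, 2025) for pid, v in patients.items()}
print(labels_2020)
print(labels_2025)  # p2 changes label once the standard is revised
```

Keeping the year of the standard next to each stored label makes it possible to tell which labels need to be recomputed after a revision.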
Metrics¶
- PPV (Positive Predictive Value)
- AUC-ROC
- Calibration
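A short sketch of how these three metrics can be computed from true labels and predicted risk scores, using plain Python. The function names, the 0.5 flagging threshold, the two-bin calibration table, and the toy scores are assumptions for the example.

```python
def ppv(y_true, y_score, threshold):
    """Fraction of patients flagged as high-risk who actually have the outcome."""
    flagged = [y for y, s in zip(y_true, y_score) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")

def auc_roc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_table(y_true, y_score, n_bins=2):
    """Per score bin: (mean predicted risk, observed outcome rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        bins[min(int(s * n_bins), n_bins - 1)].append((s, y))
    return [
        (sum(s for s, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
        for b in bins if b
    ]

y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

print(ppv(y_true, y_score, threshold=0.5))
print(auc_roc(y_true, y_score))
for mean_pred, observed, count in calibration_table(y_true, y_score):
    print(round(mean_pred, 2), observed, count)
```

PPV depends on the chosen threshold and on how common the outcome is, AUC-ROC depends only on the ranking of scores, and the calibration table checks whether predicted risks match observed rates within each bin.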
Intervention-Tainted Outcomes¶
Form of Self-Selection Bias
Let
- Group A: Patients with Pneumonia with history of asthma
- Group B: Patients with Pneumonia without history of asthma
Observation: Patients in group A die less often than patients in group B
Discussion¶
- Group A dies less often because these patients receive more intensive care
- Longer survival may be a result of treatment rather than of lower underlying risk
Conclusion¶
- Does this mean group A has lower risk? No
- Should we give group A lower priority for treatment? No
Hacks¶
- Remove such features from the model; not feasible for high-dimensional data
- Redefine outcome by finding a pre-treatment surrogate (such as lactate levels)
- Consider treated patients as right-censored by treatment
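The last option can be sketched as follows: a patient who is treated before the outcome is recorded as censored at the treatment time, and a censoring-aware estimator is then applied. Kaplan-Meier is used here only as a familiar example of such an estimator and is not named in these notes. The record fields and the toy times are hypothetical. The sketch also treats censoring as uninformative, which may not hold when sicker patients are the ones who get treated.

```python
def censor_at_treatment(records):
    """Convert patient records into (time, event) pairs.

    event = 1 if the outcome was observed untreated, 0 if the patient was
    censored (treated first, or follow-up ended without the outcome).
    """
    observations = []
    for r in records:
        t_event, t_treat, t_end = r["event_time"], r["treatment_time"], r["followup_end"]
        if t_treat is not None and (t_event is None or t_treat < t_event):
            observations.append((t_treat, 0))  # censored by treatment
        elif t_event is not None:
            observations.append((t_event, 1))  # outcome observed
        else:
            observations.append((t_end, 0))    # censored at end of follow-up
    return observations

def kaplan_meier(observations):
    """Return [(time, estimated survival)] at each observed event time."""
    survival, curve = 1.0, []
    for t in sorted({time for time, event in observations if event == 1}):
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, event in observations if time == t and event == 1)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

records = [  # hypothetical patients; times in days
    {"event_time": 5, "treatment_time": None, "followup_end": 10},
    {"event_time": 8, "treatment_time": 3, "followup_end": 10},
    {"event_time": None, "treatment_time": None, "followup_end": 10},
    {"event_time": 2, "treatment_time": None, "followup_end": 10},
    {"event_time": None, "treatment_time": 6, "followup_end": 10},
]
observations = censor_at_treatment(records)
print(observations)
print(kaplan_meier(observations))
```

The second patient's outcome on day 8 is not counted as an event, because by then the outcome may already have been altered by the treatment given on day 3.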
Solutions¶
- Interpretable models are very important
- Causal modelling: Reframe the question as “Will admission to the ICU lower the likelihood of death for this patient?”
Deep Learning for Risk Stratification¶
Deep learning has not yielded very large gains over the baseline
The baseline is L1-regularized logistic regression with good structural features
Sequential data in medicine is very different from language modelling
- Many time scales
- Significant missing data
- Multi-variate observations
- Not enough data to learn subtle non-linear interactions