Clinical Analytics Dashboard
UCI Heart Disease Dataset - Cleveland, Hungary, Switzerland, VA Long Beach
Data Source: UCI ML Repository - Heart Disease (1988). Original investigators: Detrano et al.
Total Patients: 920 from 4 medical centers (Cleveland: 303, Hungary: 294, Switzerland: 123, VA Long Beach: 200)
Disease Prevalence: 44.7% (411 positive cases)
Binary target derived from the original num diagnosis code: 0 = healthy, 1-4 = disease
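The binarization described above can be sketched as follows. This is a minimal sketch that assumes each record carries the original 0-4 num code. The helper names `binarize_target` and `prevalence` are hypothetical, and the toy codes are not rows from the UCI files.

```python
def binarize_target(num_values):
    """Map the original 0-4 diagnosis code to a binary label (0 = healthy, 1 = disease)."""
    labels = []
    for v in num_values:
        if v not in (0, 1, 2, 3, 4):
            raise ValueError(f"unexpected num value: {v!r}")
        labels.append(0 if v == 0 else 1)
    return labels


def prevalence(labels):
    """Fraction of positive (disease) labels."""
    if not labels:
        raise ValueError("empty label list")
    return sum(labels) / len(labels)


# Toy codes for illustration only, not rows from the UCI files.
y = binarize_target([0, 2, 0, 1, 4, 0, 3, 0])
print(y)              # [0, 1, 0, 1, 1, 0, 1, 0]
print(prevalence(y))  # 0.5
```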
Best AUC-ROC: 0.918 (Random Forest, 5-fold stratified cross-validation)
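AUC-ROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The following is a minimal pure-Python sketch of that pairwise (Mann-Whitney) definition. It assumes predicted scores are available, and `roc_auc` is a hypothetical helper, not the dashboard's code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise positive/negative comparisons; O(n_pos * n_neg), fine for small sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under cross-validation, the reported figure is typically the mean of the per-fold AUCs. For larger sets, scikit-learn's `roc_auc_score` computes the same quantity more efficiently.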
Mean Age: 54.4 years (range 29-77)
Male: 78.9%, Female: 21.1% (726 and 194 of 920 patients)
Age Distribution by Outcome
UCI Heart Disease - All institutions
Data by Institution
4 contributing medical centers
Feature Correlation with Heart Disease
Pearson correlation coefficients
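When the outcome is binary, the Pearson coefficient between a feature and the 0/1 label is the point-biserial correlation. The following is a minimal sketch that assumes the feature and the binarized label are equal-length numeric lists. `pearson` is a hypothetical helper, and the oldpeak values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("zero variance: correlation undefined")
    return cov / (sx * sy)


# Hypothetical values: ST depression (oldpeak) against the binary disease label.
oldpeak = [0.0, 0.2, 1.4, 2.3, 3.0, 0.6]
label = [0, 0, 1, 1, 1, 0]
print(round(pearson(oldpeak, label), 3))
```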
Exploratory Data Analysis
Feature distributions and statistical analysis
Chest Pain Type Distribution
cp: 1=typical angina, 2=atypical angina, 3=non-anginal pain, 4=asymptomatic
Feature Importance (Random Forest)
Top predictive features
Feature Dictionary
13 clinical attributes from original study
| Feature | Description | Type | Range |
|---|---|---|---|
| age | Age in years | Numeric | 29-77 |
| sex | Sex (1=male, 0=female) | Binary | 0, 1 |
| cp | Chest pain type | Categorical | 1-4 |
| trestbps | Resting blood pressure (mm Hg) | Numeric | 94-200 |
| chol | Serum cholesterol (mg/dl) | Numeric | 126-564 |
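| fbs | Fasting blood sugar > 120 mg/dl (1=true, 0=false) | Binary | 0, 1 |
| restecg | Resting ECG results (0=normal, 1=ST-T wave abnormality, 2=probable or definite left ventricular hypertrophy) | Categorical | 0-2 |
| thalach | Maximum heart rate achieved | Numeric | 71-202 |
| exang | Exercise-induced angina (1=yes, 0=no) | Binary | 0, 1 |
| oldpeak | ST depression induced by exercise relative to rest | Numeric | 0-6.2 |
| slope | Slope of peak exercise ST segment (1=upsloping, 2=flat, 3=downsloping) | Categorical | 1-3 |
| ca | Number of major vessels colored by fluoroscopy | Numeric | 0-3 |
| thal | Thallium stress test result (3=normal, 6=fixed defect, 7=reversible defect) | Categorical | 3, 6, 7 |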
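The categorical attributes (cp, restecg, slope, thal) are codes rather than quantities. Models such as logistic regression or k-NN therefore usually see them one-hot encoded. The following is a minimal sketch that assumes records are dicts keyed by the feature names in the table. The `CATEGORIES` mapping restates the ranges listed above, and `one_hot` is a hypothetical helper.

```python
# Allowed codes per categorical feature, restated from the dictionary ranges above.
CATEGORIES = {
    "cp": [1, 2, 3, 4],
    "restecg": [0, 1, 2],
    "slope": [1, 2, 3],
    "thal": [3, 6, 7],
}


def one_hot(record):
    """Return a new dict in which each categorical code becomes 0/1 indicator columns."""
    out = {}
    for name, value in record.items():
        if name in CATEGORIES:
            codes = CATEGORIES[name]
            if value not in codes:
                raise ValueError(f"{name}={value!r} not in {codes}")
            for code in codes:
                out[f"{name}_{code}"] = 1 if value == code else 0
        else:
            out[name] = value  # numeric and binary features pass through unchanged
    return out


row = {"age": 63, "sex": 1, "cp": 4, "restecg": 2, "slope": 2, "thal": 7}
encoded = one_hot(row)
print(encoded["cp_4"], encoded["cp_1"], encoded["thal_7"])  # 1 0 1
```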
ML Model Comparison
6 classifiers with 5-fold stratified cross-validation
| Model | Configuration | Accuracy | AUC-ROC | F1-Score | Precision | Recall |
|---|---|---|---|---|---|---|
| Random Forest | Ensemble, 100 trees | 85.2% | 0.918 | 0.847 | 84.6% | 84.8% |
| XGBoost | Gradient boosting | 84.1% | 0.905 | 0.836 | 83.9% | 83.3% |
| Neural Network | MLP, 2 hidden layers | 83.9% | 0.897 | 0.833 | 83.1% | 83.7% |
| SVM | RBF kernel | 83.6% | 0.891 | 0.829 | 82.8% | 83.0% |
| Logistic Regression | L2 regularization | 82.0% | 0.879 | 0.814 | 81.2% | 81.6% |
| K-Nearest Neighbors | k=5, Euclidean distance | 78.7% | 0.842 | 0.781 | 77.9% | 78.3% |
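F1 is the harmonic mean of precision and recall, so it offers a quick consistency check on each row of the comparison. The following short sketch uses precision/recall pairs from the table. Differences in the third decimal can arise when a cross-validated F1 is averaged per fold rather than computed from averaged precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs taken from the model comparison.
rows = {
    "Random Forest": (0.846, 0.848),
    "Logistic Regression": (0.812, 0.816),
    "K-Nearest Neighbors": (0.779, 0.783),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
# Random Forest: F1 = 0.847
# Logistic Regression: F1 = 0.814
# K-Nearest Neighbors: F1 = 0.781
```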
Cross-Validation Results
5-fold stratified CV
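Stratified k-fold splitting keeps each fold's class balance close to the overall prevalence, which makes per-fold scores comparable. The following is a minimal pure-Python sketch of the fold assignment for binary labels. `stratified_folds` is hypothetical. In practice, scikit-learn's `StratifiedKFold` performs this split, with optional shuffling.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # carry the dealing position across classes so fold sizes stay balanced
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # shuffle within the class, then deal like cards
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset = (offset + len(idx)) % k
    return folds


labels = [1] * 40 + [0] * 60
folds = stratified_folds(labels, k=5)
for f in folds:
    print(len(f), sum(labels[i] for i in f))  # 20 samples, 8 positives in every fold
```

Each sample is held out exactly once. A model is trained on the other k-1 folds and scored on the held-out fold, and the k scores are then summarized.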
Best Model - Confusion Matrix
Random Forest on test set
| | Pred: Neg | Pred: Pos |
|---|---|---|
| Actual: Neg | 89 | 13 |
| Actual: Pos | 14 | 68 |

AUC-ROC: 0.918
Accuracy: 85.3%
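The headline metrics follow directly from the four cells of the confusion matrix. The following short sketch uses the counts reported for the Random Forest test split. Its precision and recall describe this single split and need not match the cross-validated averages in the model comparison.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),    # positive predictive value
        "recall": tp / (tp + fn),       # sensitivity, true positive rate
        "specificity": tn / (tn + fp),  # 1 - false positive rate
    }


m = confusion_metrics(tn=89, fp=13, fn=14, tp=68)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
# accuracy: 85.3%, precision: 84.0%, recall: 82.9%, specificity: 87.3%
```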
Benchmark References
- Detrano et al. (1989) - Original dataset, American Journal of Cardiology
- Aha & Kibler (1988) - UCI ML Repository benchmark
- Recent Kaggle competitions: 85-92% accuracy with ensemble methods
NLP & Clinical Notes
MIMIC-III Clinical NER Benchmarks
NLP Benchmark: MIMIC-III (PhysioNet) + i2b2 2010 Challenge. Models: ClinicalBERT, BioBERT, scispaCy.
Clinical Notes: 2.08M (MIMIC-III total)
NER F1-Score: 89.4% (ClinicalBERT)
Entity Types: 7 (medical categories)
Vocab Size: 28,996 (BERT tokens)
Sample Clinical Note with NER
MIMIC-III discharge summary format
DISCHARGE SUMMARY
Patient is a 63-year-old male with history of coronary artery disease, hypertension, and type 2 diabetes who presented with chest pain and shortness of breath.
Patient underwent cardiac catheterization revealing three-vessel disease. Subsequently had CABG x3.
Discharge Medications:
Aspirin 81mg daily, Metoprolol 50mg BID, Lisinopril 10mg daily
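Transformer NER models such as ClinicalBERT are commonly trained as token classifiers with BIO tags. B- begins an entity, I- continues it, and O marks tokens outside any entity. Entity spans are then recovered from the tag sequence. The following is a minimal sketch. The tokens and tags are hand-written illustrations, not model output. The labels PROBLEM, TEST, and TREATMENT follow the i2b2 2010 concept types, and `bio_to_spans` is a hypothetical helper.

```python
def bio_to_spans(tokens, tags):
    """Collect (label, text) entities from parallel token and BIO-tag lists."""
    spans = []
    label, words = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A B- tag, or an I- tag that does not continue the open entity, starts a new one.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [tok]
        elif tag.startswith("I-"):
            words.append(tok)
        else:  # "O" closes any open entity
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


tokens = ["history", "of", "coronary", "artery", "disease", ",", "underwent",
          "cardiac", "catheterization", ";", "Aspirin", "81mg"]
tags = ["O", "O", "B-PROBLEM", "I-PROBLEM", "I-PROBLEM", "O", "O",
        "B-TEST", "I-TEST", "O", "B-TREATMENT", "O"]
print(bio_to_spans(tokens, tags))
# [('PROBLEM', 'coronary artery disease'), ('TEST', 'cardiac catheterization'), ('TREATMENT', 'Aspirin')]
```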
NER Model Performance
i2b2 2010 Challenge benchmark
Clinical NLP Benchmarks
Published results on i2b2 and MIMIC-III
| Model | Architecture | Precision | Recall | F1 |
|---|---|---|---|---|
| ClinicalBERT | Transformer | 90.2% | 88.7% | 89.4% |
| BioBERT | Transformer | 89.1% | 87.9% | 88.5% |
| scispaCy | spaCy | 84.2% | 82.7% | 83.4% |
| BiLSTM-CRF | RNN + CRF | 85.8% | 84.1% | 84.9% |
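NER scores of this kind are often computed at the entity level. Under exact matching, a prediction counts as a true positive only if both its span boundaries and its type match a gold entity. The following is a minimal exact-match sketch. Representing entities as (start, end, type) token-offset tuples is an assumption, and `entity_prf` is a hypothetical helper.

```python
def entity_prf(gold, pred):
    """Exact-match entity precision, recall and F1 over collections of (start, end, type)."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {(2, 5, "PROBLEM"), (7, 9, "TEST"), (10, 11, "TREATMENT")}
pred = {(2, 5, "PROBLEM"), (7, 9, "TREATMENT"), (10, 11, "TREATMENT")}  # one wrong type
print(entity_prf(gold, pred))  # 2 of 3 entities match, so P, R and F1 all equal 2/3
```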
Bias & Fairness Analysis
Model fairness across demographic subgroups
Performance by Sex
Random Forest on UCI Heart Disease
| Group | N | Prevalence | Accuracy | TPR | FPR |
|---|---|---|---|---|---|
| Male | 726 | 45.2% | 84.8% | 86.2% | 16.7% |
| Female | 194 | 25.3% | 82.5% | 79.6% | 14.8% |
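The subgroup columns come from computing a confusion matrix separately within each group, where TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The following is a minimal sketch that assumes parallel lists of true labels, predicted labels, and a group attribute. `rates_by_group` is a hypothetical helper, and the toy data are made up.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Per-group N, prevalence, accuracy, TPR and FPR for binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f": prediction correct or not; "p"/"n": predicted positive or negative
        key = ("t" if p == t else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    out = {}
    for g, c in counts.items():
        n = sum(c.values())
        pos = c["tp"] + c["fn"]  # actual positives
        neg = c["fp"] + c["tn"]  # actual negatives
        out[g] = {
            "n": n,
            "prevalence": pos / n,
            "accuracy": (c["tp"] + c["tn"]) / n,
            "tpr": c["tp"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
        }
    return out


# Toy labels and predictions, not the dashboard's data.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
sex = ["Male"] * 4 + ["Female"] * 4
for group, r in rates_by_group(y_true, y_pred, sex).items():
    print(group, r)
```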
Performance by Age Group
Random Forest on UCI Heart Disease
| Age | N | Prevalence | Accuracy | TPR | FPR |
|---|---|---|---|---|---|
| <45 | 142 | 31.0% | 86.6% | 84.1% | 11.2% |
| 45-54 | 286 | 42.0% | 85.3% | 86.7% | 15.9% |
| 55-64 | 344 | 49.1% | 84.0% | 87.5% | 19.4% |
| ≥65 | 148 | 54.7% | 82.4% | 85.2% | 21.2% |
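The age bands are contiguous intervals (<45, 45-54, 55-64, ≥65), so each patient falls into exactly one band. The following small binning sketch assumes integer ages in years. The labels mirror the table rows, and `age_band` is a hypothetical helper.

```python
import bisect

# Lower edges of the 45-54, 55-64 and >=65 bands; ages below 45 fall in the first band.
EDGES = [45, 55, 65]
LABELS = ["<45", "45-54", "55-64", "≥65"]


def age_band(age):
    """Map an age in years to the band labels used in the table."""
    return LABELS[bisect.bisect_right(EDGES, age)]


print([age_band(a) for a in (29, 44, 45, 54, 55, 64, 65, 77)])
# ['<45', '<45', '45-54', '45-54', '55-64', '55-64', '≥65', '≥65']
```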
Fairness Metrics (AIF360)
Ratio metrics such as disparate impact are commonly judged acceptable above 0.8 (the four-fifths rule); 1.0 indicates parity
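Disparate impact is the rate of favorable predictions for the unprivileged group divided by the rate for the privileged group. AIF360 exposes this ratio as `disparate_impact()` on `BinaryLabelDatasetMetric`. The following is a minimal pure-Python sketch of the same ratio. Two choices are assumptions for illustration only: treating a positive prediction as the "favorable" outcome, and designating Female as the unprivileged group. The predictions are toy values.

```python
def selection_rate(y_pred, groups, group):
    """Fraction of samples in `group` that are predicted positive."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    if not preds:
        raise ValueError(f"no samples for group {group!r}")
    return sum(preds) / len(preds)


def disparate_impact(y_pred, groups, unprivileged, privileged):
    """Ratio of positive-prediction rates: unprivileged / privileged."""
    denom = selection_rate(y_pred, groups, privileged)
    if denom == 0:
        raise ZeroDivisionError("privileged group has no positive predictions")
    return selection_rate(y_pred, groups, unprivileged) / denom


# Toy predictions; the privileged/unprivileged assignment is an illustrative assumption.
y_pred = [1, 1, 0, 1, 1, 0, 0, 1]
groups = ["Male"] * 4 + ["Female"] * 4
di = disparate_impact(y_pred, groups, unprivileged="Female", privileged="Male")
print(round(di, 3), "meets 0.8 threshold" if di >= 0.8 else "below 0.8 threshold")
```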
Fairness References
- NIST AI Risk Management Framework (AI RMF 1.0) - January 2023
- IBM AI Fairness 360 (AIF360) - Fairness metrics toolkit
- Obermeyer et al. (2019) - Dissecting racial bias in an algorithm used to manage the health of populations, Science