ClinicalAI - UCI Heart Disease Benchmark

Data Source: UCI ML Repository - Heart Disease (1988). Original investigators: Detrano et al.

Total Patients

920

4 medical centers

Cleveland: 303, Hungary: 294, Switzerland: 123, VA Long Beach: 200

Disease Prevalence

44.7%

411 positive cases

Binary classification: original label 0 = no disease, labels 1-4 = disease
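
The binarization above can be sketched as follows. This is a minimal illustration, assuming the raw target is the integer diagnosis field (called `num` in the UCI files) with values 0-4; the function names `binarize_target` and `prevalence` are hypothetical, and the labels in the example are toy values, not dataset rows.

```python
def binarize_target(num: int) -> int:
    """Collapse the original 0-4 label into 0 (no disease) or 1 (disease)."""
    if not 0 <= num <= 4:
        raise ValueError(f"unexpected label: {num}")
    return 1 if num >= 1 else 0


def prevalence(labels: list[int]) -> float:
    """Fraction of positive cases among already-binarized labels."""
    if not labels:
        raise ValueError("no labels given")
    return sum(labels) / len(labels)


# Toy labels for illustration only.
raw = [0, 2, 0, 1, 4, 0, 3, 0]
binary = [binarize_target(n) for n in raw]
print(binary)              # [0, 1, 0, 1, 1, 0, 1, 0]
print(prevalence(binary))  # 0.5

# The dashboard's own counts: 411 positives out of 920 patients.
print(round(411 / 920 * 100, 1))  # 44.7
```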

Best AUC-ROC

0.918

Random Forest

5-fold stratified cross-validation

Mean Age

54.4

Range: 29-77 years

Male: 78.9%, Female: 21.1%

Age Distribution by Outcome

UCI Heart Disease - All institutions

Data by Institution

4 contributing medical centers

Feature Correlation with Heart Disease

Pearson correlation coefficients
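
For readers unfamiliar with the statistic, a Pearson coefficient between one feature and the binary outcome could be computed roughly as below. This is a sketch using only the standard library; `pearson_r` is a hypothetical helper, and the `oldpeak` and `disease` values are invented for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy values: ST depression against a 0/1 disease label.
oldpeak = [0.0, 1.4, 2.3, 0.2, 3.1, 0.6]
disease = [0, 1, 1, 0, 1, 0]
print(round(pearson_r(oldpeak, disease), 3))
```

With a binary outcome this is the point-biserial correlation, so it measures only linear association and can understate the value of categorical features such as cp or thal.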

Chest Pain Type Distribution

cp: 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic

Feature Importance (Random Forest)

Top predictive features

Feature	Importance
thal	0.156
(label missing)	0.143
(label missing)	0.132
oldpeak	0.122
thalach	0.110

Feature Dictionary

13 clinical attributes from original study

Feature	Description	Type	Range
age	Age in years	Numeric	29-77
sex	Sex (1=male, 0=female)	Binary	0, 1
cp	Chest pain type	Categorical	1-4
trestbps	Resting blood pressure (mm Hg)	Numeric	94-200
chol	Serum cholesterol (mg/dl)	Numeric	126-564
fbs	Fasting blood sugar > 120 mg/dl	Binary	0, 1
restecg	Resting ECG results	Categorical	0-2
thalach	Maximum heart rate achieved	Numeric	71-202
exang	Exercise induced angina	Binary	0, 1
oldpeak	ST depression induced by exercise relative to rest	Numeric	0-6.2
slope	Slope of peak exercise ST segment	Categorical	1-3
ca	Number of major vessels (fluoroscopy)	Numeric	0-3
thal	Thallium stress test result (3=normal, 6=fixed defect, 7=reversible defect)	Categorical	3, 6, 7
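
The ranges in the dictionary above can double as a simple input check. The sketch below assumes each patient is a plain dict keyed by the feature names in the table; `validate_record` and the example record are illustrative, not part of the dashboard.

```python
# Ranges and codes copied from the feature dictionary above.
NUMERIC_RANGES = {
    "age": (29, 77),
    "trestbps": (94, 200),
    "chol": (126, 564),
    "thalach": (71, 202),
    "oldpeak": (0.0, 6.2),
    "ca": (0, 3),
}
ALLOWED_VALUES = {
    "sex": {0, 1},
    "cp": {1, 2, 3, 4},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {1, 2, 3},
    "thal": {3, 6, 7},
}


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside {low}-{high}")
    for name, allowed in ALLOWED_VALUES.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif value not in allowed:
            problems.append(f"{name}: {value} not in {sorted(allowed)}")
    return problems


# Illustrative record, used here only to exercise the checks.
example = {"age": 63, "sex": 1, "cp": 1, "trestbps": 145, "chol": 233,
           "fbs": 1, "restecg": 2, "thalach": 150, "exang": 0,
           "oldpeak": 2.3, "slope": 3, "ca": 0, "thal": 6}
print(validate_record(example))                       # []
print(validate_record(dict(example, chol=0, thal=5)))
```

A check like this flags placeholder values (for example a cholesterol of 0) before they reach a model, which matters when records from several centers are pooled.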

ML Model Comparison

Model	Configuration	Accuracy	AUC-ROC	F1-Score	Precision	Recall
Random Forest	Ensemble, 100 trees	85.2%	0.918	0.847	84.6%	84.8%
XGBoost	Gradient boosting	84.1%	0.905	0.836	83.9%	83.3%
Neural Network	MLP, 2 hidden layers	83.9%	0.897	0.833	83.1%	83.7%
SVM	RBF kernel	83.6%	0.891	0.829	82.8%	83.0%
Logistic Regression	L2 regularization	82.0%	0.879	0.814	81.2%	81.6%
K-Nearest Neighbors	k=5, Euclidean distance	78.7%	0.842	0.781	77.9%	78.3%
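
The accuracy, precision, recall, and F1 figures reported for each model are all derived from the four confusion-matrix counts. The sketch below shows those definitions; the counts passed in are toy values, and the function names are hypothetical. The last line checks that the Random Forest F1 is consistent with its reported precision and recall.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from true/false positive/negative counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Toy counts for illustration only.
print(classification_metrics(tp=40, fp=10, tn=45, fn=5))

# Reported Random Forest precision 84.6% and recall 84.8%.
print(round(f1_from(0.846, 0.848), 3))  # 0.847
```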

Cross-Validation Results

5-fold stratified CV
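
Stratification means each fold keeps roughly the same share of positive cases as the full dataset. A standard-library sketch of the idea follows; it is not the dashboard's own pipeline (which is not shown), and `stratified_kfold` is a hypothetical helper. Only the class counts (411 positive, 509 negative) come from the figures above.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) with class balance preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's shuffled indices round-robin across the k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


# 411 positives and 509 negatives, matching the counts reported above.
labels = [1] * 411 + [0] * 509
for train, test in stratified_kfold(labels, k=5):
    positives = sum(labels[i] for i in test)
    print(len(test), positives, round(positives / len(test), 3))
```

Every test fold ends up with 82 or 83 positives, so per-fold prevalence stays close to the overall 44.7%.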

Best Model - Confusion Matrix

Random Forest on test set

Matrix layout: rows are Actual: Neg and Actual: Pos; columns are Pred: Neg and Pred: Pos.

AUC-ROC: 0.918

Accuracy: 85.3%
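
A confusion matrix in the layout used here (rows = actual class, columns = predicted class) can be tallied as in this sketch. The labels and predictions are toy values; the dashboard's actual cell counts are not reproduced.

```python
def confusion_matrix(y_true: list[int], y_pred: list[int]) -> list[list[int]]:
    """Return [[TN, FP], [FN, TP]]: rows are actual 0/1, columns are predicted 0/1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Toy example for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
for row_name, row in zip(["Actual: Neg", "Actual: Pos"], confusion_matrix(y_true, y_pred)):
    print(row_name, row)
# Actual: Neg [2, 1]
# Actual: Pos [1, 2]
```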

Benchmark References

Detrano et al. (1989) - Original dataset, American Journal of Cardiology
Aha & Kibler (1988) - UCI ML Repository benchmark
Recent Kaggle competitions: 85-92% accuracy with ensemble methods

NLP Benchmark: MIMIC-III (PhysioNet) + i2b2 2010 Challenge. Models: ClinicalBERT, BioBERT, scispaCy.

Clinical Notes

2.08M

MIMIC-III total

NER F1-Score

89.4%

ClinicalBERT

Entity Types

Medical categories

Vocab Size

28,996

BERT tokens

Sample Clinical Note with NER

MIMIC-III discharge summary format

DISCHARGE SUMMARY

Patient is a 63-year-old male with history of coronary artery disease, hypertension, and type 2 diabetes who presented with chest pain and shortness of breath.

Patient underwent cardiac catheterization revealing three-vessel disease. Subsequently had CABG x3.

Discharge Medications:
Aspirin 81mg daily, Metoprolol 50mg BID, Lisinopril 10mg daily
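
To make the NER output format concrete, the sketch below tags part of the note with character-offset spans. It is a toy dictionary lookup, not ClinicalBERT or any model named on this page; the lexicon and its label assignments (using the i2b2 2010 concept types problem, test, treatment) are assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon; label choices are illustrative assumptions.
LEXICON = {
    "coronary artery disease": "problem",
    "hypertension": "problem",
    "type 2 diabetes": "problem",
    "chest pain": "problem",
    "shortness of breath": "problem",
    "three-vessel disease": "problem",
    "cardiac catheterization": "test",
    "cabg": "treatment",
    "aspirin": "treatment",
    "metoprolol": "treatment",
    "lisinopril": "treatment",
}


def tag_entities(text: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, surface_text, label) spans, ordered by position."""
    spans = []
    for term, label in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), match.group(0), label))
    return sorted(spans)


note = ("Patient underwent cardiac catheterization revealing "
        "three-vessel disease. Subsequently had CABG x3.")
for start, end, surface, label in tag_entities(note):
    print(f"{start:3d}-{end:3d}  {label:9s}  {surface}")
```

A trained model replaces the lookup with learned token classification, but the span-plus-label output has the same shape.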

NER Model Performance

i2b2 2010 Challenge benchmark

Clinical NLP Benchmarks

Published results on i2b2 and MIMIC-III

Model	Architecture	Precision	Recall	F1
ClinicalBERT	Transformer	90.2%	88.7%	89.4%
BioBERT	Transformer	89.1%	87.9%	88.5%
scispaCy	spaCy	84.2%	82.7%	83.4%
BiLSTM-CRF	RNN + CRF	85.8%	84.1%	84.9%
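
Precision, recall, and F1 for NER are usually computed over entity spans rather than individual tokens. The sketch below assumes exact-match scoring, where a predicted span counts only if its start, end, and label all match a gold span; whether the figures above use exact or relaxed matching is not stated. The spans are toy values and `span_prf` is a hypothetical helper.

```python
def span_prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy spans for illustration only.
gold = {(0, 5, "problem"), (10, 20, "test"), (25, 30, "treatment"), (40, 48, "problem")}
predicted = {(0, 5, "problem"), (10, 20, "treatment"), (25, 30, "treatment"),
             (40, 48, "problem"), (50, 55, "test")}

precision, recall, f1 = span_prf(gold, predicted)
print(precision, recall, round(f1, 3))  # 0.6 0.75 0.667
```

The span at 10-20 has the right boundaries but the wrong label, so it counts as both a false positive and a false negative.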

Performance by Sex

Random Forest on UCI Heart Disease

Group	N	Prevalence	Accuracy	TPR	FPR
Male	726	45.2%	84.8%	86.2%	16.7%
Female	194	25.3%	82.5%	79.6%	14.8%

Performance by Age Group

Random Forest on UCI Heart Disease

Age	N	Prevalence	Accuracy	TPR	FPR
<45	142	31.0%	86.6%	84.1%	11.2%
45-54	286	42.0%	85.3%	86.7%	15.9%
55-64	344	49.1%	84.0%	87.5%	19.4%
≥65	148	54.7%	82.4%	85.2%	21.2%
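
Both subgroup tables above report the same columns, which can be derived per group from labels and predictions. The sketch below assumes rows of (group, actual, predicted) with 0/1 labels; the rows are toy values and `rates_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def rates_by_group(rows):
    """Compute N, prevalence, accuracy, TPR, and FPR for each group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in rows:
        # "t"/"f" = prediction correct or not; "p"/"n" = predicted class.
        key = ("t" if actual == predicted else "f") + ("p" if predicted == 1 else "n")
        counts[group][key] += 1

    summary = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        total = positives + negatives
        summary[group] = {
            "n": total,
            "prevalence": positives / total,
            "accuracy": (c["tp"] + c["tn"]) / total,
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return summary


# Toy rows for illustration only.
rows = [("M", 1, 1), ("M", 1, 0), ("M", 0, 0), ("M", 0, 1),
        ("F", 1, 1), ("F", 0, 0), ("F", 0, 0), ("F", 1, 1)]
for group, stats in rates_by_group(rows).items():
    print(group, stats)
```

TPR is the share of actual positives that the model catches, and FPR is the share of actual negatives it flags, so the two can move independently of accuracy when prevalence differs between groups.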

Fairness Metrics (AIF360)

Values > 0.8 indicate acceptable fairness

Metric	Value
Demographic Parity	0.87
Equalized Odds	0.82
Equal Opportunity	0.91
Calibration	0.78
Predictive Parity	0.85
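
Applying the stated 0.8 threshold to the values above takes only a few lines. The sketch also shows one common way to express a fairness metric as a ratio between two groups' rates. How the dashboard configured AIF360 is not stated, so `rate_ratio` is an assumption about the general approach rather than the toolkit's API, and its inputs here are toy values.

```python
# Values as reported on the dashboard.
DASHBOARD_METRICS = {
    "Demographic Parity": 0.87,
    "Equalized Odds": 0.82,
    "Equal Opportunity": 0.91,
    "Calibration": 0.78,
    "Predictive Parity": 0.85,
}


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Smaller rate divided by larger rate; 1.0 means the groups are treated alike."""
    high = max(rate_a, rate_b)
    low = min(rate_a, rate_b)
    return 1.0 if high == 0 else low / high


def below_threshold(metrics: dict, threshold: float = 0.8) -> list[str]:
    """Names of metrics that do not exceed the acceptability threshold."""
    return [name for name, value in metrics.items() if value <= threshold]


print(below_threshold(DASHBOARD_METRICS))  # ['Calibration']

# Toy selection rates for two groups (hypothetical values).
print(round(rate_ratio(0.40, 0.50), 2))    # 0.8
```

By the page's own rule, Calibration (0.78) is the only reported metric that falls short of the threshold.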

Fairness References

NIST AI Risk Management Framework (AI RMF 1.0) - January 2023
IBM AIF360: Fairness metrics toolkit
Obermeyer et al. (2019) - Dissecting racial bias in healthcare algorithms, Science
