Human Activity Recognition (HAR)
14-model benchmark on 10,299 UCI sensor readings. SVM Linear tops the table at 96.1% test accuracy. t-SNE shows clean activity clusters. PCA retains 95% variance at ~95 components. Sitting/Standing confusion is the primary error source.
UCI HAR: 10,299 samples, 561 features, 6 activities, 30 subjects
14-model benchmark with PCA + t-SNE analysis on expert-engineered sensor features
End-to-end ML pipeline on UCI HAR — accelerometer + gyroscope from 30 subjects, 6 activities.
Dataset
- 10,299 samples (7,352 train / 2,947 test), 30 subjects (21 train / 9 test)
- 561 pre-extracted features: time-domain + frequency-domain statistics
- 6 activities: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying
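
The counts above can be checked directly against the raw files. The following is a minimal sketch, assuming the directory layout of the public UCI HAR archive (`train/X_train.txt`, `train/y_train.txt`, `train/subject_train.txt`, and the same under `test/`). The helper names `load_split` and `describe` are illustrative, not part of the original pipeline.

```python
from pathlib import Path

import numpy as np


def load_split(root, split):
    """Load one split ("train" or "test") of the UCI HAR feature files.

    Assumed layout: <root>/<split>/X_<split>.txt, y_<split>.txt,
    subject_<split>.txt (whitespace-delimited text).
    """
    base = Path(root) / split
    X = np.loadtxt(base / f"X_{split}.txt", ndmin=2)              # (n_samples, n_features)
    y = np.loadtxt(base / f"y_{split}.txt", dtype=int, ndmin=1)   # activity ids
    subjects = np.loadtxt(base / f"subject_{split}.txt", dtype=int, ndmin=1)
    if not (len(X) == len(y) == len(subjects)):
        raise ValueError(f"row count mismatch in split {split!r}")
    return X, y, subjects


def describe(root):
    """Return sample, feature and subject counts across both splits."""
    X_tr, _, s_tr = load_split(root, "train")
    X_te, _, s_te = load_split(root, "test")
    return {
        "train": len(X_tr),
        "test": len(X_te),
        "total": len(X_tr) + len(X_te),
        "features": X_tr.shape[1],
        "train_subjects": len(set(s_tr.tolist())),
        "test_subjects": len(set(s_te.tolist())),
    }
```

Calling `describe` on the extracted archive should reproduce the 7,352 / 2,947 split, the 561 features, and the 21 / 9 subject partition listed above. Because the split is by subject, no person appears in both train and test.
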
Dimensionality Analysis
- t-SNE on 3K subset: clear non-linear cluster separation even at 2D
- PCA: 95% variance at ~95 components (from 561)
- Top features by F-statistic: tBodyAcc-mean-X, tGravityAcc-mean-X, angle(X,gravityMean)
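
Both numeric results above follow from short computations. The sketch below is a NumPy-only illustration, not the original analysis code: `components_for_variance` counts how many principal components are needed to reach a target explained-variance ratio, and `anova_f` computes a per-feature one-way ANOVA F-statistic. It assumes the F-statistic ranking is the one-way ANOVA form (the same quantity scikit-learn's `f_classif` returns); the source does not say which implementation was used.

```python
import numpy as np


def components_for_variance(X, target=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`."""
    centred = X - X.mean(axis=0)                     # PCA works on centred data
    s = np.linalg.svd(centred, compute_uv=False)     # singular values, descending
    ratio = s**2 / np.sum(s**2)                      # variance share per component
    cumulative = np.cumsum(ratio)
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, len(s))                        # guard against float round-off


def anova_f(X, y):
    """One-way ANOVA F-statistic of every feature against the class labels."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        group = X[y == c]
        group_mean = group.mean(axis=0)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    # between-class mean square over within-class mean square
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With the training matrix as `X` and activity ids as `y`, `np.argsort(anova_f(X, y))[::-1]` gives feature indices ranked from most to least discriminative, and `components_for_variance(X)` gives the component count for the 95% threshold.
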
All 14 Models
| Model | Test Accuracy |
|---|---|
| Naive Bayes | 77.03% |
| KNN (k=5) | 88.02% |
| Decision Tree | 87.07% |
| Bagging | 89.11% |
| Random Forest | 92.74% |
| Extra Trees | 94.06% |
| Gradient Boosting | 93.18% |
| XGBoost | 94.10% |
| LightGBM | 93.99% |
| CatBoost | 91.62% |
| Stacking (RF+XGB+LGB→LR) | 95.18% |
| Logistic Regression | 95.42% |
| SVM RBF | 95.49% |
| SVM Linear | 96.10% |
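
A table like this comes from fitting every model on the same training split and scoring it on the same test split. The sketch below shows that loop with scikit-learn for six of the fourteen models. It makes several assumptions: hyperparameters other than k=5 are library defaults rather than the settings behind the table, "SVM Linear" is taken to mean `SVC(kernel="linear")`, and the data is a synthetic stand-in so the block runs without the dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def benchmark(X_train, y_train, X_test, y_test, models):
    """Fit each model on the training split; return {name: test accuracy}."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results


# Hypothetical model zoo (assumed settings, see the note above).
MODELS = {
    "Naive Bayes": GaussianNB(),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM RBF": SVC(kernel="rbf"),
    "SVM Linear": SVC(kernel="linear"),
}

# Synthetic 6-class data standing in for the 561-feature HAR matrix.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=12,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = benchmark(X_train, y_train, X_test, y_test, MODELS)
for name, acc in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:<20} {acc:.2%}")
```

On the real data, the predefined subject-wise train/test split should replace `train_test_split`: a random split would put the same subjects on both sides and inflate the scores.
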
Error Analysis (115 errors / 2,947 = 3.9%)
- SITTING → STANDING: 55 errors
- STANDING → SITTING: 18 errors
- Together these 73 errors account for about 63% of all 115 misclassifications
- Cause: accelerometer posture features nearly identical for sitting vs standing; only fine gyroscope signals distinguish them
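
Error counts of this kind can be read off the off-diagonal cells of the confusion matrix. The following is a minimal standard-library sketch that ranks (true, predicted) pairs among the misclassified samples. The function names and the toy labels are illustrative only and are not the benchmark's predictions.

```python
from collections import Counter


def top_confusions(y_true, y_pred, n=3):
    """Most frequent (true, predicted) pairs among misclassified samples."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(n)


def error_rate(y_true, y_pred):
    """Fraction of samples whose prediction differs from the true label."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Toy labels for illustration only.
y_true = ["SITTING", "SITTING", "SITTING", "STANDING", "STANDING", "WALKING"]
y_pred = ["STANDING", "STANDING", "SITTING", "SITTING", "STANDING", "WALKING"]

print(top_confusions(y_true, y_pred))
# [(('SITTING', 'STANDING'), 2), (('STANDING', 'SITTING'), 1)]
print(f"{error_rate(y_true, y_pred):.1%}")
# 50.0%
```

Applied to the test-set predictions of the best model, the same two calls would yield the per-pair counts and the overall error rate reported in this section.
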
Why SVM Linear Wins
The 561 features are domain-expert-engineered statistics, and the six activities are close to linearly separable in that feature space. An SVM with a linear kernel exploits this directly, whereas tree-based models add complexity that buys nothing on data that is already nearly linearly separable.