# Learning Algorithms
| Learning Type | Task | Algorithm | Comment | Probabilistic | Parametric | Scope | \(d_\text{VC}\) | Bias | Variance | Generalization | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Supervised | Regression | OLS | | ❌ | ✅ | Global | \(k+1\) | High | Low | Good when \(n \gg k\) | | |
| | Classification | Logistic Regression | | ✅ | ✅ | Global | \(k+1\) | High | Low | Good when \(n \gg k\) | | |
| | Regression/Classification | Piecewise Constant | | ❌ | ❌ | Local | | | | | | |
| | Regression/Classification | Piecewise Polynomial | | ❌ | ❌ | Local | | | | | | |
| | Regression/Classification | SVM | Margin-based | ❌ | ✅ | | | | | | | Computationally expensive |
| | Regression/Classification | Gaussian Processes | | ✅ | ✅ | | | | | | | |
| | Regression/Classification | K-Nearest Neighbors (KNN) | Good baseline model<br>Can use mean, median, or mode for regression<br>Can use distance weighting or voting for classification | ❌ | ❌ | | | | | | | |
| | Regression/Classification | Decision Tree | Automatic piecewise constant<br>Exactly opposite in characteristics to OLS | ❌ | ❌ | Local | | Low | High | | - Highly interpretable<br>- Auto-detects non-linear relationships<br>- Auto-models variable interactions<br>- Fast evaluation: traversal only touches a subset of attributes | - Poor regression performance<br>- Unstable: tree structure is sensitive to the training data; changing the training data changes the tree<br>- Requires a large number of splits for even simple relationships |
| | Regression/Classification | Linear Tree | Automatic piecewise polynomial | ❌ | ❌ | Local | | | | | | |
| | Regression/Classification | Random Forest | Bagged trees | ❌ | ❌ | Local | | | | | | |
| | Regression/Classification | XGBoost | Boosted trees | ❌ | ❌ | Local | | | | | | |
| | Regression/Classification | CatBoost | Boosted trees | ❌ | ❌ | Local | | | | | | |
| | Regression/Classification | LightGBM | Boosted trees | ❌ | ❌ | Local | | | | | | |
| Unsupervised | Clustering | K-Means, K-Medoids | | ❌ | ❌ | | | | | | | |
| | Clustering | Gaussian Mixtures | | | | | | | | | | |
| | Clustering | Hierarchical Clustering | | | | | | | | | | |
| | Clustering | One-Many Clustering | | | | | | | | | | |
| | Clustering | Graph Clustering | | | | | | | | | | |
| | Anomaly Detection | Kernel Density Estimation | | ✅ | | | | | | | | |
| | Anomaly Detection | Isolation Forest | | ❌ | | | | | | | | |
| Reinforcement | | Q-Learning | | ❌ | ❌ | | | | | | | |
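
To make the OLS vs. decision-tree contrast in the table concrete, here is a minimal sketch, assuming scikit-learn and NumPy are installed (the synthetic data and parameter choices are illustrative, not from the source): it cross-validates a global, high-bias/low-variance linear fit against a local, low-bias/high-variance tree on a simple linear relationship, which the tree can only approximate with many piecewise-constant splits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Toy data: a simple linear relationship with noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

# OLS: global, parametric, high bias / low variance
ols = LinearRegression()

# Decision tree: local, non-parametric, low bias / high variance;
# it needs many piecewise-constant splits even for this linear trend
tree = DecisionTreeRegressor(random_state=0)

for name, model in [("OLS", ols), ("Decision Tree", tree)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

On data like this the linear model typically generalizes better; on strongly non-linear data with interactions, the ranking tends to flip.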
## Curse of Dimensionality
As the number of dimensions increases, relative distances between points tend to 0: the nearest and farthest neighbors of a query point become almost equally far away (see the sketch after the list below).

Distance-based models are the most affected:
- KNN
- K-Means
- Tree-based classification
- SVM?
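
A minimal simulation of this effect, assuming NumPy (the sample size and dimensions below are arbitrary illustrative choices): as the dimension grows, the relative contrast between the farthest and nearest neighbor of a query point collapses toward 0, so distance-based notions of "nearest" lose their meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Distance concentration: sample random points, measure distances to a
# random query, and track (max - min) / min as dimensionality grows.
for d in [2, 10, 100, 1000]:
    points = rng.uniform(size=(500, d))
    query = rng.uniform(size=(1, d))
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast={contrast:.3f}")
```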