# Learning Algorithms
Learning Type | Task | Algorithm | Comment | Assumptions | Pre-Processing | Probabilistic | Parametric | Scope | \(d_\text{VC}\) | Bias | Variance | Generalization | Advantages | Disadvantages | Time Complexity (Training) | Time Complexity (Inference) | Space Complexity (Training) | Space Complexity (Inference) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Supervised | Regression | OLS | | Mean has a linear relationship with the inputs<br>Constant variance | | ❌ | ✅ | Global | \(k+1\) | High | Low | Good for \(n \gg k\) | | | | | | |
Supervised | Classification (Binary) | Logistic Regression | | | | ✅ | ✅ | Global | \(k+1\) | High | Low | Good for \(n \gg k\) | | | | | | |
Supervised | Classification (Ordered) | Ordinal Regression | | | | | | | | | | | | | | | | |
Supervised | Regression / Classification | Piecewise Constant | | | | ❌ | ❌ | Local | | | | | | | | | | |
Supervised | Regression / Classification | Piecewise Polynomial | | | | ❌ | ❌ | Local | | | | | | | | | | |
Supervised | Regression / Classification | SVM | Margin-based | | | ❌ | ✅ | | | | | | | - Computationally expensive | | | | |
Supervised | Regression / Classification | Gaussian Processes | | | | ✅ | ❌ | | | | | | | | | | | |
Supervised | Regression / Classification | k-Nearest Neighbors (KNN) | Good baseline model (see the sketch below the table)<br>Can use mean, median, or mode for regression<br>Can use (weighted) voting for classification | Underlying relationship between \(x\) and \(y\) is continuous | Feature scaling<br>Dimensionality reduction | ❌ | ❌ | Local | | | | | + Lazy learner: no training time, and new training data can be added without re-training<br>+ Handles multi-class out of the box | - Lazy learner: high space complexity, high inference time<br>- Does not scale to large \(n\) or \(p\): the distance from the inference record to every training record must be computed<br>- Sensitive to noise<br>- Sensitive to missing data<br>- Sensitive to feature scale<br>- Cannot extrapolate | | | | |
Supervised | Regression / Classification | Decision Tree | Automatic piecewise constant<br>Exactly opposite in characteristics to OLS | | | ❌ | ❌ | Local | | Low | High | | + Highly interpretable<br>+ Auto-detects non-linear relationships<br>+ Auto-models variable interactions<br>+ Fast evaluation: traversal only touches a subset of attributes | - Poor regression performance<br>- Unstable: the tree structure is sensitive to the training data; changing the training data changes the tree<br>- Requires a large number of splits for even simple relationships<br>- Cannot extrapolate (see the sketch below the table) | | | | |
Supervised | Regression / Classification | Linear Tree | Automatic piecewise polynomial | | | ❌ | ❌ | Local | | | | | + Can extrapolate | | | | | |
Supervised | Regression / Classification | Linear Forest | | | | | | | | | | | | | | | | |
Supervised | Regression / Classification | Linear Boosting | | | | | | | | | | | | | | | | |
Supervised | Regression / Classification | Random Forest | Bagged trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Supervised | Regression / Classification | XGBoost | Boosted trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Supervised | Regression / Classification | CatBoost | Boosted trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Supervised | Regression / Classification | LightGBM | Boosted trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Unsupervised | Clustering | K-Means<br>K-Medoids | | Euclidean distance assumes spherical clusters | | ❌ | ❌ | | | | | | | | | | | |
Unsupervised | Clustering | Gaussian Mixtures | | | | | | | | | | | | | | | | |
Unsupervised | Clustering | Hierarchical Clustering | | | | | | | | | | | | | | | | |
Unsupervised | Clustering | One-Many Clustering | | | | | | | | | | | | | | | | |
Unsupervised | Clustering | Graph Clustering | | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | KNN | | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Kernel Density Estimation | | | | ✅ | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Isolation Forest | | | | ❌ | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | One-Class SVM | | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Autoencoders | Reconstruction loss will be high for anomalous events | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Cluster Analysis | | | | | | | | | | | | | | | | |
Reinforcement Learning | | Q-Learning | | | | ❌ | ❌ | | | | | | | | | | | |
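
The KNN row marks it as a lazy learner: fitting only stores the training data, so the cost is paid at inference, and the same neighbours can be aggregated with a (weighted) mean for regression or a (weighted) vote for classification. A minimal sketch of both, assuming scikit-learn and synthetic data (neither is prescribed by the table):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))              # synthetic features (illustrative only)
y_reg = X @ np.array([1.0, -2.0, 0.5])     # synthetic regression target
y_clf = (y_reg > 0).astype(int)            # synthetic binary labels

# KNN is sensitive to feature scale, hence the feature-scaling pre-processing step.
X_scaled = StandardScaler().fit_transform(X)

# "Training" just stores the data; distances are only computed at inference time.
reg = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X_scaled, y_reg)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_scaled, y_clf)  # majority vote

query = X_scaled[:1]
print(reg.predict(query))   # distance-weighted mean of the 5 nearest targets
print(clf.predict(query))   # majority vote among the 5 nearest labels
```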
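
Several tree rows above share the "cannot extrapolate" disadvantage, while the Linear Tree row lists extrapolation as an advantage. A minimal sketch of the contrast, again assuming scikit-learn: a decision tree trained on a noiseless \(y = 2x\) clamps its prediction to the training range, while OLS follows the trend.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X = np.arange(0, 10, 0.5).reshape(-1, 1)   # training inputs cover [0, 9.5]
y = 2 * X.ravel()                          # noiseless linear target y = 2x

ols = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor().fit(X, y)

X_new = np.array([[20.0]])                 # far outside the training range
print(ols.predict(X_new))   # ~[40.]: the global linear fit extrapolates the trend
print(tree.predict(X_new))  # [19.]: clamped to the largest value seen in training
```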
## Curse of Dimensionality
As the number of dimensions increases, relative distances between points tend to 0, so the nearest and farthest neighbours of a point become almost equidistant.
Distance-based models are the most affected (see the sketch after this list):
- KNN
- K-Means
- Tree-based classification
- SVM?
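
A minimal NumPy sketch of the claim (the setup, uniform random points in a unit hypercube, is an illustrative assumption): the relative contrast between a query point's nearest and farthest neighbour shrinks towards 0 as \(d\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(1_000, d))   # 1,000 random points in the unit hypercube
    q = rng.uniform(size=d)            # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    # Relative contrast between the farthest and nearest neighbour;
    # it shrinks towards 0 as the dimensionality d grows.
    print(f"d={d:>4}  contrast={(dist.max() - dist.min()) / dist.min():.3f}")
```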