# Learning Algorithms

| Learning Type | Task | Algorithm | Comment | Probabilistic | Parametric | Scope | \(d_\text{VC}\) | Bias | Variance | Generalization | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Supervised | Regression | OLS | | | | Global | \(k+1\) | High | Low | Good (\(n \gg k\)) | | |
| Supervised | Classification | Logistic | | | | Global | \(k+1\) | High | Low | Good (\(n \gg k\)) | | |
| Supervised | Regression/Classification | Piecewise Constant | | | | Local | | | | | | |
| Supervised | Regression/Classification | Piecewise Polynomial | | | | Local | | | | | | |
| Supervised | Regression/Classification | SVM | Margin-based | | | | | | | | | Computationally expensive |
| Supervised | Regression/Classification | Gaussian Processes | | | | | | | | | | |
| Supervised | Regression/Classification | KNN | Nearest neighbor; can use mean/median/mode for regression, and weighted voting for classification | | | | | | | | Good baseline model | |
| Supervised | Regression/Classification | Decision Tree | Automatic piecewise constant; exactly opposite in characteristics w.r.t. OLS | | | Local | | Low | High | | Highly interpretable; auto-detects non-linear relationships; auto-models variable interactions; fast evaluation (traversal only touches a subset of attributes) | Poor regression performance; unstable (tree structure is sensitive to training data: changing the training data changes the tree); requires a large number of splits for even simple relationships |
| Supervised | Regression/Classification | Linear Tree | Automatic piecewise polynomial | | | Local | | | | | | |
| Supervised | Regression/Classification | Random Forest | Bagged trees | | | Local | | | | | | |
| Supervised | Regression/Classification | XGBoost | Boosted trees | | | Local | | | | | | |
| Supervised | Regression/Classification | CatBoost | Boosted trees | | | Local | | | | | | |
| Supervised | Regression/Classification | LightGBM | Boosted trees | | | Local | | | | | | |
| Unsupervised | Clustering | K-Means / K-Medoids | | | | | | | | | | |
| Unsupervised | Clustering | Gaussian Mixtures | | | | | | | | | | |
| Unsupervised | Clustering | Hierarchical Clustering | | | | | | | | | | |
| Unsupervised | Clustering | One-Many Clustering | | | | | | | | | | |
| Unsupervised | Clustering | Graph Clustering | | | | | | | | | | |
| Unsupervised | Anomaly Detection | Kernel Density Estimation | | | | | | | | | | |
| Unsupervised | Anomaly Detection | Isolation Forest | | | | | | | | | | |
| Reinforcement Learning | | Q-Learning | | | | | | | | | | |
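The OLS vs decision-tree contrast above (global, high-bias, low-variance vs local, low-bias, high-variance) can be seen numerically. This is a minimal NumPy sketch, not from the notes: it fits OLS via the normal equations and a one-split decision stump (the simplest piecewise-constant tree) to a step-shaped target; the names (`distance`, split grid, noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-linear ground truth: a step function plus noise
X = rng.uniform(0, 1, size=200)
y = (X > 0.5).astype(float) + rng.normal(0, 0.1, size=200)

# OLS (global, parametric): fit y = a + b*x by least squares
A = np.column_stack([np.ones_like(X), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
ols_mse = np.mean((y - A @ coef) ** 2)

# Decision stump (local, piecewise constant): best single split on x
best_mse, best_split = np.inf, None
for s in np.linspace(0.05, 0.95, 19):
    left, right = y[X <= s], y[X > s]
    mse = (np.sum((left - left.mean()) ** 2) +
           np.sum((right - right.mean()) ** 2)) / len(y)
    if mse < best_mse:
        best_mse, best_split = mse, s

print(f"OLS MSE:   {ols_mse:.4f}")
print(f"Stump MSE: {best_mse:.4f} (split at x = {best_split:.2f})")
```

On this data the stump's error is far lower than OLS's: the global line cannot represent the step, while one local split captures it almost exactly.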

## Curse of Dimensionality

As the number of dimensions increases, relative distances tend to 0: the gap between the nearest and farthest neighbor becomes negligible relative to the distances themselves, so distance comparisons lose discriminative power.

Distance-based models are the most affected:

  • KNN
  • K-Means
  • Tree-based classification
  • SVM (possibly, depending on the kernel)
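The shrinking distance contrast can be checked directly. This is a minimal NumPy sketch, not from the notes; `distance_contrast` is an illustrative helper measuring the relative gap between the farthest and nearest of \(n\) random points from the origin.

```python
import numpy as np

rng = np.random.default_rng(1)

def distance_contrast(d, n=1000):
    """Relative gap between the farthest and nearest of n uniform
    random points in [0, 1]^d, measured from the origin."""
    pts = rng.uniform(0, 1, size=(n, d))
    dists = np.linalg.norm(pts, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for d in (2, 10, 100, 1000):
    print(f"d = {d:5d}  contrast = {distance_contrast(d):.3f}")
```

As \(d\) grows, the norms concentrate around their mean, so the contrast collapses toward 0 and "nearest" vs "farthest" becomes nearly meaningless.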
Last Updated: 2024-05-14 ; Contributors: AhmedThahir
