
# Learning Algorithms

| Learning Type | Task | Algorithm | Comments & Assumptions | Scope | \(d_\text{VC}\) | Bias | Variance | Generalization | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| Supervised | Regression | OLS | Mean has a linear relationship with the inputs.<br>Constant variance | Global | \(k+1\) | High | Low | Good when \(n \gg k\) | | |
| Supervised | Classification (Binary) | Logistic Regression | | Global | \(k+1\) | High | Low | Good when \(n \gg k\) | | |
| Supervised | Classification (Ordered) | Ordered Regression | | | | | | | | |
| Supervised | Regression/Classification | Piecewise Constant | | Local | | | | | | |
| Supervised | Regression/Classification | Piecewise Polynomial | | Local | | | | | | |
| Supervised | Regression/Classification | SVM | Margin-based | | | | | | | Computationally expensive |
| Supervised | Regression/Classification | Gaussian Processes | | | | | | | | |
| Supervised | Regression/Classification | KNN | Nearest-neighbor method; good baseline model.<br>Regression: mean/median/mode of neighbors.<br>Classification: (weighted) voting.<br>Assumes the underlying relationship between \(x\) and \(y\) is continuous.<br>Pre-processing: feature scaling, dimensionality reduction | | | | | | Lazy learner: no training time; new training data can be added without re-training.<br>Multi-class out of the box | Lazy learner: high space complexity and high inference time.<br>Does not work well for large \(n\) or \(p\): must compute the distance of the inference record to every training record.<br>Sensitive to noise, missing data, and feature scale.<br>Cannot extrapolate |
| Supervised | Regression/Classification | Decision Tree | Automatic piecewise constant.<br>Roughly opposite in characteristics to OLS | Local | | Low | High | | Highly interpretable.<br>Auto-detects non-linear relationships.<br>Auto-models variable interactions.<br>Fast evaluation: traversal only touches a subset of attributes | Poor regression performance.<br>Unstable: tree structure is sensitive to training data; changing the training data changes the tree.<br>Requires a large number of splits even for simple relationships.<br>Cannot extrapolate |
| Supervised | Regression/Classification | Linear Tree | Automatic piecewise polynomial | Local | | | | | Can extrapolate | |
| Supervised | Regression/Classification | Linear Forest | | | | | | | | |
| Supervised | Regression/Classification | Linear Boosting | | | | | | | | |
| Supervised | Regression/Classification | Random Forest | Bagged trees | Local | | | | | | Cannot extrapolate |
| Supervised | Regression/Classification | XGBoost | Boosted trees | Local | | | | | | Cannot extrapolate |
| Supervised | Regression/Classification | CatBoost | Boosted trees | Local | | | | | | Cannot extrapolate |
| Supervised | Regression/Classification | LightGBM | Boosted trees | Local | | | | | | Cannot extrapolate |
| Unsupervised | Clustering | K-Means, K-Medoids | Euclidean distance assumes spherical clusters | | | | | | | |
| Unsupervised | Clustering | Gaussian Mixtures | | | | | | | | |
| Unsupervised | Clustering | Hierarchical Clustering | | | | | | | | |
| Unsupervised | Clustering | One-Many Clustering | | | | | | | | |
| Unsupervised | Clustering | Graph Clustering | | | | | | | | |
| Unsupervised | Anomaly Detection | KNN | | | | | | | | |
| Unsupervised | Anomaly Detection | Kernel Density Estimation | | | | | | | | |
| Unsupervised | Anomaly Detection | Isolation Forest | | | | | | | | |
| Unsupervised | Anomaly Detection | One-Class SVM | | | | | | | | |
| Unsupervised | Anomaly Detection | Autoencoders | Reconstruction loss will be high for anomalous events | | | | | | | |
| Unsupervised | Anomaly Detection | Cluster Analysis | | | | | | | | |
| Reinforcement Learning | | Q-Learning | | | | | | | | |
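The KNN row above can be sketched in a few lines. This is a minimal illustration (the function name and data are made up, not from any library): "training" is just storing the data, and all distance computation happens at inference time, which is why space complexity and inference time are high.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, task="classification"):
    """Minimal KNN sketch: lazy learner, no training step."""
    # All work happens here at inference: distance of x to every training record.
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))
    neighbors = [yi for _, yi in dists[:k]]
    if task == "classification":
        return Counter(neighbors).most_common(1)[0][0]  # majority vote
    return sum(neighbors) / k  # mean of neighbor targets (regression)

X = [(0, 0), (0, 1), (5, 5), (6, 5)]
y = [0, 0, 1, 1]
print(knn_predict(X, y, (5, 6)))  # nearest 3 neighbors vote: 1
```

Adding a new training point is just appending to `X`/`y`, which is the "no re-training" advantage; the cost is that every query scans the whole training set.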
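The repeated "cannot extrapolate" note for tree-based models can be demonstrated with a toy comparison (the data and function names here are illustrative, not from the source): a depth-1 regression stump predicts the mean of its last leaf outside the training range, while an OLS line continues the trend.

```python
# Fit y = 2x on x in [0, 4], then predict at x = 10 (outside the training range).
xs = [0, 1, 2, 3, 4]
ys = [2 * x for x in xs]

# Simple linear regression via the closed-form slope/intercept.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

def ols_predict(x):
    return intercept + slope * x

def stump_predict(x, split=2.0):
    # Depth-1 regression tree: predict the mean of whichever side of the split x falls on.
    left = [y for xi, y in zip(xs, ys) if xi <= split]
    right = [y for xi, y in zip(xs, ys) if xi > split]
    return sum(left) / len(left) if x <= split else sum(right) / len(right)

print(ols_predict(10))    # 20.0 — the line extrapolates the trend
print(stump_predict(10))  # 7.0 — capped at the mean of the rightmost leaf
```

Deeper trees, forests, and boosted trees refine the piecewise-constant fit inside the training range but share this flat behaviour outside it, which is what linear trees/forests/boosting address.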

## Curse of Dimensionality

As the number of dimensions increases, relative distances tend to 0: the gap between the nearest and farthest neighbor becomes negligible compared to the absolute distance, so distance comparisons lose discriminative power.

Distance-based models are the most affected

- KNN
- K-Means
- Tree-based classification
- SVM?
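A quick numerical sketch of this effect (illustrative code, not from the source): sample random points, measure the relative gap between the farthest and nearest distance to a query point, and watch it shrink as the dimension grows.

```python
import random

def contrast(d, n=200):
    """Relative gap between farthest and nearest distances in d dimensions."""
    random.seed(0)  # fixed seed for reproducibility
    points = [[random.random() for _ in range(d)] for _ in range(n)]
    origin = [random.random() for _ in range(d)]
    dists = [sum((a - b) ** 2 for a, b in zip(p, origin)) ** 0.5 for p in points]
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 1000):
    # The contrast shrinks steadily as d grows: nearest and farthest
    # neighbors end up at nearly the same distance.
    print(d, round(contrast(d), 3))
```

In high dimensions all pairwise distances concentrate around the same value, which is why the distance-based models listed above degrade first.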
Last Updated: 2024-12-26 ; Contributors: AhmedThahir, web-flow
