# Learning Algorithms
Learning Type | Task | Algorithm | Comment | Assumptions | Pre-Processing | Probabilistic | Parametric | Scope | \(d_\text{VC}\) | Bias | Variance | Generalization | Advantages | Disadvantages | Time Complexity (Training) | Time Complexity (Inference) | Space Complexity (Training) | Space Complexity (Inference) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Supervised | Regression | OLS | | Mean has a linear relationship with the inputs<br>Constant variance | | ❌ | ✅ | Global | \(k+1\) | High | Low | Good for \(n \gg k\) | | | | | | |
Supervised | Classification (Binary) | Logistic Regression | | | | ✅ | ✅ | Global | \(k+1\) | High | Low | Good for \(n \gg k\) | | | | | | |
Supervised | Classification (Ordered) | Ordinal Regression | | | | | | | | | | | | | | | | |
Supervised | Regression / Classification | Piecewise Constant | | | | ❌ | ❌ | Local | | | | | | | | | | |
Supervised | Regression / Classification | Piecewise Polynomial | | | | ❌ | ❌ | Local | | | | | | | | | | |
Supervised | Regression / Classification | SVM | Margin-based | | | ❌ | ✅ | | | | | | | - Computationally expensive | | | | |
Supervised | Regression / Classification | Gaussian Processes | | | | ✅ | ❌ | | | | | | | | | | | |
Supervised | Regression / Classification | k-Nearest Neighbors (KNN) | Good baseline model (see the sketch below the table)<br>Can use mean, median, or mode for regression<br>Can use (weighted) voting for classification | Underlying relationship between \(x\) and \(y\) is continuous | Feature scaling<br>Dimensionality reduction | ❌ | ❌ | Local | | | | | + Lazy learner: no training time, and new training data can be added without re-training<br>+ Handles multi-class out of the box | - Lazy learner: high space complexity, high inference time<br>- Does not scale to large \(n\) or \(p\): the distance from the inference record to every training record must be computed<br>- Sensitive to noise<br>- Sensitive to missing data<br>- Sensitive to feature scale<br>- Cannot extrapolate | | | | |
Supervised | Regression / Classification | Decision Tree | Automatic piecewise constant<br>Exactly opposite in characteristics to OLS | | | ❌ | ❌ | Local | | Low | High | | + Highly interpretable<br>+ Auto-detects non-linear relationships<br>+ Auto-models variable interactions<br>+ Fast evaluation: traversal only touches a subset of attributes | - Poor regression performance<br>- Unstable: the tree structure is sensitive to the training data; changing the training data changes the tree<br>- Requires a large number of splits for even simple relationships<br>- Cannot extrapolate (see the sketch below the table) | | | | |
Supervised | Regression / Classification | Linear Tree | Automatic piecewise polynomial | | | ❌ | ❌ | Local | | | | | + Can extrapolate | | | | | |
Supervised | Regression / Classification | Linear Forest | | | | | | | | | | | | | | | | |
Supervised | Regression / Classification | Linear Boosting | | | | | | | | | | | | | | | | |
Supervised | Regression / Classification | Random Forest | Bagged trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Supervised | Regression / Classification | XGBoost | Boosted trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Supervised | Regression / Classification | CatBoost | Boosted trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Supervised | Regression / Classification | LightGBM | Boosted trees | | | ❌ | ❌ | Local | | | | | | - Cannot extrapolate | | | | |
Unsupervised | Clustering | K-Means<br>K-Medoids | | Euclidean distance assumes spherical clusters | | ❌ | ❌ | | | | | | | | | | | |
Unsupervised | Clustering | Gaussian Mixtures | | | | | | | | | | | | | | | | |
Unsupervised | Clustering | Hierarchical Clustering | | | | | | | | | | | | | | | | |
Unsupervised | Clustering | One-Many Clustering | | | | | | | | | | | | | | | | |
Unsupervised | Clustering | Graph Clustering | | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | KNN | | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Kernel Density Estimation | | | | ✅ | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Isolation Forest | | | | ❌ | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | One-Class SVM | | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Autoencoders | Reconstruction loss will be high for anomalous events | | | | | | | | | | | | | | | |
Unsupervised | Anomaly Detection | Cluster Analysis | | | | | | | | | | | | | | | | |
Reinforcement Learning | | Q-Learning | | | | ❌ | ❌ | | | | | | | | | | | |
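
The KNN row marks it as a lazy learner: fitting only stores the training data, so the cost is paid at inference, and the same neighbours can be aggregated with a (weighted) mean for regression or a (weighted) vote for classification. A minimal sketch of both, assuming scikit-learn and synthetic data (neither is prescribed by the table):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))              # synthetic features (illustrative only)
y_reg = X @ np.array([1.0, -2.0, 0.5])     # synthetic regression target
y_clf = (y_reg > 0).astype(int)            # synthetic binary labels

# KNN is sensitive to feature scale, hence the feature-scaling pre-processing step.
X_scaled = StandardScaler().fit_transform(X)

# "Training" just stores the data; distances are only computed at inference time.
reg = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X_scaled, y_reg)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_scaled, y_clf)  # majority vote

query = X_scaled[:1]
print(reg.predict(query))   # distance-weighted mean of the 5 nearest targets
print(clf.predict(query))   # majority vote among the 5 nearest labels
```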
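
Several tree rows above share the "cannot extrapolate" disadvantage, while the Linear Tree row lists extrapolation as an advantage. A minimal sketch of the contrast, again assuming scikit-learn: a decision tree trained on a noiseless \(y = 2x\) clamps its prediction to the training range, while OLS follows the trend.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X = np.arange(0, 10, 0.5).reshape(-1, 1)   # training inputs cover [0, 9.5]
y = 2 * X.ravel()                          # noiseless linear target y = 2x

ols = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor().fit(X, y)

X_new = np.array([[20.0]])                 # far outside the training range
print(ols.predict(X_new))   # ~[40.]: the global linear fit extrapolates the trend
print(tree.predict(X_new))  # [19.]: clamped to the largest value seen in training
```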
## Curse of Dimensionality
As the number of dimensions increases, relative distances between points tend to 0, so the nearest and farthest neighbours of a point become almost equidistant.
Distance-based models are the most affected (see the sketch after this list):
- KNN
- K-Means
- Tree-based classification
- SVM?
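
A minimal NumPy sketch of the claim (the setup, uniform random points in a unit hypercube, is an illustrative assumption): the relative contrast between a query point's nearest and farthest neighbour shrinks towards 0 as \(d\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(1_000, d))   # 1,000 random points in the unit hypercube
    q = rng.uniform(size=d)            # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    # Relative contrast between the farthest and nearest neighbour;
    # it shrinks towards 0 as the dimensionality d grows.
    print(f"d={d:>4}  contrast={(dist.max() - dist.min()) / dist.min():.3f}")
```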