K-Nearest Neighbor¶
Represents each record as a datapoint in \(d\)-dimensional space, where \(d\) is the number of attributes
Requirements¶
- Set of labelled records
- Normalized proximity/distance metric (see the sketch after this list), e.g.
    - Min-Max normalization
    - \(z\)-normalization
- Odd value of \(k\) (avoids ties in majority voting)
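A minimal sketch of the two normalizations named above, using NumPy; the toy matrix `X` (rows are records, columns are attributes) is an illustrative assumption:

```python
import numpy as np

# Toy feature matrix: rows are records, columns are attributes.
X = np.array([[180.0, 70.0],
              [160.0, 55.0],
              [175.0, 80.0]])

# Min-Max normalization: rescale each attribute to [0, 1].
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_minmax = (X - x_min) / (x_max - x_min)

# z-normalization: zero mean, unit standard deviation per attribute.
X_z = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_z)
```

Normalizing first matters because an attribute with a large raw scale would otherwise dominate the distance metric.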
Choosing \(k\)¶
Similar to the bias-variance tradeoff:

$$
\begin{aligned}
\text{Flexibility} &\propto \dfrac{1}{k} \\
\implies \text{Bias} &\propto k \\
\text{Variance} &\propto \dfrac{1}{k}
\end{aligned}
$$
Value of \(k\)¶
| | Problems | Low Bias | Low Variance |
|---|---|---|---|
| \(k\) too small | Overfitting; susceptible to noise | ✅ | ❌ |
| \(k\) too large | Underfitting; susceptible to far-off points: neighborhood includes points from other classes | ❌ | ✅ |
Finding optimal \(k\):

1. Use a test set
2. Let \(k = 1\)
3. Record the error rate/accuracy of the regressor/classifier
4. Increment \(k \leftarrow k + 2\) (keeps \(k\) odd)
5. Repeat steps 3 and 4, until reaching the value of \(k\) for which
    1. error is minimum
    2. accuracy is maximum
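A sketch of that search loop, assuming scikit-learn is available; the Iris dataset and the 70/30 split are illustrative choices, not part of the procedure above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

best_k, best_acc = 1, 0.0
for k in range(1, 30, 2):  # odd values of k only, per the requirements above
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_test, y_test)  # accuracy on the held-out set
    if acc > best_acc:
        best_k, best_acc = k, acc

print(best_k, best_acc)
```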
Types¶
| | Classification | Regression |
|---|---|---|
| Output | Class label is the majority label of the \(k\) nearest neighbors | Predicted value is the average of the continuous labels of the \(k\) nearest neighbors |
| Steps | - Compute the distance between the test record and all train records - Identify the \(k\) nearest neighbors of the test record (low distance = high similarity) - Use majority voting to find the class label of the test sample | Same first two steps; then average the neighbors' labels instead of voting |
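A from-scratch sketch of both variants; the Euclidean distance and the variable names are assumptions for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k, regression=False):
    # 1. Compute the distance between the test record and all train records.
    dists = np.linalg.norm(X_train - x, axis=1)
    # 2. Identify the k nearest neighbors (low distance = high similarity).
    nn = np.argsort(dists)[:k]
    # 3. Average the labels for regression; majority vote for classification.
    if regression:
        return y_train[nn].mean()
    return Counter(y_train[nn]).most_common(1)[0][0]

# Toy usage: two class-0 points near the query, one class-1 point far away.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```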
Distance-Weighted KNN¶
Closer points are given larger weights in the majority-voting scheme, so nearer neighbors influence the prediction more than farther ones
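A common choice is inverse-squared-distance weighting, \(w_i = \dfrac{1}{d(x, x_i)^2}\). A minimal sketch follows; the specific weighting function and the epsilon guard are assumptions, not fixed by these notes:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k):
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]
    # Inverse-squared-distance weights; epsilon avoids division by zero
    # when the test point coincides with a training point.
    w = 1.0 / (dists[nn] ** 2 + 1e-12)
    # Sum the weights per class and return the class with the largest total.
    labels = y_train[nn]
    totals = {c: w[labels == c].sum() for c in np.unique(labels)}
    return max(totals, key=totals.get)
```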