I am wondering what happens as K increases in the KNN algorithm. It seems that as K increases the "p" (new point) tends to move closer to the middle of the decision boundary?

Any thoughts?

2 Answers

First of all, let's talk about the effect of small and large $k$. A small value of $k$ increases the effect of noise, while a large value makes prediction computationally expensive. Practitioners usually choose an odd $k$ when the number of classes is 2 (to avoid tied votes), and another simple heuristic is to set $k=\sqrt n$, where $n$ is the number of training samples.

Smaller values of $k$ not only make the classifier sensitive to noise but may also lead to overfitting; large values of $k$ may lead to underfitting. So $k=\sqrt n$ seems a reasonable choice to start the algorithm with, and we then need to use cross-validation to find a suitable value for $k$.
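To make the cross-validation step concrete, here is a minimal pure-Python sketch that picks $k$ by leave-one-out cross-validation. The helper names (`knn_predict`, `loo_accuracy`) and the toy dataset are hypothetical, invented for illustration, not from the thread:

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify query by majority vote among its k nearest training points.
    train is a list of ((x, y), label) tuples; distance is Euclidean."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out one point at a time
        if knn_predict(rest, point, k) == label:
            correct += 1
    return correct / len(data)

# Toy dataset (hypothetical): two clusters, plus one noisy 'B' inside cluster 'A'.
data = [((0, 0), 'A'), ((0, 1), 'A'), ((1, 0), 'A'), ((1, 1), 'A'),
        ((5, 5), 'B'), ((5, 6), 'B'), ((6, 5), 'B'), ((0.5, 0.5), 'B')]

# On this data, k=1 is dominated by the noisy point, while a moderate k recovers.
best_k = max([1, 3, 5, 7], key=lambda k: loo_accuracy(data, k))
```

This illustrates the answer's point directly: with $k=1$ the single noisy point misleads every query in its cluster, while a moderate $k$ averages it away, so cross-validation lands on an intermediate value.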

Where the new data point falls relative to the decision boundary depends on the arrangement of the training points and on the location of the new point among them. Consider a situation with 100 training points, two classes, and $k = 100$. In this special case the decision boundary is irrelevant to the location of the new point: every query is classified as the majority class of the training set, so that class's region covers the whole space and the new data point can be anywhere in it. Therefore, I don't think we can make a general statement about it.
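The degenerate case described above can be checked in a few lines of pure Python. The toy training set and the helper `knn_predict` are hypothetical, invented for illustration:

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    # Majority vote among the k nearest training points (Euclidean distance).
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical training set: three points of class 'A', two of class 'B'.
train = [((0, 0), 'A'), ((1, 0), 'A'), ((0, 1), 'A'),
         ((9, 9), 'B'), ((9, 8), 'B')]

# With k equal to the training-set size, every query sees all the points,
# so the prediction is the overall majority class no matter where the query is.
predictions = {q: knn_predict(train, q, k=len(train))
               for q in [(0, 0), (9, 9), (100, -50)]}
```

Even a query sitting on top of a 'B' training point, or far outside the data, is classified 'A', confirming that at $k = n$ the "boundary" degenerates and the whole space belongs to the majority class.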

I am assuming that the kNN algorithm was written in Python (e.g. with scikit-learn). It depends on whether the function's `radius` parameter was set; the default is 1.0. Changing that parameter changes which points are selected as neighbors of p: the points closest to p according to the value of k, further constrained by the radius, among other settings.
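To show the distinction this answer is drawing, here is a minimal pure-Python sketch contrasting a k-nearest query with a fixed-radius query (as in scikit-learn's `NearestNeighbors`, which takes both `n_neighbors` and a `radius` defaulting to 1.0). The helper names and the point set are hypothetical:

```python
import math

def k_nearest(points, query, k):
    # The k points closest to query, no matter how far away they are.
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def radius_neighbors(points, query, radius=1.0):
    # All points within the given radius of query, however many that is.
    return [p for p in points if math.dist(p, query) <= radius]

points = [(0, 0), (0.5, 0), (0, 0.5), (3, 3), (4, 4)]
q = (0, 0)

near_k = k_nearest(points, q, k=4)    # always 4 points, even distant ones
near_r = radius_neighbors(points, q)  # only the points within radius 1.0
```

The k-nearest query always returns exactly k points, pulling in far-away ones if it must, whereas the radius query returns a variable number of points, all guaranteed to be close to p. That is why setting or changing the radius alters which neighbors influence the classification.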
