K-nearest Neighbors Algorithm


The same problem can often be solved with several different algorithms, and the right choice depends on factors such as time complexity and space complexity. K-nearest Neighbors (kNN) is one such algorithm, and it belongs to the supervised machine learning category. Let’s find out what it is used for.

What is the K-nearest Neighbors Algorithm?

It is a non-parametric machine learning method that is useful for solving predictive problems in both classification and regression. It is used especially in pattern recognition. It is known as a lazy learning algorithm because it has no specialized training phase: it simply stores the training data and uses all of it at prediction time. This algorithm has applications in banking systems, politics, and the computation of credit ratings.

Lazy learning is also called instance-based learning, since it is based on memorizing the dataset. The algorithm is best known for being simple and easy to learn. It is worth knowing that kNN uses the local neighborhood of a data point to obtain a prediction.

You will need a distance function to compare how similar two examples are. Popular distance measures used in kNN include Euclidean distance, Manhattan distance, Hamming distance, Minkowski distance, and cosine distance. Euclidean distance is the most widely used of these. It works well when the input variables are similar in type and scale.
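To make the distance measures named above concrete, here is a minimal sketch in plain Python. The function names and the sample points are illustrative assumptions, not part of any library; in practice a library such as scikit-learn provides its own implementations.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Generalization: p=1 gives Manhattan distance, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

# Hypothetical sample points, chosen only so the results are easy to check.
p1, p2 = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p1, p2))                        # 5.0
print(manhattan(p1, p2))                        # 7.0
print(minkowski(p1, p2, 2))                     # same value as euclidean
print(hamming("karolin", "kathrin"))            # 3
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))  # 1.0
```

Hamming distance suits categorical or binary features, while the other measures assume numeric features.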

How does this algorithm work?

In kNN, k is the number of nearest neighbors used to classify a test sample or to predict its value. The process of choosing the right value of k is known as parameter tuning.
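One common way to tune k is to try several values and keep the one that scores best on data the model did not see. The sketch below uses leave-one-out evaluation on a small made-up dataset; the data, the helper names `predict` and `leave_one_out_accuracy`, and the candidate values of k are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: (features, label). The last "B" point sits close to
# the "A" cluster on purpose, to act as a noisy example.
points = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"), ((2.5, 2.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.0), "B"), ((3.0, 3.0), "B"),
]

def predict(train, query, k):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Classify each point using all the other points and return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, features, k) == label
    return hits / len(data)

# Odd values of k avoid tied votes when there are two classes.
scores = {k: leave_one_out_accuracy(points, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # first k with the highest accuracy
print(scores, best_k)  # {1: 0.75, 3: 0.875, 5: 0.875} 3
```

With k = 1 the noisy point misleads its closest neighbor, while a slightly larger k smooths that out. On real data, a very large k would instead blur the boundary between classes, which is why k is tuned rather than fixed in advance.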

Implementation Pseudocode

  • Choose a value for K
  • Take the K nearest neighbors of the new data point according to the Euclidean distance
  • Count how many of these K neighbors fall in each category, and assign the new data point to the category with the most neighbors.
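The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the function name `knn_classify` and the toy data are assumptions, and ties between categories are broken by whichever category was counted first.

```python
import math
from collections import Counter

def knn_classify(training_data, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbors.

    training_data is a list of (features, category) pairs.
    Returns the winning category and the vote counts.
    """
    # Measure the Euclidean distance from the new point to every stored point.
    distances = [
        (math.dist(features, new_point), category)
        for features, category in training_data
    ]
    # Keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the neighbors in each category; the most common category wins.
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0], dict(votes)

# Hypothetical toy data: two well-separated groups of 2-D points.
training_data = [
    ((1.0, 1.0), "red"), ((1.0, 2.0), "red"), ((2.0, 1.5), "red"),
    ((5.0, 5.0), "blue"), ((6.0, 5.0), "blue"), ((5.5, 6.5), "blue"),
]

label, votes = knn_classify(training_data, (2.0, 2.0), k=3)
print(label, votes)  # red {'red': 3}
```

Note that there is no training step: all the work happens when a new point arrives, which is what makes kNN a lazy learner.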

Implementation in Python:

Using kNN as Classifier

  • Import the required Python packages
      import numpy as np
      import matplotlib.pyplot as plt
      import pandas as pd
  • Download the Iris dataset, or any other dataset that you want
      path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
  • Provide column names for the dataset
      headernames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
  • Read the dataset into a pandas DataFrame
      dataset = pd.read_csv(path, names=headernames)
      dataset.head()
  • Separate the features (the first four columns) from the class labels (the last column):
      X = dataset.iloc[:, :-1].values
      y = dataset.iloc[:, 4].values
  • The next step is to split the dataset into a training set and a test set
      from sklearn.model_selection import train_test_split
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)
  • Scale the features so that each one contributes comparably to the distance:
      from sklearn.preprocessing import StandardScaler
      scaler = StandardScaler()
      scaler.fit(X_train)
      X_train = scaler.transform(X_train)
      X_test = scaler.transform(X_test)
  • Train the model with the KNeighborsClassifier class of sklearn
      from sklearn.neighbors import KNeighborsClassifier
      classifier = KNeighborsClassifier(n_neighbors=8)
      classifier.fit(X_train, y_train)
  • Lastly, make predictions
      y_pred = classifier.predict(X_test)
  • Print the results by using:
      from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
      result = confusion_matrix(y_test, y_pred)
      print("Confusion Matrix:")
      print(result)
      result1 = classification_report(y_test, y_pred)
      print("Classification Report:")
      print(result1)
      result2 = accuracy_score(y_test, y_pred)
      print("Accuracy:", result2)

Using kNN as Regressor

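  • As before, start by importing the Python packages
      import numpy as np
      import pandas as pd
  • Download the Iris dataset
      url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
  • Add column names
      headernames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
  • Read the dataset into a pandas DataFrame, using the first two columns as features and the third column as the target
      data = pd.read_csv(url, names=headernames)
      array = data.values
      X = array[:, :2].astype(float)  # sepal-length, sepal-width
      y = array[:, 2].astype(float)   # petal-length
      data.shape                      # output: (150, 5)
  • Import KNeighborsRegressor from sklearn and fit it
      from sklearn.neighbors import KNeighborsRegressor
      knnr = KNeighborsRegressor(n_neighbors=10)
      knnr.fit(X, y)
  • Lastly, compute the MSE (here on the same data the model was fitted on) with the following code and get the output by running it
      print("The MSE is:", np.power(y - knnr.predict(X), 2).mean())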

I hope that you now understand how to use and implement the K-nearest Neighbors algorithm.

