I am building a k-NN classifier and I want to find the optimal number of neighbors.
This is the code I wrote:
# Find the optimal number of neighbors
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# X_train, X_test, y_train, y_test come from an earlier train/test split
all_train_acc = []
all_test_acc = []
k_values = range(1, 11)
for k in k_values:
    # Train a classifier with k neighbors
    kNN = KNeighborsClassifier(n_neighbors=k)
    kNN.fit(X_train, y_train)
    # Training set accuracy and test set accuracy
    train_acc = accuracy_score(y_true=y_train, y_pred=kNN.predict(X_train))
    test_acc = accuracy_score(y_true=y_test, y_pred=kNN.predict(X_test))
    all_train_acc.append(train_acc)
    all_test_acc.append(test_acc)
    print("N neighbors:", k, " - Train Accuracy: {:.3f}".format(train_acc), " - Test Accuracy: {:.3f}".format(test_acc))

# Plot both accuracy curves against k
plt.figure()
plt.plot(k_values, all_train_acc, label='Training accuracy')
plt.plot(k_values, all_test_acc, label='Test accuracy')
plt.ylabel('Accuracy')
plt.xlabel('n of neighbors')
plt.legend()
plt.show()
How should I interpret these results? What does it mean that the training and test accuracy are almost equal at k = 8? Should I choose k = 8 or k = 5? Thanks.