KNeighborsClassifier's predict_proba method returns only 0 and 1

Date: 2016-05-07 13:30:55

Tags: machine-learning scikit-learn probability nearest-neighbor

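Can anyone tell me what is wrong with my code? Why can I get class probabilities for the iris dataset with LogisticRegression, while KNeighborsClassifier gives me only 0 or 1? I expected it to give graded probabilities like LogisticRegression does.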

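from sklearn.datasets import load_iris
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold

iris = load_iris()
X = iris.data
y = iris.target

# skf was never defined in the original snippet (X_total/y_total were X/y).
# A 10-fold stratified split is assumed: 150 samples / 10 folds = 15 test
# samples per fold, matching the 15 values printed below.
skf = StratifiedKFold(n_splits=10)
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
# Only the last fold's split is kept after the loop.

from sklearn.linear_model import LogisticRegression
ln = LogisticRegression()
ln.fit(X_train, y_train)

# Probability of class 1 (versicolor) for each test sample
ln.predict_proba(X_test)[:, 1]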

array([0.18075722, 0.08906078, 0.14693156, 0.10467766, 0.14823032,
       0.70361962, 0.66533216, 0.77864636, 0.67203114, 0.68655163,
       0.25219798, 0.3863194 , 0.30735105, 0.13963637, 0.28017798])

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, algorithm='ball_tree', metric='euclidean')
knn.fit(X_train, y_train)

knn.predict_proba(X_test)[0:10,1]

array([0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.])

2 Answers:

Answer 0 (score: 6)

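Because KNN has a very limited notion of probability. Its estimate is simply the fraction of the nearest neighbors that vote for each class. With n_neighbors=5 and the default uniform weights, every probability is a count out of 5, so the only possible values are 0, 0.2, 0.4, 0.6, 0.8 and 1. Increase the number of neighbors to 15 or 100, or query points near the decision boundary, and you will see more varied results. Right now your test points always have all 5 neighbors sharing the same label, hence probabilities of exactly 0 or 1.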
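The vote-fraction estimate described above is p(c | x) = (number of the k nearest neighbors with label c) / k. The sketch below checks this against scikit-learn by recomputing predict_proba by hand from kneighbors, for k=5 and k=50. The train/test split, its random_state, and the helper vote_fractions are illustrative assumptions, not part of the question; algorithm='ball_tree' and metric='euclidean' are kept from the question's code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Hypothetical stratified split for illustration: 35 training samples per class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def vote_fractions(y_train, neigh_idx, classes):
    """Hypothetical helper: share of each class among the k neighbors of each query."""
    neighbour_labels = y_train[neigh_idx]  # shape (n_queries, k)
    return np.stack(
        [(neighbour_labels == c).mean(axis=1) for c in classes], axis=1
    )


results = {}
for k in (5, 50):
    knn = KNeighborsClassifier(n_neighbors=k, algorithm="ball_tree", metric="euclidean")
    knn.fit(X_train, y_train)
    proba = knn.predict_proba(X_test)
    # Indices of the k nearest training samples for every test point
    neigh_idx = knn.kneighbors(X_test, return_distance=False)
    manual = vote_fractions(y_train, neigh_idx, knn.classes_)
    results[k] = (proba, manual)
    # Distinct values the class-1 probability takes over the test set
    print(f"k={k}: {np.unique(proba[:, 1].round(3))}")
```

With k=50 the class-0 probability of a setosa test point can be at most 35/50 = 0.7 under this split, because the training set holds only 35 setosa samples, so the output can no longer be only 0 or 1.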

Answer 1 (score: 0)

Here I have a KNN model, model_knn, built with sklearn. This is how I get the predicted class together with its probability:

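import numpy as np

# model_knn: a fitted sklearn KNeighborsClassifier
# word_average: a 2-D array holding the feature vector(s) to classify
result = {}
model_classes = model_knn.classes_
predicted = model_knn.predict(word_average)
score = model_knn.predict_proba(word_average)
# Column j of predict_proba corresponds to model_classes[j]
index = np.where(model_classes == predicted[0])[0][0]
result["predicted"] = predicted[0]
result["score"] = score[0][index]  # probability assigned to the predicted class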
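A self-contained version of this answer is sketched below. The answerer's model and `word_average` features are not shown, so the iris data and a single query row are substituted as an assumption, and the helper `predicted_with_score` is hypothetical. Because the predicted class is the one with the most neighbor votes, its score equals the row maximum of predict_proba and is still a multiple of 1/n_neighbors.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model_knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Stand-in for the answer's `word_average`: a 2-D array with one query row.
query = X[[70]]


def predicted_with_score(model, sample):
    """Hypothetical helper: predicted label plus the probability predict_proba gives it."""
    predicted = model.predict(sample)[0]
    proba = model.predict_proba(sample)[0]
    # Column j of predict_proba corresponds to model.classes_[j]
    index = np.where(model.classes_ == predicted)[0][0]
    return {"predicted": predicted, "score": proba[index]}


result = predicted_with_score(model_knn, query)
print(result)
```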