Confused by the sklearn distance metric

Time: 2017-05-02 03:33:20

Tags: python machine-learning scikit-learn euclidean-distance

I am trying to use the standardized Euclidean metric in KNeighborsClassifier:

knn = KNeighborsRegressor(n_neighbors=k,metric='seuclidean' )
knn.fit(newx,y)

but it raises a TypeError:

C:\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
    741             X, y = check_X_y(X, y, "csr", multi_output=True)
    742         self._y = y
--> 743         return self._fit(X)
    744 
    745 

C:\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in _fit(self, X)
    238             self._tree = BallTree(X, self.leaf_size,
    239                                   metric=self.effective_metric_,
--> 240                                   **self.effective_metric_params_)
    241         elif self._fit_method == 'kd_tree':
    242             self._tree = KDTree(X, self.leaf_size,

sklearn\neighbors\binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn\neighbors\ball_tree.c:9220)()

sklearn\neighbors\dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn\neighbors\dist_metrics.c:4821)()

sklearn\neighbors\dist_metrics.pyx in sklearn.neighbors.dist_metrics.SEuclideanDistance.__init__ (sklearn\neighbors\dist_metrics.c:6399)()

TypeError: __init__() takes exactly 1 positional argument (0 given)

So I wrote my own function to do something similar:

import numpy as np
from sklearn.preprocessing import StandardScaler
x = np.random.randint(0,10,(10,2))
y = np.random.randint(0,10,(10,1))
testx = np.random.randint(0,10,(1,2))
sds = StandardScaler()
sds.fit(x)
newx = sds.transform(x)
newtestx = sds.transform(testx)
distance = np.sqrt(((newtestx - newx) ** 2).sum(axis=1))
for k in range(1,8):
    kn = distance.argsort()[:k]
    print(y[kn].mean(), '%'*10, k)

With sklearn:

from sklearn.neighbors import KNeighborsRegressor
for k in range(1,8):
    knn = KNeighborsRegressor(n_neighbors=k,metric='seuclidean' , metric_params={'V':x.std(axis=0)})
    knn.fit(x ,y)
    print(knn.predict(testx)[0], '%'*10, k)

The two results above do not match. Why?

1 Answer:

Answer 0 (score: 3)

The seuclidean distance metric requires a V parameter, which it uses in the following computation:

  

sqrt(sum((x - y)^2 / V))

as defined in the sklearn Distance Metrics documentation.

You can pass V through the metric_params argument when initializing KNeighborsRegressor (see the KNR docs).
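
Note that in the formula above the squared difference is divided by V directly. So to match a StandardScaler followed by a plain Euclidean distance, V would have to be the per-feature variance rather than the standard deviation. Here is a minimal sketch of that comparison. The seeded continuous random data and the fixed k = 5 are my own assumptions, chosen to avoid ties and zero-variance columns, and are not taken from the question:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Assumed toy data: continuous values, so distance ties are very unlikely.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = rng.normal(size=50)
testx = rng.normal(size=(1, 2))
k = 5

# Manual version: standardize, then take the ordinary Euclidean distance.
sds = StandardScaler().fit(x)
manual_dist = np.sqrt(((sds.transform(testx) - sds.transform(x)) ** 2).sum(axis=1))
manual_idx = np.argsort(manual_dist)[:k]
manual_pred = y[manual_idx].mean()

# sklearn version: pass the variance as V, since the formula divides
# the squared difference by V without squaring it.
knn = KNeighborsRegressor(
    n_neighbors=k,
    metric="seuclidean",
    metric_params={"V": x.var(axis=0)},
)
knn.fit(x, y)
sk_dist, sk_idx = knn.kneighbors(testx)
sk_pred = knn.predict(testx)[0]

print(manual_pred, sk_pred)
```

With V set to x.var(axis=0) the two predictions should agree. With x.std(axis=0), as in the question, the features are weighted differently, so the neighbors (and the averages) can differ.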