Input dimension of the distance function for nearest neighbors

Time: 2017-01-06 11:15:05

Tags: python scikit-learn nearest-neighbor

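In the context of scikit-learn's unsupervised nearest neighbors, I have implemented my own distance function to handle my uncertain points (i.e. each point is represented as a normal distribution):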

import numpy as np
import scipy as sp
import scipy.spatial.distance

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    # The covariance entries live in x[2:] and y[2:]; x[:2] and y[:2] are the means.
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
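
For reference, here is a minimal NumPy-only sketch of the quantity this function is meant to compute, assuming the layout given in the docstring (two means followed by two diagonal covariance entries). The function names `uncertain_distance` and `uncertain_distance_general` are made up for illustration. With diagonal covariances, the Mahalanobis distance reduces to a per-dimension weighted sum, so no matrix inverse is needed; the second function uses the explicit inverse and should give the same value.

```python
import numpy as np

def uncertain_distance(x, y):
    """Distance between two uncertain points with diagonal covariances.

    Assumed layout (taken from the docstring in the question):
    [mu_1, mu_2, cov_11, cov_22].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:2] - y[:2]       # difference of the means
    var_sum = x[2:] + y[2:]    # diagonal of the summed covariance
    return float(np.sqrt(np.sum(diff ** 2 / var_sum)))

def uncertain_distance_general(x, y):
    """Same quantity, computed through the explicit inverse covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:2] - y[:2]
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    return float(np.sqrt(np.dot(np.dot(diff, cov_inv), diff)))

a = [0.0, 0.0, 1.0, 1.0]
b = [3.0, 4.0, 1.0, 1.0]
d_fast = uncertain_distance(a, b)             # sqrt(9/2 + 16/2) = sqrt(12.5)
d_general = uncertain_distance_general(a, b)  # should agree with d_fast
```

Both versions require the summed variances to be strictly positive; otherwise the division (or the inverse) is undefined.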

However, when I set up the nearest neighbors:

nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

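where X is an (N, 4) array, i.e. (n_samples, n_features), and I print the shapes of x and y inside my_mahalanobis_distance, I get (10,) instead of the (4,) I would expect.

Example:

I added the following line to my_mahalanobis_distance: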

print(x.shape)

Then, in my main script:

n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

The result is:

(10,)
ValueError: shapes (2,) and (8,8) not aligned: 2 (dim 0) != 8 (dim 0)

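I fully understand the error itself, but I don't understand why x.shape is (10,) when the number of features in X is 4.

I am using Python 2.7.10 and scikit-learn 0.16.1.

EDIT:

Replacing return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv) with return 1, purely for testing, gives: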

(10,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)

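So only the first call to my_mahalanobis_distance is wrong. Looking at the values of x and y in that first call, I observe the following:

  • x and y are identical.

  • If I run my code several times, x and y are still identical to each other, but their values change from one run to the next.

  • The values appear to come from a numpy.random function.

I would conclude that this first call is a piece of debugging code that was never removed.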
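
The three observations above (identical x and y, fresh random values on every run, length 10) are what one would see if a library called a user-supplied metric once on a random test vector before using it on real data. The sketch below is a hypothetical illustration of such a step; it is not scikit-learn's code, and `recording_metric` and `hypothetical_fit` are names invented here.

```python
import numpy as np

seen_shapes = []

def recording_metric(x, y):
    """Stand-in metric that only records the shape of its first argument."""
    seen_shapes.append(x.shape)
    return 1.0

def hypothetical_fit(X, metric):
    """NOT scikit-learn code: an assumed validation call, then real calls."""
    probe = np.random.random(10)   # assumption: random length-10 test vector
    float(metric(probe, probe))    # the same array is passed as both x and y
    for row in X[1:]:
        metric(X[0], row)          # later calls receive real rows of X

X = np.random.rand(10, 4)
hypothetical_fit(X, recording_metric)
first_shape = seen_shapes[0]       # shape seen by the probe call
later_shapes = seen_shapes[1:]     # shapes seen by the calls on real data
```

Under this assumption, a metric that indexes its inputs by fixed position would fail on the probe call only, which matches the shapes printed in the edit above.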

2 Answers:

Answer 0 (score: 1)

This is not an answer, but it is too long for a comment. I cannot reproduce the error.

Using:

Python 3.5.2 and sklearn 0.18.1

Code:

from sklearn.neighbors import NearestNeighbors
import numpy as np
import scipy as sp


def my_mahalanobis_distance(x, y):
    # covariance entries are x[2:] and y[2:]; the means are x[:2] and y[:2]
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    print(x.shape)
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)

n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

Output:

(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)

Answer 1 (score: 0)

I customized my_mahalanobis_distance to handle this problem:

import warnings

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    if (x.size, y.size) == (4, 4):
        return sp.spatial.distance.mahalanobis(x[:2], y[:2],
                                               np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:])))
    else:
        # to handle the buggy first call when calling NearestNeighbors.fit()
        warnings.warn('x and y are respectively of size %i and %i' % (x.size, y.size))
        return sp.spatial.distance.euclidean(x, y)
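
If the data set is small, another way to avoid the unexpected first call is to skip the callable metric and compute the pairwise distances directly. The following is a rough NumPy-only sketch under the same assumed row layout ([mu_1, mu_2, cov_11, cov_22]); `pairwise_uncertain_distances` and `nearest_other_point` are illustrative names, not part of any library, and this brute-force approach is a substitute for NearestNeighbors rather than a description of how it works.

```python
import numpy as np

def pairwise_uncertain_distances(X):
    """Pairwise distances for rows laid out as [mu_1, mu_2, cov_11, cov_22]."""
    X = np.asarray(X, dtype=float)
    mu, var = X[:, :2], X[:, 2:]
    diff = mu[:, None, :] - mu[None, :, :]        # shape (n, n, 2)
    var_sum = var[:, None, :] + var[None, :, :]   # shape (n, n, 2)
    return np.sqrt(np.sum(diff ** 2 / var_sum, axis=2))

def nearest_other_point(X):
    """Index of the closest other row for every row of X."""
    D = pairwise_uncertain_distances(X)
    np.fill_diagonal(D, np.inf)   # exclude each point's zero distance to itself
    return np.argmin(D, axis=1)

X_demo = np.array([[0.0, 0.0, 1.0, 1.0],
                   [1.0, 0.0, 1.0, 1.0],
                   [10.0, 10.0, 1.0, 1.0]])
neighbors = nearest_other_point(X_demo)
```

This needs memory proportional to the square of the number of samples, so it only suits modest data sets, and it assumes all variances are strictly positive.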