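I have defined a custom distance function (still a valid metric) for nearest-neighbor search. It processes the features one by one before returning the distance. The script below gives an idea of what I am trying to do.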
import numpy as np
from sklearn.neighbors import NearestNeighbors  # ver 0.16-git

def custom_dist_func(x, y):
    print x, y
    # a custom function will be here handling mixed features (real, nominal etc.)
    return np.sqrt(sum((x-y)**2))  # use just this for now

data = np.array([ [1,2,3,1], [4,5,6,2], [7,8,9,3], [1,3,3,2], [5,5,6,3], [9,8,9,1] ])
neigh = NearestNeighbors(n_neighbors = 3, algorithm='ball_tree', metric='pyfunc', func=custom_dist_func)
neigh.fit(data)
Here is what running this script prints.
[ 0.60337662 0.07253084 0.27630738 0.90360858 0.50337067 0.31940312
0.42077267 0.70218361 0.15748644 0.20227022] [ 0.60337662 0.07253084 0.27630738 0.90360858 0.50337067 0.31940312
0.42077267 0.70218361 0.15748644 0.20227022]
[ 4.5 5.16666667 6. 2. ] [ 1. 2. 3. 1.]
[ 4.5 5.16666667 6. 2. ] [ 4. 5. 6. 2.]
[ 4.5 5.16666667 6. 2. ] [ 7. 8. 9. 3.]
[ 4.5 5.16666667 6. 2. ] [ 1. 3. 3. 2.]
[ 4.5 5.16666667 6. 2. ] [ 5. 5. 6. 3.]
[ 4.5 5.16666667 6. 2. ] [ 9. 8. 9. 1.]
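The rest of the computations are between vectors of length len_features = 4, but there is an initial computation between two vectors of length 10.
I cannot explain this initial computation. It is still there when I try with len_features > 10, and it causes the program to raise an index error, because the intended custom function acts on each available feature separately.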
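To make the failure concrete, here is a minimal sketch of the kind of per-feature function described above. The 12-column schema `FEATURE_KINDS`, the name `mixed_dist`, and the 0/1 mismatch rule for nominal columns are all assumptions made for illustration, not the actual function. It only shows why a call with length-10 vectors breaks a function that walks a fixed list of features.

```python
import math

# Hypothetical schema (an assumption): 11 real-valued columns, then 1 nominal column.
FEATURE_KINDS = ["real"] * 11 + ["nominal"]

def mixed_dist(x, y, kinds=FEATURE_KINDS):
    """Squared difference on real columns, 0/1 mismatch on nominal columns."""
    total = 0.0
    for i, kind in enumerate(kinds):
        if kind == "nominal":
            total += 0.0 if x[i] == y[i] else 1.0
        else:
            total += (x[i] - y[i]) ** 2
    return math.sqrt(total)

# Two 12-feature rows that differ only in the nominal column.
a = [0.0] * 11 + [1]
b = [0.0] * 11 + [2]
d = mixed_dist(a, b)

# A call with length-10 vectors runs past the end of the input.
try:
    mixed_dist([0.5] * 10, [0.5] * 10)
    raised = False
except IndexError:
    raised = True
```

With real data of 12 features the function behaves as intended; it is only the unexpected length-10 call that indexes position 10 and fails.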
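Answer 0 (score: 0)
Note that this is not a complete answer.
I deliberately introduced an error in the distance function (an undefined name, which raises a NameError):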
def custom_dist_func(x,y):
    ff
    print x,y
    # a custom function will be here handling mixed features (real, nominal etc.)
    return np.sqrt(sum((x-y)**2)) # use just this for now
and re-ran the code (your problem, as I verified, is in the first call).
The output is:
NameError Traceback (most recent call last)
<ipython-input-1-ce434c7e8153> in <module>()
10 data = np.array([ [1,2,3,1], [4,5,6,2], [7,8,9,3], [1,3,3,2], [5,5,6,3], [9,8,9,1] ])
11 neigh = NearestNeighbors(n_neighbors = 3, algorithm='ball_tree', metric='pyfunc', func=custom_dist_func)
---> 12 neigh.fit(data)
/home/amit/.local/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in fit(self, X, y)
779 Training data. If array or matrix, shape = [n_samples, n_features]
780 """
--> 781 return self._fit(X)
/home/amit/.local/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in _fit(self, X)
249 self._tree = BallTree(X, self.leaf_size,
250 metric=self.effective_metric_,
--> 251 **self.effective_metric_params_)
252 elif self._fit_method == 'kd_tree':
253 self._tree = KDTree(X, self.leaf_size,
/home/amit/.local/lib/python2.7/site-packages/sklearn/neighbors/ball_tree.so in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:8430)()
/home/amit/.local/lib/python2.7/site-packages/sklearn/neighbors/dist_metrics.so in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn/neighbors/dist_metrics.c:4066)()
/home/amit/.local/lib/python2.7/site-packages/sklearn/neighbors/dist_metrics.so in sklearn.neighbors.dist_metrics.PyFuncDistance.__init__ (sklearn/neighbors/dist_metrics.c:9286)()
<ipython-input-1-ce434c7e8153> in custom_dist_func(x, y)
3
4 def custom_dist_func(x,y):
----> 5 ff
6 print x,y
7 # a custom function will be here handling mixed features (real, nominal etc.)
NameError: global name 'ff' is not defined
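So it clearly fails while the ball tree is being created, inside PyFuncDistance.__init__.
In fact, running pdb at this point shows that X is your original matrix. The problem is that, from there, the call goes into dist_metrics.pyx, which pdb cannot follow.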
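Since pdb cannot step through the compiled code, one alternative is to observe the calls from the Python side. The sketch below wraps a distance function so that each call records the lengths of its two arguments. The names `record_calls`, `plain_euclidean` and `call_log` are made up for this example, and the wrapper is an illustration rather than anything scikit-learn provides.

```python
import functools

def record_calls(func, log):
    """Wrap func so that every call appends (len(x), len(y)) to log."""
    @functools.wraps(func)
    def wrapper(x, y, *args, **kwargs):
        log.append((len(x), len(y)))
        return func(x, y, *args, **kwargs)
    return wrapper

def plain_euclidean(x, y):
    # Stand-in for custom_dist_func; works on lists as well as arrays.
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

call_log = []
logged = record_calls(plain_euclidean, call_log)

# Direct call, just to show what gets recorded.
logged([1, 2, 3, 1], [4, 5, 6, 2])
```

Passing `logged` as `func=` in place of `custom_dist_func` and inspecting `call_log[0]` after `fit` should show whether the length-10 call is the first and only odd one.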
So this does not solve it, but it does narrow it down. I suggest you look at dist_metrics.pyx and dig further from there.
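One guess that fits both the traceback (the failure is inside PyFuncDistance.__init__, before any tree is built) and the first printed pair (two identical length-10 vectors with values between 0 and 1) is that the wrapper calls the function once on a random vector to check that it is callable and returns a number. I have not confirmed this in dist_metrics.pyx. The class below is a hypothetical pure-Python stand-in that mimics such a probe; `ProbingMetric`, `probe_len` and `spy_dist` are invented names and this is not scikit-learn code.

```python
import random

class ProbingMetric(object):
    """Hypothetical stand-in (an assumption, not library code):
    calls func once on a random vector when constructed."""

    def __init__(self, func, probe_len=10):
        self.func = func
        probe = [random.random() for _ in range(probe_len)]
        # Probe call: the same vector is passed as both arguments,
        # and the result must be convertible to float.
        float(func(probe, probe))

seen = []

def spy_dist(x, y):
    # Record the input length and whether both arguments are the same object.
    seen.append((len(x), x is y))
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

metric = ProbingMetric(spy_dist)
```

If the real wrapper does behave like this, a custom function that assumes a fixed number of features would need to tolerate one call with inputs of a different length.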