Can scikit-learn's KDTree query_radius return both count and ind?

Time: 2018-09-14 04:00:39

Tags: python numpy machine-learning scikit-learn kdtree

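I'm trying to return both count (the number of neighbors) and ind (the indices of those neighbors), but I can't unless I call query_radius twice. Although that is computationally intensive, in Python it is actually faster for me than iterating over ind and counting the size of each row! This seems terribly inefficient, so I'm wondering whether there is a way to return both in a single call.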

I tried accessing count and ind as attributes of tree after calling query_radius, but they don't exist. Is there really no efficient way to do this in numpy?

>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> array = np.array([[1,2,3], [2,3,4], [6,2,3]])
>>> tree = KDTree(array)
>>> neighbors = tree.query_radius(array, 1)
>>> tree.ind
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'sklearn.neighbors.kd_tree.KDTree' object has no attribute 'ind'
>>> tree.count
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'sklearn.neighbors.kd_tree.KDTree' object has no attribute 'count'

1 answer:

Answer 0 (score: 0)

Not sure why you think you need to call it twice:

import numpy as np
from sklearn.neighbors import KDTree

a = np.random.rand(100,3)*10
tree = KDTree(a)
neighbors = tree.query_radius(a, 1)

%timeit counts = tree.query_radius(a, 1, count_only=True)
1000 loops, best of 3: 231 µs per loop

%timeit counts = np.array([arr.size for arr in neighbors])
The slowest run took 5.66 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 22.5 µs per loop

Just taking the size of each array in neighbors (about 22.5 µs here) is much faster than running tree.query_radius a second time (about 231 µs).
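
Put together, a minimal sketch of the single-call approach might look like the following. It assumes a NumPy version that provides np.random.default_rng and a scikit-learn version that exposes KDTree from sklearn.neighbors. The names points, ind, counts and counts_direct are illustrative only, and the count_only query is included purely as a sanity check, not as something you need in practice:

```python
import numpy as np
from sklearn.neighbors import KDTree

# Illustrative data: 100 random 3-D points in [0, 10)^3 (seed chosen arbitrarily).
rng = np.random.default_rng(0)
points = rng.random((100, 3)) * 10

tree = KDTree(points)

# One radius query: returns an object array with one index array per query point.
ind = tree.query_radius(points, r=1)

# Derive the neighbor counts from the sizes of those index arrays.
counts = np.array([arr.size for arr in ind])

# Sanity check only: the dedicated count_only query should give the same counts.
counts_direct = tree.query_radius(points, r=1, count_only=True)
assert np.array_equal(counts, counts_direct)
```

Since each query point is itself in the tree, every count here includes the point itself.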