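I am currently trying to speed up an algorithm that takes an array of x,y coordinates, finds a specified number of points that are furthest away from each other (based on two given starting points), and returns their indices.
The code looks like this (distMat is an array containing the distances between all points, numIndices is the number of points wanted, and index0 and index1 are the indices of the two starting points):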
import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.float
ctypedef np.float_t DTYPE_t
ctypedef np.int32_t INT32_t
ctypedef np.int64_t INT64_t

def find_furthest_indices(np.ndarray[DTYPE_t, ndim=2] distMat, int numIndices, int index0, int index1):
    cdef int i, j
    cdef double dist, minDist, curDist
    cdef np.ndarray[INT32_t, ndim=1] selectedIndices = np.empty(numIndices, dtype=np.int32)
    cdef np.ndarray[INT32_t, ndim=1] remainingIndices = np.arange(numIndices, dtype=np.int32)

    selectedIndices[0] = index0
    selectedIndices[1] = index1
    for i in range(numIndices-2):
        minDist = 0.0
        for j in remainingIndices:
            dist = np.inf
            for k in selectedIndices[:i+1]:
                curDist = distMat[j][k]
                if curDist < dist:
                    dist = curDist
            if dist > minDist:
                minj = j
                minDist = dist
        selectedIndices[i+2] = minj
        remainingIndices = remainingIndices[remainingIndices!=minj]
    return selectedIndices
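It works, but (as expected) it is still rather slow when given larger arrays (e.g. 5000 points, so distMat is 5000x5000, with numIndices = 500). That may be down to the nature of the algorithm (it is "Kennard-Stone", for those who really want to know), but I am puzzled by the color-coded output of cythonize: [image: CythonizeOutput]
It marks the following lines in dark yellow, which means they involve a lot of Python interaction that could be converted to C. I don't understand why these three are among them: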
for j in remainingIndices:
for k in selectedIndices[:i+1]:
and
curDist = distMat[j][k]
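Can anyone shed light on why these lines are slow in this case? I have added type definitions for the given parameters, so shouldn't iterating over them and indexing into them be fast?
Thanks in advance!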
Answer 0 (score: 1)
for j in remainingIndices:
iterates using the Python iteration protocol. In Cython it is better to use range and indexing. You want something like:
for jidx in range(remainingIndices.shape[0]):
    j = remainingIndices[jidx]
for k in selectedIndices[:i+1]:
has the same problem as above.
curDist = distMat[j][k]
creates a slice of the array and then indexes into that slice (both steps have to go through Python). You want
curDist = distMat[j, k]
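When rewriting the loops this way, it can help to have a slow but simple reference to compare results against on small inputs. Below is a minimal plain-Python sketch of the selection described in the question, written with the same range-and-index loop style. It is an illustration under assumptions, not a port of the posted function: it works on a list of lists instead of a NumPy array, it takes candidates from every row of the matrix, and it compares each candidate against all points selected so far. The snake_case names and the example data are made up for this sketch. In the typed Cython version, the lookup `dist_mat[j][selected[s]]` would become a single `distMat[j, k]`.

```python
def find_furthest_indices(dist_mat, num_indices, index0, index1):
    """Greedy max-min selection on a square distance matrix (list of lists)."""
    n = len(dist_mat)
    selected = [index0, index1]
    # Candidates are all points except the two starting points.
    remaining = [p for p in range(n) if p != index0 and p != index1]

    while len(selected) < num_indices and remaining:
        best_pos = 0
        best_dist = -1.0
        # Index-based loops, mirroring the range-and-index style suggested above.
        for pos in range(len(remaining)):
            j = remaining[pos]
            # Distance from candidate j to its nearest already-selected point.
            nearest = float("inf")
            for s in range(len(selected)):
                cur = dist_mat[j][selected[s]]
                if cur < nearest:
                    nearest = cur
            # Keep the candidate whose nearest selected point is furthest away.
            if nearest > best_dist:
                best_dist = nearest
                best_pos = pos
        selected.append(remaining.pop(best_pos))
    return selected


# Hypothetical example data: five points on a line at x = 0, 1, 2, 5, 10.
xs = [0.0, 1.0, 2.0, 5.0, 10.0]
dist_mat = [[abs(a - b) for b in xs] for a in xs]
result = find_furthest_indices(dist_mat, 4, 0, 4)
print(result)
```

Running the same small matrix through the Cython function and comparing the returned indices is a quick way to confirm that the loop changes did not alter the selection.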