Cython: iterating over arrays and indexing is still slow

Posted: 2020-03-19 16:09:46

Tags: python indexing iteration cython

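I am currently trying to speed up an algorithm that takes an array of x,y coordinates, finds a specified number of points that are furthest from each other (starting from two given start points), and returns their indices.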

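The code looks like this (distMat is an array containing the distances between all points, numIndices is the desired number of points, and index0 and index1 are the indices of the two start points):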
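For reference, the selection rule described above can be sketched in plain Python, independent of Cython. This is a minimal sketch that assumes `dist_mat` is a square list of lists and that both start points count as already selected from the first step on; `select_furthest` is a hypothetical name, not part of the original code.

```python
def select_furthest(dist_mat, num_indices, index0, index1):
    """Greedy max-min selection (pure-Python reference sketch).

    dist_mat[a][b] is the distance between points a and b.
    Returns num_indices point indices, starting with index0 and index1.
    """
    n = len(dist_mat)
    if num_indices > n:
        raise ValueError("cannot select more points than exist")
    selected = [index0, index1]
    remaining = [p for p in range(n) if p not in (index0, index1)]
    while len(selected) < num_indices:
        best_point, best_dist = None, -1.0
        for j in remaining:
            # distance from candidate j to its nearest already-selected point
            nearest = min(dist_mat[j][k] for k in selected)
            # keep the candidate whose nearest selected point is furthest away
            if nearest > best_dist:
                best_point, best_dist = j, nearest
        selected.append(best_point)
        remaining.remove(best_point)
    return selected
```

For five points on a line at 0, 1, 2, 3 and 10, starting from the first and last point, the third pick is the point at 3, because it is furthest from its nearest selected neighbour.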

import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.float64  # np.float was removed in NumPy 1.24; np.float64 is the equivalent
ctypedef np.float64_t DTYPE_t
ctypedef np.int32_t INT32_t
ctypedef np.int64_t INT64_t


def find_furthest_indices(np.ndarray[DTYPE_t, ndim=2] distMat, int numIndices, int index0, int index1):
    cdef int i, j
    cdef double dist, minDist, curDist
    cdef np.ndarray[INT32_t, ndim=1] selectedIndices = np.empty(numIndices, dtype=np.int32)
    cdef np.ndarray[INT32_t, ndim=1] remainingIndices = np.arange(numIndices, dtype=np.int32)

    selectedIndices[0] = index0
    selectedIndices[1] = index1
    for i in range(numIndices-2):
        minDist = 0.0
        for j in remainingIndices:
            dist = np.inf

            for k in selectedIndices[:i+1]:
                curDist = distMat[j][k]
                if curDist < dist:
                    dist = curDist

            if dist > minDist:
                minj = j
                minDist = dist

        selectedIndices[i+2] = minj
        remainingIndices = remainingIndices[remainingIndices!=minj]

    return selectedIndices

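It works, but (as expected) it is still somewhat slow for larger inputs (e.g. 5000 points -> distMat is 5000x5000, with numIndices = 500). That may be down to the nature of the algorithm ("Kennard-Stone", for anyone who really wants to know), but I was wondering about the colored output of cythonize: [screenshot: annotated cythonize output]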
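Part of the cost is algorithmic: the loop above recomputes every candidate's minimum distance to the whole selected set on each step, which is roughly O(numIndices² · n) work. A commonly used restructuring (not the original code, and not what the answer below proposes) keeps one running "distance to the nearest selected point" per point and updates it with a single `np.minimum` call after each pick, bringing it to O(numIndices · n). A minimal NumPy sketch, using the hypothetical name `select_furthest_incremental`:

```python
import numpy as np


def select_furthest_incremental(dist_mat, num_indices, index0, index1):
    """Max-min selection with a running nearest-selected distance per point."""
    dist_mat = np.asarray(dist_mat, dtype=np.float64)
    n = dist_mat.shape[0]
    if not 2 <= num_indices <= n:
        raise ValueError("num_indices must be between 2 and the number of points")
    selected = np.empty(num_indices, dtype=np.intp)
    selected[0], selected[1] = index0, index1
    # nearest[p] = distance from point p to its closest selected point
    nearest = np.minimum(dist_mat[index0], dist_mat[index1])
    taken = np.zeros(n, dtype=bool)
    taken[[index0, index1]] = True
    for i in range(2, num_indices):
        # exclude already-selected points from the argmax search
        candidates = np.where(taken, -np.inf, nearest)
        new = int(np.argmax(candidates))
        selected[i] = new
        taken[new] = True
        # one vectorized update instead of re-scanning the selected set
        np.minimum(nearest, dist_mat[new], out=nearest)
    return selected
```

Ties go to the lowest index because `np.argmax` returns the first maximum, which matches the strict `>` comparison in the original loop.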

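It marks the following lines in dark yellow, which means they involve a lot of Python interaction when translated to C. I don't understand why these three are among them: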

for j in remainingIndices:
for k in selectedIndices[:i+1]:

curDist = distMat[j][k]

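Can someone explain why these lines are slow here? I have added type declarations for the given parameters, so shouldn't iterating over and indexing them be fast?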

Thanks in advance!

1 answer:

Answer 0 (score: 1)

for j in remainingIndices:

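iterates using the Python iteration protocol. In Cython it is better to loop over a range and index into the array, with the index declared as a C integer (e.g. cdef Py_ssize_t jidx) so the loop compiles to plain C. You want something like: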

for jidx in range(remainingIndices.shape[0]):
    j = remainingIndices[jidx]
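The iteration-protocol overhead is visible from plain Python: looping over a NumPy array calls the iterator's `__next__` and produces a boxed `numpy.int32` object for every element, and Cython has to emit the same generic calls for `for j in remainingIndices`. The sketch assumes the Cython version declares `jidx` as a C integer and the array as a typed buffer; in plain Python both loops still box, so this only illustrates the protocol, not the speedup.

```python
import numpy as np

remaining = np.arange(5, dtype=np.int32)

# Iteration protocol: each step calls the iterator's __next__ and returns a
# new boxed NumPy scalar (numpy.int32), not a raw C int.
via_iter = [j for j in remaining]

# Index-based loop as in the answer. In Cython, with `cdef Py_ssize_t jidx`
# and a typed array (assumed here), remaining[jidx] becomes a direct buffer
# read; in plain Python this line still creates boxed scalars.
via_index = [remaining[jidx] for jidx in range(remaining.shape[0])]
```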

for k in selectedIndices[:i+1]:

Same as above.


curDist = distMat[j][k]

creates a slice of the array and then indexes into that slice (both steps have to go through Python). You want:

curDist = distMat[j, k]
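In plain NumPy the two steps of `distMat[j][k]` are easy to see: `distMat[j]` builds a new 1-D view object over row `j`, and `[k]` then indexes that temporary object. To my understanding, Cython only turns typed-buffer indexing into a direct memory read when all dimensions are indexed at once, which is why `distMat[j, k]` avoids the Python calls. A small illustration with a made-up matrix:

```python
import numpy as np

# Made-up 3x4 "distance matrix" purely for illustration.
dist_mat = np.arange(12, dtype=np.float64).reshape(3, 4)
j, k = 1, 2

row = dist_mat[j]          # step 1 of dist_mat[j][k]: a new 1-D view object for row j
two_step = row[k]          # step 2: index into that temporary view
one_step = dist_mat[j, k]  # single 2-D index, no intermediate array object

print(np.shares_memory(row, dist_mat))  # True: the row is a view, not a copy
print(two_step == one_step)             # True: same element either way
```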