Using Cython to speed up thousands of set operations

Asked: 2014-03-23 03:01:00

Tags: python-2.7 optimization cython

I have been trying to get over my fear of Cython (scared because I really know nothing about C or C++).

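I have a function that takes two arguments: a set (let's call it testSet) and a list of sets (let's call it targetSets). The function iterates over targetSets, computes the length of each set's intersection with testSet, appends that value to a list, and then returns the list.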

Now, this is not slow in itself, but the problem is that I need to run simulations of testSet (a large number, ~10,000), and targetSets contains about 10,000 sets.

So, for a small number of simulations to test with, the pure Python implementation took about 50 seconds.
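
The pure Python version is not shown in this question. A minimal sketch of what such a baseline might look like, assuming it applies the same rule as the Cython function in overlapFunc.pyx (overlaps of size 0 or 1 are recorded as 0); the name compute_overlap_py is hypothetical:

```python
def compute_overlap_py(test_set, target_sets):
    """Hypothetical pure-Python baseline; the name and body are assumptions."""
    obs_overlaps = []
    for target in target_sets:
        # Size of the intersection between the test set and this target set.
        n = len(test_set & target)
        # Same rule as the Cython version: overlaps of size 0 or 1 count as 0.
        obs_overlaps.append(n if n > 1 else 0)
    return obs_overlaps


if __name__ == "__main__":
    # Small hand-checkable example.
    print(compute_overlap_py({1, 2, 3}, [{1, 2}, {3}, {4}, {1, 2, 3}]))
    # prints [2, 0, 0, 3]
```

A baseline like this is also handy for checking that the compiled function returns the same list for the same inputs.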

I tried writing a Cython function. It works, and it now runs in ~16 seconds.

If anyone can think of anything else I could do to the Cython function, that would be great (Python 2.7, btw).

Here is my Cython implementation, in overlapFunc.pyx:

def computeOverlap(set testSet, list targetSets):
    cdef list obsOverlaps  = []
    cdef int i, N
    cdef set overlap
    N = len(targetSets)
    for i in range(N):
        overlap = testSet & targetSets[i]
        if len(overlap) <= 1:
            obsOverlaps.append(0)
        else:
            obsOverlaps.append(len(overlap))
    return obsOverlaps

setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("overlapFunc", 
                         ["overlapFunc.pyx"])]

setup(
      name = 'computeOverlap function',
      cmdclass = {'build_ext': build_ext},
      ext_modules = ext_modules
      )

And some code to build random sets for testing and to time the function, in test.py:

import numpy as np
from overlapFunc import computeOverlap
import time

def simRandomSet(n):
    for i in range(n):
        simSet= set(np.random.randint(low=1, high=100, size=50))
        yield simSet


if __name__ == '__main__':
    np.random.seed(23032014)
    targetSet = [set(np.random.randint(low=1, high=100, size=50)) for i in range(10000)]

    simulatedTestSets = simRandomSet(200)
    start = time.time()
    for i in simulatedTestSets:
        obsOverlaps = computeOverlap(i, targetSet)
    print time.time()-start

I tried changing the def at the start of the computeOverlap function, like this:

cdef list computeOverlap(set testSet, list targetSets):

But when I run the setup.py script, I get the following warning message:

'__pyx_f_11overlapFunc_computeOverlap' defined but not used [-Wunused-function]

Then, when I run something that tries to use the function, I get an import error:

    from overlapFunc import computeOverlap
ImportError: cannot import name computeOverlap

Thanks in advance for your help,

Cheers,

Davy

1 answer:

Answer 0 (score: 2)

In the following lines, the extension module name and the source file name do not match the actual file name.

ext_modules = [Extension("computeOverlapWithGeneList", 
                         ["computeOverlapWithGeneList.pyx"])]

Replace them with:

ext_modules = [Extension("overlapFunc",
                         ["overlapFunc.pyx"])]