I've been trying to get over my fear of Cython (fear because I know essentially nothing about C or C++).
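I have a function that takes 2 arguments: a set (let's call it testSet) and a list of sets (let's call it targetSets). The function iterates through targetSets, computes the length of the intersection of each set with testSet, adds that value to a list, and then returns the list.
Now, this by itself isn't slow, but the problem is that I need to run simulations of testSet (a large number, ~10,000), and the targetSets list is about 10,000 sets long.
So, for a small number of test simulations, the pure Python implementation took about 50 seconds.
I tried writing a Cython function, and it worked: it now runs in about 16 seconds.
If anyone can think of anything else I could do to the Cython function, that would be great (Python 2.7, btw).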
Here is my Cython implementation in overlapFunc.pyx:

    def computeOverlap(set testSet, list targetSets):
        cdef list obsOverlaps = []
        cdef int i, N
        cdef set overlap
        N = len(targetSets)
        for i in range(N):
            overlap = testSet & targetSets[i]
            if len(overlap) <= 1:
                obsOverlaps.append(0)
            else:
                obsOverlaps.append(len(overlap))
        return obsOverlaps
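For comparison, a pure-Python counterpart of the function above might look like the sketch below. The original pure-Python version is not shown in the question, so this is only an assumption about its shape. The name compute_overlap_py is hypothetical, and the rule that overlaps of size 0 or 1 are recorded as 0 is copied from the Cython code.

```python
def compute_overlap_py(test_set, target_sets):
    """Pure-Python counterpart of computeOverlap (hypothetical name)."""
    obs_overlaps = []
    for target in target_sets:
        n = len(test_set & target)
        # Overlaps of size 0 or 1 are recorded as 0, as in the Cython version.
        obs_overlaps.append(n if n > 1 else 0)
    return obs_overlaps


if __name__ == '__main__':
    # Small hand-checkable example: prints [2, 0, 0, 3]
    print(compute_overlap_py({1, 2, 3}, [{1, 2}, {3}, {4}, {1, 2, 3}]))
```

A version like this can also be used to check that the compiled function returns the same list for the same inputs.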
And the setup.py:

    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext

    ext_modules = [Extension("overlapFunc",
                             ["overlapFunc.pyx"])]

    setup(
        name = 'computeOverlap function',
        cmdclass = {'build_ext': build_ext},
        ext_modules = ext_modules
    )
And some code to build random sets for testing and to time the function, in test.py:

    import numpy as np
    from overlapFunc import computeOverlap
    import time

    def simRandomSet(n):
        for i in range(n):
            simSet = set(np.random.randint(low=1, high=100, size=50))
            yield simSet

    if __name__ == '__main__':
        np.random.seed(23032014)
        targetSet = [set(np.random.randint(low=1, high=100, size=50)) for i in range(10000)]
        simulatedTestSets = simRandomSet(200)
        start = time.time()
        for i in simulatedTestSets:
            obsOverlaps = computeOverlap(i, targetSet)
        print time.time()-start
I tried changing the def at the start of the computeOverlap function, as in:

    cdef list computeOverlap(set testSet, list targetSets):

but when I run the setup.py script, I get the following warning message:

    '__pyx_f_11overlapFunc_computeOverlap' defined but not used [-Wunused-function]

and then when I run something that tries to use the function, I get an import error:

    from overlapFunc import computeOverlap
    ImportError: cannot import name computeOverlap
Thanks in advance for your help,
Cheers,
Davy
Answer 0 (score: 2)
In the following line, the extension module name and the file name do not match the actual file name:

    ext_modules = [Extension("computeOverlapWithGeneList",
                             ["computeOverlapWithGeneList.pyx"])]

Replace it with:

    ext_modules = [Extension("overlapFunc",
                             ["overlapFunc.pyx"])]