UPDATE (reflecting the current state of knowledge): 2017-05-12
The reason for this update is that, at the time I asked this question, I was not aware that I had already discovered something about how Python 3 works "under the hood".
The conclusion from everything below is:
If you write your own Python 3 code for an iterator and care about execution speed, you should write it as a generator function and not as an iterator class.
Below a minimalistic code example demonstrating that the same algorithm (here: a self-made version of Python's range()) expressed as a generator function runs much faster than when expressed as an iterator class:
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration
N = 10000000
print(" Size of created list N = {} elements (ints 1 to N)".format(N))
from time import time as t
from customRange import gnrtYieldRange as cthnYieldRange
from customRange import cintYieldRange
from customRange import iterClassRange as cthnClassRange
from customRange import cdefClassRange
iterPythnRangeObj = range(1, N+1)
gnrtYieldRangeObj = gnrtYieldRange(1, N)
cthnYieldRangeObj = cthnYieldRange(1, N)
cintYieldRangeObj = cintYieldRange(1, N)
iterClassRangeObj = iterClassRange(1, N)
cthnClassRangeObj = cthnClassRange(1, N)
cdefClassRangeObj = cdefClassRange(1, N)
sEXECs = [
    "liPR = list(iterPythnRangeObj)",
    "lgYR = list(gnrtYieldRangeObj)",
    "lcYR = list(cthnYieldRangeObj)",
    "liGR = list(cintYieldRangeObj)",
    "liCR = list(iterClassRangeObj)",
    "lcCR = list(cthnClassRangeObj)",
    "ldCR = list(cdefClassRangeObj)"
]

sCOMMENTs = [
    "Python3 own range(1, N+1) used here as reference for timings ",
    "self-made range generator function using yield (run as it is) ",
    "self-made range (with yield) run from module created by Cython",
    "Cython-optimized self-made range (using yield) run from module",
    "self-made range as iterator class using __next__() and return ",
    "self-made range (using __next__) from module created by Cython",
    "Cython-optimized self-made range (using __next__) from module "
]

for idx, sEXEC in enumerate(sEXECs):
    s=t();exec(sEXEC);e=t();print("{} takes: {:3.1f} sec.".format(sCOMMENTs[idx], e-s))
print("All created lists are equal:", all([liPR == lgYR, lgYR == lcYR, lcYR == liGR, liGR == liCR, liCR == lcCR, lcCR == ldCR]) )
print("Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'")
Putting the code above into a file and running it prints to stdout:
>python3.6 -u "gnrtFunction-fasterThan-iterClass_runMe.py"
Size of created list N = 10000000 elements (ints 1 to N)
Python3 own range(1, N+1) used here as reference for timings takes: 0.2 sec.
self-made range generator function using yield (run as it is) takes: 1.1 sec.
self-made range (with yield) run from module created by Cython takes: 0.5 sec.
Cython-optimized self-made range (using yield) run from module takes: 0.3 sec.
self-made range as iterator class using __next__() and return takes: 3.9 sec.
self-made range (using __next__) from module created by Cython takes: 3.3 sec.
Cython-optimized self-made range (using __next__) from module takes: 0.2 sec.
All created lists are equal: True
Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'
>Exit code: 0
As can be seen from the timings above, the generator-function variant of the self-made range() iterator runs much faster than the iterator-class variant, and, as long as no code optimization is involved, this behavior propagates also to the level of the C code created by Cython.
If you are curious why this is so in detail, you can read through the provided answer(s) or play around with the provided code yourself.
Below the missing pieces of code needed to run the code above:
customRange.pyx - the Cython file from which the customRange module is created:
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration

def cintYieldRange(int startWith, int endAt, int step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

cdef class cdefClassRange:
    cdef int startWith
    cdef int endAt
    cdef int step

    def __init__(self, int startWith, int endAt, int step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration
and the setup file customRange-setup.py used to create the Python module customRange:
import sys
sys.argv += ['build_ext', '--inplace']
from distutils.core import setup
from Cython.Build import cythonize
setup(
    name = 'customRange',
    ext_modules = cythonize("customRange.pyx"),
)
Now some further information making it easier to understand the provided answers:
At the time I asked this question I was busy with a quite complex algorithm for generating unique combinations out of a non-unique list, available in the form of a generator function using yield. My goal was to create out of this algorithm a Python module written in C in order to make it run faster. For this purpose I rewrote the generator function, which used yield, into an iterator class using __next__() and return. Comparing the speed of both variants of the algorithm, I was surprised that the iterator class was two times slower than the generator function, and I had (wrongly) assumed it had something to do with the way I had rewritten the algorithm (you need to know this here if you want to better understand what the answers are about). That is why I
originally asked how to make the iterator-class version run at the same speed as the generator function and where the speed difference comes from.
Some more on the history of the question:
In the Python script code provided below, exactly the same algorithm for creating unique combinations out of a non-unique list of elements was implemented both as a Python function using yield and as a class using __next__. The code is ready to run after copy/paste, so you can see for yourself what I am talking about.
The same phenomenon observed for pure Python code propagates into the C code of the Python extension module created by Cython out of the script code, so it is not limited to Python-level code, because it does not vanish at the C-code level.
The question is:
Where does the huge difference in execution speed come from? Is there anything that can be done to get both code variants running at comparable speed? Is there something wrong with the class/__next__ implementation compared to the function/yield variant? To my knowledge both are exactly the same code ...
Here the code (tweaking the number in the highlighted line changes the level of uniqueness of the elements in the list the combinations are generated from, which has a huge impact on the running time):
def uniqCmboYieldIter(lstItems, lenCmbo):
    dctCounter = {}
    lenLstItems = len(lstItems)
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for

    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs

    idxIntoCmbo, idxIntoUniqs = 0, 0

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
class uniqCmboClassIter:
    # ----------------------------------------------------------------------------------------------
    def __iter__(self):
        return self
    # ----------------------------------------------------------------------------------------------
    def __init__(self, lstItems, lenCmbo):
        dctCounter = {}
        lenLstItems = len(lstItems)
        for idx in range(lenLstItems):
            item = lstItems[idx]
            if item in dctCounter.keys():
                dctCounter[item] += 1
            else:
                dctCounter[item] = 1
            #:if
        #:for

        self.lstUniqs = sorted(dctCounter.keys())
        self.lenUniqs = len(self.lstUniqs)
        self.lstCntRpts = [dctCounter[item] for item in self.lstUniqs]

        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = [0] * self.lenUniqs

        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
            # print("self.multiplicities:", self.multiplicities)
            # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = None
        self.y = None
        return
    # ----------------------------------------------------------------------------------------------
    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo
# ============================================================================================================================================
lstSize = 48 # 48
uniqLevel = 12 # (7 ~60% unique) higher level => more unique items in the generated list
aList = []
from random import randint
for _ in range(lstSize):
    aList.append( ( randint(1,uniqLevel), randint(1,uniqLevel) ) )

lenCmbo = 6
percUnique = 100.0 - 100.0*(lstSize-len(set(aList)))/lstSize
print("======================== lenCmbo:", lenCmbo,
      "   sizeOfList:", len(aList),
      "   noOfUniqueInList", len(set(aList)),
      "   percUnique",  int(percUnique) )
import time
from itertools import combinations
# itertools.combinations
# ---
# def uniqCmboYieldIter(lstItems, lenCmbo):
# class uniqCmboClassIter: def __init__(self, lstItems, lenCmbo):
# ---
start_time = time.time()
print("Combos:%9i"%len(list(combinations(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list( combinations(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboYieldIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboYieldIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboClassIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboClassIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
and the timings on my box:
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 32 percUnique 66
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.04635 seconds.
Combos: 1296058 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 3.25447 seconds.
Combos: 1296058 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 5.97371 seconds.
>Exit code: 0
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 22 percUnique 45
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.05199 seconds.
Combos: 191072 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 0.47343 seconds.
Combos: 191072 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 0.89860 seconds.
>Exit code: 0
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 43 percUnique 89
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.17285 seconds.
Combos: 6560701 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 16.72573 seconds.
Combos: 6560701 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 31.17714 seconds.
>Exit code: 0
UPDATE (status 2017-05-07):
At the time of asking the question and offering the bounty, I was not aware that there is a way to easily create C code of an extension module for an iterator object out of Python script code using Cython, and that such C code can also be created from an iterator function using yield.
Considering that the generated faster version of the C extension module is still not fast enough to compete with itertools.combinations, it does not make much sense to dive deeply into finding out what exactly causes the slowdown when using an iterator class compared to an iterator function and how to overcome this. It makes much more sense to find a way to speed up the faster version using Cython, especially because I am a total novice in writing Python extension modules and was not able to create working code after hours and hours of focused work spent on tweaking the existing C code of itertools.combinations, with my modifications failing with a Segmentation Fault error whose cause I was unable to understand.
Currently I think that there is still room to speed up the Cython code I use, without needing to go the much harder way of writing the C code myself.
Below the Cython code that runs fine, and the speed-optimized Cython code which somehow changes (I can't currently see why) the way the algorithm works and therefore produces wrong results. The idea behind the Cython optimization was to use Python/Cython arrays instead of Python lists in the Cython code. Any hints on how to get a faster-running Python extension module out of the used algorithm in a novice-"safe" way are welcome.
def subbags_by_loops_with_dict_counter(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for

    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs

    cdef int idxIntoCmbo
    cdef int idxIntoUniqs
    cdef int count

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
The optimized CYTHON CODE below produces wrong results:
def subbags_loops_dict_cython_optimized(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for

    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cdef array.array cmboAsIdxUniqs = array.array('i', [])
    array.resize(cmboAsIdxUniqs, lenCmbo)
    # cmboAsIdxUniqs = [None] * lenCmbo
    cdef array.array multiplicities = array.array('i', [])
    array.resize(multiplicities, lenUniqs)
    # multiplicities = [0] * lenUniqs

    cdef int idxIntoCmbo
    cdef int maxIdxCmbo
    cdef int curIdxCmbo
    cdef int idxIntoUniqs
    cdef int count

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        maxIdxCmbo = idxIntoCmbo + count
        curIdxCmbo = idxIntoCmbo
        while curIdxCmbo < maxIdxCmbo:
            cmboAsIdxUniqs[curIdxCmbo] = idxIntoUniqs
            curIdxCmbo += 1
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1
    # print("multiplicities:", multiplicities)
    # print("cmboAsIdxUniqs:", cmboAsIdxUniqs)

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
Answer 0 (score: 5)
I have some experience from rewriting several of the itertools documentation recipes as C extensions, so I think I may have some insights that could help you.
When you write pure Python code it is a trade-off between speed (generator) and features (iterator).
The yield functions (known as generators) are for speed, and generally they can be written without bothering about internal state. So they are less effort to write, and they are fast, because Python just manages all the "state".
The reason generators are faster (or at least not slower) is mostly because:
Apart from the __next__ method, they directly implement the tp_iternext slot (the C-level equivalent of __next__). In that case Python does not have to look up the __next__ method - which is essentially what makes them faster in the following example:
from itertools import islice

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self

    def __next__(self):
        return 1
%timeit list(islice(test(), 1000))
# 173 µs ± 2.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 499 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Because the generator directly fills the tp_iternext slot, it is almost 3 times faster.
Both a yield function and a class have state, but the yield function saves and loads its state much faster than a class with attribute access:
def test():
    i = 0
    while True:
        yield i
        i += 1

class Test(object):
    def __init__(self):
        self.val = 0

    def __iter__(self):
        return self

    def __next__(self):
        current = self.val
        self.val += 1
        return current
%timeit list(islice(test(), 1000))
# 296 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test(), 1000))
# 1.22 ms ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This time the class is 4 times slower (compared to almost 3 times when hardly any state was involved). This is a cumulative effect: the more "state" you have, the slower the class variant becomes.
So much for the yield vs. class approaches. Note that the actual timings depend on the kind of operation performed. For example, if the actual code that is run when next is called is slow (i.e. time.sleep(1)), then there is hardly any difference between a generator and a class!
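The time.sleep point can be checked directly: when each item involves real work, the per-call iteration overhead vanishes in the noise for both variants. A minimal sketch (with the sleep shortened to 1 ms so it finishes quickly; the names slow_gen and SlowIter are made up for illustration):

```python
import time
from itertools import islice

def slow_gen():
    # generator variant: ~1 ms of simulated work per item
    while True:
        time.sleep(0.001)
        yield 1

class SlowIter:
    # class variant with the same simulated work in __next__
    def __iter__(self):
        return self
    def __next__(self):
        time.sleep(0.001)
        return 1

for make_iter in (slow_gen, SlowIter):
    start = time.perf_counter()
    items = list(islice(make_iter(), 100))
    elapsed = time.perf_counter() - start
    # both take roughly 0.1 s: the sleep dominates, the iteration
    # mechanism contributes only microseconds
    print(make_iter.__name__, round(elapsed, 2))
```

Here the save/restore machinery still differs between the two variants, but it is amortized away by the per-item work.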
If you want a fast Cython iterator class, it has to be a cdef class. Otherwise you do not get a really fast class. The reason is that only a cdef class creates an extension type that directly implements the tp_iternext field! I will use IPython's %%cython to compile the code (so I do not have to include the setup):
%%cython

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self

    def __next__(self):
        return 1

cdef class Test_cdef(object):
    def __iter__(self):
        return self

    def __next__(self):
        return 1
%timeit list(islice(test(), 1000))
# 113 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 407 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test_cdef(), 1000))
# 62.8 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The timings again show that the generator and the basic class are faster than the pure Python equivalents, but their relative performance roughly stayed the same. However, the cdef class variant beats both of them, and that is mainly because the tp_iternext slot is used instead of just implementing the __next__ method. (Inspect the Cython-generated C code if you do not trust me. :))
However, it is only 2 times faster than the Python generator, which is not bad, but it is not exactly overwhelming either. To get really amazing speedups, you need to find a way to express your program without (most) Python objects (the fewer Python objects, the bigger the speedup). For example, if you use a dict to store the items and their multiplicities, you still store Python objects, and any lookup has to be done with Python dict methods - even if you can call them via C API functions instead of having to look up the real methods:
%%cython

cpdef cython_count(items):
    cdef dict res = dict()
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res

import random

def count(items):
    res = {}
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res
l = [random.randint(0, 100) for _ in range(10000)]
%timeit cython_count(l)
# 2.06 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count(l)
# 3.63 ms ± 21.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
There is one catch here: you did not use collections.Counter, which has optimized C code (at least in Python 3) for this kind of operation:
from collections import Counter
%timeit Counter(l)
# 1.17 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
A quick note here: do not use something in some_dict.keys(), because keys() in Python 2 is list-like and only supports O(n) contains operations, while something in some_dict is typically O(1) (in both Pythons)! Avoiding keys() makes the code faster in both Python versions, most notably on Python 2:
def count2(items):
    res = {}
    for item in items:
        if item in res.keys():  # with "keys()"
            res[item] += 1
        else:
            res[item] = 1
    return res
# Python3
l = [random.randint(0, 100) for _ in range(10000)]
%timeit count(l)
# 3.63 ms ± 29 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count2(l)
# 5.9 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Python2
l = [random.randint(0, 10000) for _ in range(10000)]
%timeit count(l)
# 100 loops, best of 3: 4.59 ms per loop
%timeit count2(l)
# 1 loop, best of 3: 2.65 s per loop <--- WHOOPS!!!
This shows that when you use Python structures you can only hope for a 3-4x speedup with Cython (and C extensions), but even small mistakes like using ".keys()" can cost you much more in terms of performance if used incorrectly.
So what do you do if you want it faster? The answer is relatively easy: create your own data structure based on C types instead of Python types.
That means you have to think about the design:
- Which types do you want to support in your uniqComb**? Do you want integers (the examples suggest so, but I suppose you want arbitrary Python objects)?
- Do you want to support sorting of the objects passed to your uniqComb** function? You used sorted, but you could also use an OrderedDict and keep the keys in order of appearance instead of by numerical value.
The answers to these questions (these are only the questions I immediately asked myself; there are probably more!) can help you decide which structures you can use internally. For example, with Cython you can interface to C++, and you could use a map containing integer keys and integer values instead of a dict. It is sorted by default, so you do not need to sort manually yourself, and you operate on native integers instead of Python objects. But you give up the ability to process arbitrary Python objects in your uniqComb, and you need to know how to work with C++ types in Cython. It could be amazingly fast though!
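As a side note on the second design question, the difference between sorted keys and appearance-ordered keys can be sketched in plain Python (the two helper names are made up for illustration):

```python
from collections import Counter, OrderedDict

def uniques_and_counts_sorted(items):
    # the approach used in the question: unique values ordered by value
    counts = Counter(items)
    uniqs = sorted(counts)
    return uniqs, [counts[u] for u in uniqs]

def uniques_and_counts_by_appearance(items):
    # alternative: keep keys in order of first appearance; no sorting
    # needed, so it also works for hashable items that don't support "<"
    counts = OrderedDict()
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return list(counts), list(counts.values())

print(uniques_and_counts_sorted([3, 1, 3, 2]))         # ([1, 2, 3], [1, 1, 2])
print(uniques_and_counts_by_appearance([3, 1, 3, 2]))  # ([3, 1, 2], [2, 1, 1])
```

Both return the same unique/count pairing, only the ordering policy differs, and that ordering is what determines the order in which the combinations are emitted.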
I will not go down that road, because I assume you want to support arbitrary orderable Python types, so I stick with the Counter as a starting point, but I will save the multiplicities as array.arrays instead of lists. Let's call it the "least invasive" optimization. It actually does not matter much in terms of performance whether you use a list or an array for lstCntRpts and multiplicities, because they are not the bottleneck - but it is a bit faster, saves a bit of memory, and, more importantly, it shows how to include homogeneous arrays with Cython:
You did not actually share the parameters you used for your timings, but I tried it with some of mine:
%%cython

from cpython.list cimport PyList_Size  # (most) C API functions can be used with cython!
from array import array
from collections import Counter

cdef class uniqCmboClassIter:
    cdef list lstUniqs
    cdef Py_ssize_t lenUniqs
    cdef int[:] lstCntRpts  # memoryview
    cdef Py_ssize_t lenCmbo
    cdef list cmboAsIdxUniqs
    cdef int[:] multiplicities  # memoryview
    cdef Py_ssize_t idxIntoCmbo
    cdef Py_ssize_t idxIntoUniqs
    cdef bint stopIteration
    cdef Py_ssize_t x
    cdef Py_ssize_t y

    def __init__(self, lstItems, lenCmbo):
        dctCounter = Counter(lstItems)

        self.lstUniqs = sorted(dctCounter)
        self.lenUniqs = PyList_Size(self.lstUniqs)
        self.lstCntRpts = array('i', [dctCounter[item] for item in self.lstUniqs])

        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = array('i', [0] * self.lenUniqs)

        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
            # print("self.multiplicities:", self.multiplicities)
            # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = 0
        self.y = 0
        return

    def __iter__(self):
        return self

    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo
from itertools import combinations
import random
import time

def create_values(maximum):
    vals = [random.randint(0, maximum) for _ in range(48)]
    print('length:             ', len(vals))
    print('sorted values:      ', sorted(vals))
    print('uniques:            ', len(set(vals)))
    print('uniques in percent: {:%}'.format(len(set(vals)) / len(vals)))
    return vals

class Timer(object):
    def __init__(self):
        pass

    def __enter__(self):
        self._time = time.time()

    def __exit__(self, *args, **kwargs):
        print(time.time() - self._time)

vals = create_values(maximum=50)  # and 22 and 75 and 120
n = 6

with Timer():
    list(combinations(vals, n))

with Timer():
    list(uniqCmboClassIter(vals, n))

with Timer():
    list(uniqCmboClassIterOriginal(vals, n))

with Timer():
    list(uniqCmboYieldIterOriginal(vals, n))
length:              48
sorted values:       [0, 0, 0, 1, 2, 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 14, 15, 15, 15, 17, 18, 19, 19, 19, 19, 20, 20, 20, 21, 21, 22, 22]
uniques:             21
uniques in percent: 43.750000%
6.250450611114502
0.4217393398284912
4.250436305999756
2.7186365127563477

length:              48
sorted values:       [1, 1, 2, 5, 6, 7, 7, 8, 8, 9, 11, 13, 13, 15, 16, 16, 16, 16, 17, 19, 19, 21, 21, 23, 24, 26, 27, 28, 28, 29, 31, 31, 34, 34, 36, 36, 38, 39, 39, 40, 41, 42, 44, 46, 47, 47, 49, 50]
uniques:             33
uniques in percent: 68.750000%
6.2034173011779785
4.343803882598877
42.39261245727539
26.65750527381897

length:              48
sorted values:       [4, 4, 7, 9, 10, 14, 14, 17, 19, 21, 23, 24, 24, 26, 34, 36, 40, 42, 43, 43, 45, 46, 46, 52, 53, 58, 59, 59, 61, 63, 66, 68, 71, 72, 72, 75, 76, 80, 82, 82, 83, 84, 86, 86, 89, 92, 97, 99]
uniques:             39
uniques in percent: 81.250000%
6.859697341918945
10.437987327575684
104.12988543510437
65.25306582450867

length:              48
sorted values:       [4, 7, 11, 19, 24, 29, 32, 36, 49, 49, 54, 57, 58, 60, 62, 65, 67, 70, 70, 72, 72, 79, 82, 83, 86, 89, 89, 90, 91, 94, 96, 99, 102, 111, 112, 118, 120, 120, 128, 129, 129, 134, 138, 141, 141, 144, 146, 147]
uniques:             41
uniques in percent: 85.416667%
6.484673023223877
13.610010623931885
136.28764533996582
84.73834943771362

It definitely performs better than the original approaches; it is actually several times faster just with the type declarations. There is probably much more that could be optimized (disabling bounds checking, using Python C API function calls, using unsigned or smaller integers if you know the "maximum" and "minimum" of your multiplicities, ...) - but the fact that it is not much slower than itertools.combinations even for 80% unique items, and much faster than any original implementation, is good enough for me. :-)
Answer 1 (score: 1)
> The class with __next__ version is the class suited for implementation
> as a Python extension module, because there is no equivalent of yield
> in C, so it makes sense to find out how to improve it in order to be
> on par with the function with yield variant.

Once you implement it as a C extension module, it is already written in C. The performance differences you are seeing are entirely due to properties of the Python implementation that will not apply to the C extension module you are planning to write. The optimizations that can be applied to a Python class do not apply to C code.
For example, accessing an instance variable is more expensive than accessing a local variable in Python code, because instance variable access requires multiple dict lookups. Your C implementation would not need such dict lookups.
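The cost of those dict lookups can be observed from pure Python; this sketch (the class and method names are made up for illustration) times the same increment loop written with attribute access versus a local variable:

```python
import timeit

class BumpDemo:
    """The same increment loop written two ways."""
    def __init__(self):
        self.val = 0

    def bump_attr(self, n):
        # each iteration reads and writes self.val: dict-based
        # attribute accesses on every step
        for _ in range(n):
            self.val += 1

    def bump_local(self, n):
        # copy to a local once; the loop body touches only a fast
        # frame slot, no dict lookups
        val = self.val
        for _ in range(n):
            val += 1
        self.val = val

d = BumpDemo()
t_attr = timeit.timeit(lambda: d.bump_attr(1000), number=1000)
t_local = timeit.timeit(lambda: d.bump_local(1000), number=1000)
print("attribute access:", t_attr)
print("local variable:  ", t_local)  # typically noticeably smaller
```

Both methods leave the object in the same state; only the lookup mechanism inside the loop differs, which is exactly the kind of overhead a C implementation would not pay.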
Answer 2 (score: 1)
When you write a generator function using yield, the overhead of saving and restoring state is handled by the CPython internals (implemented in C). With __iter__/__next__, you have to manage saving and restoring state yourself on every call. In CPython, Python-level code is slower than C-level built-in functionality, so the extra Python-level code involved in the state management (including trivial things like accessing attributes via self's dict lookup rather than loading a simple local variable, which has only array-indexing overhead) ends up costing you a lot.
If you implement your own iterator-protocol-supporting type in a C extension module, you bypass this overhead; saving and restoring state should be a matter of a few C-level variable accesses (with similar or lesser overhead compared to what Python generator functions incur, which is to say, very little). Effectively, generator functions are a C extension type that saves and restores the Python frame on every call to tp_iternext (the C-level equivalent of __next__).