Question

Here is a (slightly messy) attempt at Project Euler Problem 49.

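I should say outright that the deque was not a good choice! My idea was that shrinking the collection of primes used for membership tests would make the loop speed up as it went. However, once I realised that I should have used a set (and not worried about removing elements), I got a 60x speed-up.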

from collections import deque
from itertools import permutations
from .sieve import sieve_of_erastothenes  # my own implementation of the Sieve of Erastothenes

primes = deque(prime for prime in sieve_of_erastothenes(10000) if prime > 1000 and prime != 1487)  # all four-digit primes except 1487
try:
    while True:
        prime = primes.popleft()  # decrease the length of primes each time to speed up membership test
        for inc in xrange(1,10000 + 1 - (2 * prime)):  # this limit ensures we don't end up with results > 10000
            inc1 = prime + inc
            inc2 = prime + 2*inc

            if inc1 in primes and inc2 in primes:
                primestr = str(prime)
                perms = set(''.join(tup) for tup in permutations(primestr))  # because permutations() returns tuples
                inc1str = str(inc1)
                inc2str = str(inc2)
                if inc1str in perms and inc2str in perms:
                    print primestr + inc1str + inc2str
                    raise IOError  # I chose IOError because it's unlikely to be raised
                                   # by anything else in the block. Exceptions are an easy
                                   # way to break out of nested loops.
except IOError:
    pass

Anyway, before I thought of using a set, I tried the code in PyPy. I found the results rather surprising:

$ time python "problem49-deque.py"
296962999629

real    1m3.429s
user    0m49.779s
sys     0m0.335s

$ time pypy-c "problem49-deque.py"
296962999629

real    5m52.736s
user    5m15.608s
sys     0m1.509s

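Why is PyPy more than five times slower on this code? My guess is that PyPy's implementation of deque is the culprit (because PyPy runs the set version faster), but I have no idea why that would be.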

Answer 1

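The slow part is inc1 in primes and inc2 in primes. I'll look into why PyPy is so slow here (in effect, thanks for the performance bug report). Note that, as you mentioned, the code can be made dramatically faster on both PyPy and CPython; in this case, just copy the primes deque into a set right before the for loop and run the membership tests against that set.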
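As a rough illustration of the set-based approach, here is a minimal sketch rather than the asker's actual revised code. It is written for Python 3, and the `sieve` and `find_sequence` helpers are hypothetical names introduced for this example; `sieve` stands in for the asker's own sieve module, which is not shown in the question. The set is built once up front, so no elements need to be removed during the search.

```python
from itertools import permutations


def sieve(limit):
    """Return all primes below `limit` (hypothetical stand-in for the asker's sieve module)."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def find_sequence(skip=1487):
    """Search four-digit primes for the arithmetic sequence of digit permutations."""
    candidates = [p for p in sieve(10000) if p > 1000 and p != skip]
    prime_set = set(candidates)  # hash-based lookups instead of scanning a deque
    for prime in candidates:
        # permutations() yields tuples of characters, so join them back into strings
        perms = set(''.join(tup) for tup in permutations(str(prime)))
        # this bound keeps prime + 2*inc at or below 10000
        for inc in range(1, (10000 - prime) // 2 + 1):
            inc1 = prime + inc
            inc2 = prime + 2 * inc
            if (inc1 in prime_set and inc2 in prime_set
                    and str(inc1) in perms and str(inc2) in perms):
                return str(prime) + str(inc1) + str(inc2)
    return None


if __name__ == "__main__":
    print(find_sequence())
```

Returning from a function also replaces the exception used in the question to break out of the nested loops.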

Answer 2

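You should expect membership tests on a deque to be slow, given Python's performance characteristics: as with a list, any membership test involves a linear scan over the elements. A set, by contrast, is a data structure optimised for membership tests. In that sense, there is no bug here.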
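The difference between the two lookups can be observed with a small timing sketch using the standard `timeit` module. The variable names and sizes below are assumptions chosen for illustration, and the absolute timings will vary by machine and interpreter, so no expected numbers are given.

```python
from collections import deque
from timeit import timeit

values = list(range(1000, 10000))
as_deque = deque(values)
as_set = set(values)
missing = -1  # absent value: the deque has to scan every element before giving up

# Time 1000 membership tests against each container.
deque_time = timeit(lambda: missing in as_deque, number=1000)
set_time = timeit(lambda: missing in as_set, number=1000)

print("deque: %.4fs  set: %.4fs" % (deque_time, set_time))
```

Both containers give the same answer to every membership test; only the cost of reaching that answer differs.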

Why is PyPy's deque so slow?
