After reading M. O'Neill's great paper, I've tried implementing some lazy, infinite versions of the Sieve of Eratosthenes in Python. I was surprised to find that the heap-based version, which the paper claims should run faster, was actually twice as slow for me.

The paper contains two examples; one is based on a dict, which I have translated (from Haskell):
from itertools import count

def dict_sieve():
    yield 2
    yield 3
    candidates = count(5, 2)
    composites = {9: {3}}  # map composites to their prime factors

    for candidate in candidates:
        try:
            factors = composites.pop(candidate)
        except KeyError:  # if it's not in the dict, it's prime
            yield candidate
            composites[candidate**2] = {candidate}  # Euler's optimization: start from prime**2
        else:
            for prime in factors:  # go through the prime factors and increment their keys
                try:
                    composites[candidate + prime*2].add(prime)  # use prime*2 because we are ignoring evens
                except KeyError:
                    composites[candidate + prime*2] = {prime}
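As a quick sanity check (my own addition, not part of the timed test), the generator does produce the primes in order:

from itertools import islice
print(list(islice(dict_sieve(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]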
The second example in the paper demonstrates the use of a priority queue as the data structure. It also uses lazy lists rather than a simple increment, which I haven't done here, in the interest of a fair test. (Besides, I did try using instances of itertools.count as my lazy lists, and found that it ran slightly slower.)
from itertools import count
from heapq import heappush, heapreplace

def heap_sieve():
    yield 2
    yield 3
    candidates = count(5, 2)
    composites = [(9, 3)]  # a priority queue of composite/factor pairs, keyed by composite

    for candidate in candidates:
        prime_flag = True
        while composites[0][0] == candidate:  # loop because there may be duplicates
            prime_flag = False  # signal to the if statement below
            composite, prime = composites[0]
            heapreplace(composites, (composite + prime*2, prime))

        if prime_flag:
            yield candidate
            heappush(composites, (candidate**2, candidate))
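The two generators agree on their output; a minimal check along these lines (again my own addition):

from itertools import islice
# both sieves should yield identical prime sequences
assert list(islice(dict_sieve(), 1000)) == list(islice(heap_sieve(), 1000))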
I've timed the two versions, together with an 'eager' version, not reproduced here, which produces a list of all the primes below a limit:
In [44]: from itertools import islice
In [45]: %timeit list(islice(dict_sieve(), 100000))
...: %timeit list(islice(heap_sieve(), 100000))
...: %timeit eager_sieve(1299710) # 1299709 is the 100,000th prime
...:
1 loops, best of 3: 2.12 s per loop
1 loops, best of 3: 4.94 s per loop
1 loops, best of 3: 677 ms per loop
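(The eager version itself wasn't included above; as a point of reference, here is a minimal sketch of what an eager_sieve along those lines might look like - an assumed classic bounded sieve, not necessarily the code that was actually timed:)

def eager_sieve(limit):
    # classic bounded Sieve of Eratosthenes over a flag list
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit, n):  # strike out multiples, starting at n**2
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]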
It's no surprise that the 'eager' version is much faster - it's basically a trade-off between memory use, the inconvenience of having to specify an upper limit, and CPU time. However, I did find it surprising that the heapq version was so much slower, when the paper claims it to be more efficient. Is it a problem with my implementation? Or is it simply that, as we all know, dicts are super-fast (and heapq is comparatively slow)?
Answer (score: 7):
Actually, a dict-based approach should indeed be expected to be faster than one based on a heap queue: heap insertion and replacement operations are O(log n), while dict insertion and lookup are O(1).
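A rough micro-benchmark sketch of those per-step costs (my own illustration, assuming CPython's built-in dict and heapq; exact numbers will vary by machine):

import timeit

for n in (10**3, 10**5, 10**6):
    t_heap = timeit.timeit(
        "heapreplace(h, h[0] + n)",  # pop the min, push a larger key: O(log n)
        setup="from heapq import heapreplace; n = %d; h = list(range(n))" % n,
        number=100000,
    )
    t_dict = timeit.timeit(
        "d[k + n] = d.pop(k); k += 1",  # delete one key, insert another: O(1)
        setup="n = %d; d = dict.fromkeys(range(n)); k = 0" % n,
        number=100000,
    )
    print("n=%d: heapreplace %.3fs, dict pop+insert %.3fs" % (n, t_heap, t_dict))

The heap time should grow with n while the dict time stays roughly flat.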
In fact, I was surprised to see the claim that the paper's author says otherwise - but it doesn't. You've assumed that Data.Map is implemented as a hash map, but it is actually a size balanced binary tree, so its performance characteristics are quite similar to those of a heap queue. The difference is that retrieving the minimum key from a heap queue is O(1), which speeds up part of the sieve code - but a hash map is faster still.