Question

Wikipedia gives this as one of the algorithms for generating prime numbers:

def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked. 
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)

    # Loop over the candidates, marking out each multiple.
    for i in range(2, fin + 1):
        if not candidates[i]:
            continue

        candidates[i + i::i] = [None] * (n // i - 1)

    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]

I changed the algorithm slightly.

def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked. 
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)

    # Loop over the candidates, marking out each multiple.

    candidates[4::2] = [None] * (n // 2 - 1)

    for i in range(3, fin + 1, 2):
        if not candidates[i]:
            continue

        candidates[i + i::i] = [None] * (n // i - 1)

    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]

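I first marked all multiples of 2, and then considered only the odd numbers. When I timed both algorithms (with n = 40,000,000), the first one was always faster, though only very slightly. I don't understand why. Can somebody explain?

P.S.: When I try 100,000,000, my computer freezes. Why is that? I have a Core Duo E8500, 4GB RAM, Windows 7 Pro 64 Bit.

Update 1: This is Python 3.

Update 2: This is how I timed it: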

import time

start = time.time()
a = eratosthenes_sieve(40000000)
end = time.time()
print(end - start)

Update: Thanks to the valuable comments (especially from nightcracker and Winston Ewert), I managed to code what I intended in the first place:

def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only c below sqrt(n) need be checked. 
    c = [i for i in range(3, n + 1, 2)]
    fin = int(n ** 0.5) // 2

    # Loop over the c, marking out each multiple.

    for i in range(fin):
        if not c[i]:
            continue

        c[c[i] + i::c[i]] = [None] * ((n // c[i]) - (n // (2 * c[i])) - 1)

    # Filter out non-primes and return the list.
    return [2] + [i for i in c if i]

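This algorithm improves on the original algorithm (shown at the top) by (usually) 50%. (Naturally, it is still worse than the algorithm nightcracker mentioned.)

A question for the Python masters: is there a more Pythonic way to express this last piece of code, in a more "functional" style?

Update 2: I still couldn't decode the algorithm nightcracker mentioned. I guess I'm too stupid.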

Answer 1

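The question is, why would it be faster? In both examples you are filtering out the multiples of two. It doesn't matter whether you hardcode candidates[4::2] = [None] * (n // 2 - 1) or whether the same assignment is executed in the first pass of the loop for i in range(2, fin + 1):.

If you are interested in an optimized sieve of Eratosthenes, here you go: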
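To check this claim empirically, a minimal sketch along the following lines could compare the two variants. The names sieve_original and sieve_evens_first and the size N are illustrative choices that do not appear in the question, and the timings will vary from machine to machine.

```python
import timeit


def sieve_original(n):
    # The Wikipedia version: the i == 2 pass of the loop strikes the evens.
    candidates = list(range(n + 1))
    for i in range(2, int(n ** 0.5) + 1):
        if candidates[i]:
            candidates[i + i::i] = [None] * (n // i - 1)
    return [i for i in candidates[2:] if i]


def sieve_evens_first(n):
    # The modified version: strike the evens up front, then loop over odd i.
    candidates = list(range(n + 1))
    candidates[4::2] = [None] * (n // 2 - 1)
    for i in range(3, int(n ** 0.5) + 1, 2):
        if candidates[i]:
            candidates[i + i::i] = [None] * (n // i - 1)
    return [i for i in candidates[2:] if i]


if __name__ == "__main__":
    N = 1_000_000  # assumed size, small enough to run quickly
    assert sieve_original(N) == sieve_evens_first(N)
    for fn in (sieve_original, sieve_evens_first):
        # Take the best of several runs to reduce noise from other processes.
        best = min(timeit.repeat(lambda: fn(N), number=1, repeat=5))
        print(f"{fn.__name__}: {best:.3f}s")
```

Both functions perform the same slice assignment for the multiples of two, so any measured difference should be small and may well fall within run-to-run noise.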

def primesbelow(N):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    #""" Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = (N, N-1, N+4, N+3, N+2, N+1)[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]

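Explanation here: Porting optimized Sieve of Eratosthenes from Python to C++

The original source is here, but it came without an explanation. In short, this prime sieve skips the multiples of 2 and 3 and uses a few hacks to take advantage of fast Python slice assignment.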
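As a small aid to reading primesbelow, the sketch below shows how a sieve index is mapped to the number it stands for. The helper name wheel_value is hypothetical; the expression inside it is the one used in the function above.

```python
def wheel_value(i):
    # Same index-to-number expression that primesbelow uses.
    return (3 * i + 1) | 1


values = [wheel_value(i) for i in range(8)]
print(values)  # [1, 5, 7, 11, 13, 17, 19, 23]

# None of the represented numbers is divisible by 2 or 3.
print(all(v % 2 != 0 and v % 3 != 0 for v in values))  # True
```

Only numbers coprime to 6 get a slot, which is why the sieve list has about N // 3 entries instead of N, and why sieve[0] (which stands for 1) is set to False at the start.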

Answer 2

You don't save much time by avoiding the evens. Most of the computation time in the algorithm is spent on this line:

candidates[i + i::i] = [None] * (n // i - 1)

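That line makes the computer do a lot of work. Whenever the number in question is even, it is never run: the loop bails out at the if statement. So the time spent on loop iterations for even numbers is really very small, and eliminating those iterations does not change the running time of the loop significantly. That is why your method isn't noticeably faster.

When Python generates the numbers for range, it uses the formula start + index * step. Multiplying by two, as happens in your case, will be slightly more expensive than in the original case.

There is also quite possibly a small overhead from having a longer function.

Neither of these is a speed issue that really matters, but together they outweigh the very small benefit your version brings.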
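The argument can be made concrete by counting what the original loop actually does. The following is only a sketch: sieve_work and its counters are hypothetical names introduced for illustration, and n = 1,000,000 is an assumed size.

```python
def sieve_work(n):
    """Count the work done by the original loop for a given n."""
    candidates = list(range(n + 1))
    fin = int(n ** 0.5)
    skipped_evens = 0  # even i > 2 that bail out at the `if`
    assignments = 0    # iterations that reach the slice assignment
    cells_written = 0  # total list cells overwritten with None
    for i in range(2, fin + 1):
        if not candidates[i]:
            if i % 2 == 0:
                skipped_evens += 1
            continue
        count = n // i - 1
        candidates[i + i::i] = [None] * count
        assignments += 1
        cells_written += count
    return skipped_evens, assignments, cells_written


skipped, assigned, written = sieve_work(1_000_000)
print(skipped)   # 499 cheap `if` checks that the modified version avoids
print(assigned)  # 168 slice assignments, one per prime up to 1000
print(written)   # total cells overwritten, far larger than the other two
```

The modified version saves only the few hundred trivial checks counted in skipped, while the slice assignments, which dominate the running time, are identical in both versions.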

Answer 3

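It is probably slightly slower because you are performing extra setup to do something that was already done in the first case (marking the multiples of two). That setup time may be what you are seeing, if the difference is as slight as you say.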

Answer 4

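Your extra step is unnecessary and will actually traverse the entire collection of n items once to perform the 'get rid of evens' operation, rather than just operating on n^(1/2) of them.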

Why is this algorithm worse?
