Question

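I was trying to write a solution for Problem 12 (Project Euler) in Python. The solution was too slow, so I went looking at other people's solutions on the internet. I found this code written in C++, which does virtually the same thing as my Python code, with only a few insignificant differences.

Python: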

def find_number_of_divisiors(n):
    if n == 1:
        return 1

    div = 2 # 1 and the number itself
    for i in range(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div

def tri_nums():
    n = 1
    t = 1
    while 1:
        yield t
        n += 1
        t += n

t = tri_nums()
m = 0
for n in t:
    d = find_number_of_divisiors(n)
    if m < d:
        print n, ' has ', d, ' divisors.'
        m = d

    if m == 320:
        exit(0)

C++:

#include <cstdlib>   // exit
#include <iostream>

int main(int argc, char *argv[])
{
    unsigned int iteration = 1;
    unsigned int triangle_number = 0;
    unsigned int divisor_count = 0;
    unsigned int current_max_divisor_count = 0;
    while (true) {
        triangle_number += iteration;
        divisor_count = 0;
        for (int x = 2; x <= triangle_number / 2; x ++) {
            if (triangle_number % x == 0) {
                divisor_count++;
            }
        }
        if (divisor_count > current_max_divisor_count) {
            current_max_divisor_count = divisor_count;
            std::cout << triangle_number << " has " << divisor_count
                      << " divisors." << std::endl;
        }
        if (divisor_count == 318) {
            exit(0);
        }

        iteration++;
    }
    return 0;
}

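On my machine the Python code takes 1 minute 25.83 seconds to run, while the C++ code takes about 4.628 seconds. That is roughly 18 times faster. I expected the C++ code to be faster, but not by this much, and especially not for a simple solution that consists of only two loops and a bunch of increments and mods.

Although I would appreciate answers about how to solve this problem, the main question I want to ask is why the C++ code is so much faster. Am I using or doing something wrong in Python?

Replacing range with xrange:

After replacing range with xrange, the Python code takes about 1 minute 11.48 seconds to run (about 1.2 times faster).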

Answer 1

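This is exactly the kind of code where C++ is going to shine compared to Python: a single, fairly tight loop doing arithmetic. (I'm going to ignore algorithmic speedups here, because your C++ code uses the same algorithm, and it seems you're explicitly not asking for that...)

C++ compiles this kind of code down to a relatively small number of processor instructions (and everything it does probably fits in the super-fast levels of CPU cache), whereas Python has many levels of indirection to go through for each operation. For example, every time you increment a number, it checks whether the number has just overflowed and needs to be moved into a bigger data type.

That said, all is not necessarily lost! This is also the kind of code that a just-in-time compiler system like PyPy does well on, because once it has gone through the loop a few times, it compiles the code into something similar to what the C++ code starts out as. On my laptop: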
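To make those "levels of indirection" a bit more concrete, here is a rough sketch using the standard-library dis module to list the bytecode CPython generates for the divisor loop. It is written for Python 3 (so `n // 2` and `range` instead of `n/2` and `xrange`), and `count_divisors` is just an illustrative name. Exact opcode names and counts differ between CPython versions.

```python
import dis

def count_divisors(n):
    """Naive divisor count, same idea as the question's loop (Python 3 syntax)."""
    if n == 1:
        return 1
    div = 2  # 1 and the number itself
    for i in range(2, n // 2 + 1):
        if n % i == 0:
            div += 1
    return div

# Collect the bytecode instructions CPython compiled this function into.
instructions = list(dis.get_instructions(count_divisors))
opnames = [ins.opname for ins in instructions]

if __name__ == "__main__":
    dis.dis(count_divisors)  # human-readable listing
    print(len(instructions), "bytecode instructions in total")
```

Every pass through the loop body dispatches several of these instructions in the interpreter, and each one works on heap-allocated int objects, whereas the compiled C++ loop boils down to a handful of machine instructions on registers.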

$ time python2.7 euler.py >/dev/null
python euler.py  72.23s user 0.10s system 97% cpu 1:13.86 total

$ time pypy euler.py >/dev/null                       
pypy euler.py > /dev/null  13.21s user 0.03s system 99% cpu 13.251 total

$ clang++ -o euler euler.cpp && time ./euler >/dev/null
./euler > /dev/null  2.71s user 0.00s system 99% cpu 2.717 total

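This is with the version of the Python code that uses xrange instead of range. Optimization levels made no difference for the C++ code for me, and neither did using GCC instead of Clang.

While we're here, this is also a case where Cython can do quite well. It compiles almost-Python code to C code that uses the Python API but uses raw C wherever possible. If we change your code a bit by adding some type declarations, and remove the iterator since I don't know how to handle those efficiently in Cython, we get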

cdef int find_number_of_divisiors(int n):
    cdef int i, div
    if n == 1:
        return 1

    div = 2 # 1 and the number itself
    for i in xrange(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div

cdef int m, n, t, d
m = 0
n = 1
t = 1
while True:
    n += 1
    t += n
    d = find_number_of_divisiors(t)
    if m < d:
        print t, ' has ', d, ' divisors.'  # t is the triangle number; n is only its index
        m = d

    if m == 320:
        exit(0)

Then on my laptop I get

$ time python -c 'import euler_cy' >/dev/null
python -c 'import euler_cy' > /dev/null  4.82s user 0.02s system 98% cpu 4.941 total

(within a factor of 2 of the C++ code).

Answer 2

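Rewriting the divisor-counting algorithm to use the divisor function brings the run time down to less than 1 second. It could still be made faster, but that isn't really necessary.

This goes to show that you should check whether the algorithm is the bottleneck before doing any optimization tricks with language features and compilers. Compiler/interpreter tricks can indeed be quite powerful, as Dougal's answer shows: there the gap between Python and C++ is narrowed for equivalent code. However, as you can see, a change of algorithm immediately gives a huge performance boost and brings the run time down to around the level of the algorithmically inefficient C++ code (I didn't test the C++ version, but on my 6-year-old computer the code below finishes in about 0.6 s).

The code below was written and tested with Python 3.2.3.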

import math

def find_number_of_divisiors(n):
    if n == 1:
        return 1

    num = 1

    count = 1
    div = 2
    while (n % div == 0):
        n //= div
        count += 1

    num *= count

    div = 3
    while (div <= pow(n, 0.5)):
        count = 1
        while n % div == 0:
            n //= div
            count += 1

        num *= count
        div += 2

    if n > 1:
        num *= 2

    return num
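The idea behind the function above is that if n factors as p1^a1 * p2^a2 * ... * pk^ak, then n has (a1 + 1)(a2 + 1)...(ak + 1) divisors. For example, 28 = 2^2 * 7 has (2 + 1)(1 + 1) = 6 divisors. As a sanity check, here is a minimal sketch that compares this approach against the question's trial-division loop on small inputs. The names `divisors_naive` and `divisors_from_factorization` are made up for this illustration, and the factorization is a compact variant of the code above rather than a copy of it.

```python
def divisors_naive(n):
    """Count divisors by trial division up to n // 2 (the question's approach)."""
    if n == 1:
        return 1
    count = 2  # 1 and n itself
    for i in range(2, n // 2 + 1):
        if n % i == 0:
            count += 1
    return count

def divisors_from_factorization(n):
    """Count divisors as the product of (exponent + 1) over the prime factors."""
    result = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        result *= exponent + 1
        p += 1 if p == 2 else 2  # after 2, try odd candidates only
    if n > 1:
        result *= 2  # one leftover prime factor with exponent 1
    return result

if __name__ == "__main__":
    # 28 = 2^2 * 7
    print(divisors_from_factorization(28))
    # Check that the two approaches agree on every small input.
    print(all(divisors_naive(k) == divisors_from_factorization(k)
              for k in range(1, 2000)))
```

The naive loop does on the order of n/2 modulo operations per number, while the factorization needs at most about sqrt(n), which is where the speedup comes from.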

Answer 3

Here is my own variant, built on nhahtdh's factor-count optimization plus my own prime factorization code:

def prime_factors(x):
    def factor_this(x, factor):
        factors = []
        while x % factor == 0:
            x /= factor
            factors.append(factor)
        return x, factors
    x, factors = factor_this(x, 2)
    x, f = factor_this(x, 3)
    factors += f
    i = 5
    while i * i <= x:
        for j in (2, 4):
            x, f = factor_this(x, i)
            factors += f
            i += j
    if x > 1:
        factors.append(x)
    return factors

def product(series):
    from operator import mul
    return reduce(mul, series, 1)

def factor_count(n):
    from collections import Counter
    c = Counter(prime_factors(n))
    return product([cc + 1 for cc in c.values()])

def tri_nums():
    n, t = 1, 1
    while 1:
        yield t
        n += 1
        t += n

if __name__ == '__main__':
    m = 0
    for n in tri_nums():
        d = factor_count(n)
        if m < d:
            print n, ' has ', d, ' divisors.'
            m = d
            if m == 320:
                break
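Note that the code above is Python 2: it uses the print statement and the builtin reduce, and `x /= factor` relies on integer division. A possible Python 3 port might look like the sketch below. `first_triangle_with_divisors` is a hypothetical helper added here in place of the printing loop, and it stops at the first triangle number with at least `target` divisors rather than at exactly 320.

```python
from collections import Counter
from functools import reduce
from operator import mul

def prime_factors(x):
    """Return the prime factors of x (with repetition), smallest first."""
    def factor_this(x, factor):
        factors = []
        while x % factor == 0:
            x //= factor  # floor division keeps x an int on Python 3
            factors.append(factor)
        return x, factors

    x, factors = factor_this(x, 2)
    x, f = factor_this(x, 3)
    factors += f
    i = 5
    while i * i <= x:
        for j in (2, 4):  # candidates 5, 7, 11, 13, ...: skips multiples of 2 and 3
            x, f = factor_this(x, i)
            factors += f
            i += j
    if x > 1:
        factors.append(x)
    return factors

def factor_count(n):
    """Number of divisors: product of (exponent + 1) over the prime factors."""
    exponents = Counter(prime_factors(n)).values()
    return reduce(mul, (e + 1 for e in exponents), 1)

def tri_nums():
    """Yield the triangle numbers 1, 3, 6, 10, ..."""
    n, t = 1, 1
    while True:
        yield t
        n += 1
        t += n

def first_triangle_with_divisors(target):
    """Hypothetical helper: first triangle number with at least `target` divisors."""
    for t in tri_nums():
        if factor_count(t) >= target:
            return t

if __name__ == "__main__":
    t = first_triangle_with_divisors(6)
    print(t, "has", factor_count(t), "divisors.")
```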

Very large difference in execution time between almost identical C++ and Python code
