Optimizing a Python one-liner

Date: 2015-07-26 16:51:37

Tags: python

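I profiled my program, and more than 80% of its time is spent in this one-line function! How can I optimize it? I'm running PyPy, so I'd rather not use NumPy. However, since my program spends almost all of its time there, I think giving up PyPy for NumPy might be worth it. Even so, I'd prefer to use CFFI, because it is more compatible with PyPy.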

# x and y are lists of 1s and 0s. c_out is a positive int. bit is 1 or 0.
def findCarryIn(x, y, c_out, bit):
    return (2 * c_out +
            bit -
            sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y))))  # note: this is basically a dot product
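Written as a formula, and assuming x and y both have length n, the function computes the expression below. Because every x_i and y_j is 0 or 1, x_i & y_j is the same as the product x_i y_j, which is why the comment calls the sum a dot product (of x with y reversed).

```latex
\mathrm{findCarryIn}(x, y, c_{\mathrm{out}}, \mathrm{bit})
  = 2\,c_{\mathrm{out}} + \mathrm{bit} - \sum_{i=0}^{n-1} x_i \, y_{n-1-i},
\qquad x_i, y_j \in \{0, 1\}
```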

2 Answers:

Answer 0 (score: 1)

Using NumPy will definitely speed this up a lot. You can define your function like this:

import numpy as np

def find_carry_numpy(x, y, c_out, bit):
    return 2 * c_out + bit - np.sum(x & y[::-1])  # x, y: equal-length NumPy bool/int arrays

Create some random data:

In [36]: n = 100; c = 15; bit = 1

In [37]: x_arr = np.random.rand(n) > 0.5

In [38]: y_arr = np.random.rand(n) > 0.5

In [39]: x_list = list(x_arr)

In [40]: y_list = list(y_arr)

Check that the results are the same:

In [42]: find_carry_numpy(x_arr, y_arr, c, bit)
Out[42]: 10

In [43]: findCarryIn(x_list, y_list, c, bit)
Out[43]: 10

A quick timing test:

In [44]: timeit find_carry_numpy(x_arr, y_arr, c, bit)
10000 loops, best of 3: 19.6 µs per loop

In [45]: timeit findCarryIn(x_list, y_list, c, bit)
1000 loops, best of 3: 409 µs per loop

So for this input size you get roughly a 20x speedup (409 µs vs. 19.6 µs)! That is a very typical speedup when converting Python code to NumPy.

Answer 1 (score: 1)

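Without NumPy, testing with timeit suggests that the fastest way to compute the sum (which is what you're doing) is a simple for loop that accumulates the elements, for example: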

def findCarryIn(x, y, c_out, bit):
    s = 0
    for i,j in zip(x, reversed(y)):
        s += i & j
    return (2 * c_out + bit - s)

This does not improve performance dramatically, though (perhaps around 20%).
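A further pure-Python option, which is not from either answer and has not been timed here: if the same x and y are reused across many calls, pack each bit list into a single Python int up front. The dot product of x with reversed(y) then becomes one AND followed by a count of 1 bits. The helpers pack_bits, pack_bits_reversed and find_carry_in_packed are hypothetical names for this sketch, and it only helps if the packing cost is paid once and amortized over many calls.

```python
def pack_bits(bits):
    """Hypothetical helper: pack a list of 0/1 values into an int (bits[0] -> lowest bit)."""
    value = 0
    for i, b in enumerate(bits):
        if b:
            value |= 1 << i
    return value


def pack_bits_reversed(bits):
    """Hypothetical helper: pack reversed(bits), so bits[-1] becomes the lowest bit."""
    return pack_bits(bits[::-1])


def find_carry_in_packed(x_packed, y_rev_packed, c_out, bit):
    # sum(x_i & y_{n-1-i}) equals the number of 1 bits in x_packed & y_rev_packed.
    # bin(...).count("1") works on Python 2 (older PyPy) and Python 3;
    # on Python 3.10+ int.bit_count() does the same count.
    return 2 * c_out + bit - bin(x_packed & y_rev_packed).count("1")


x = [1, 0, 1, 1]
y = [1, 1, 0, 1]
print(find_carry_in_packed(pack_bits(x), pack_bits_reversed(y), 3, 1))  # prints 4
```

The design choice here is to move the per-element Python work out of the hot function: each call does a fixed number of int operations instead of looping over n list elements.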

Timing results for several approaches (func4 is the method above):

def func1(x,y):
    return sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))

def func2(x,y):
    return sum([i & j for i,j in zip(x,reversed(y))])

def func3(x,y):
    return sum(x[i] & y[-1-i] for i in range(min(len(x),len(y))))

def func4(x,y):
    s = 0
    for i,j in zip(x, reversed(y)):
        s += i & j
    return s

In [125]: %timeit func1(x,y)
100000 loops, best of 3: 3.02 µs per loop

In [126]: %timeit func2(x,y)
The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.9 µs per loop

In [127]: %timeit func3(x,y)
100000 loops, best of 3: 4.31 µs per loop

In [128]: %timeit func4(x,y)
100000 loops, best of 3: 2.2 µs per loop
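The timings above were taken with IPython's %timeit on x and y whose size the answer doesn't state. To repeat the comparison in plain Python or PyPy (no IPython), a sketch using the standard timeit module is below. The list length n = 100, the random test data, and the function name main are assumptions for illustration, and no results are claimed for it.

```python
import random
import timeit


def func1(x, y):
    # Original version: map + lambda over x and reversed(y).
    return sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))


def func4(x, y):
    # Explicit-loop version from this answer.
    s = 0
    for i, j in zip(x, reversed(y)):
        s += i & j
    return s


def main(n=100, number=10000, repeat=3):
    # Assumed test data: two random bit lists of equal length n.
    x = [random.randint(0, 1) for _ in range(n)]
    y = [random.randint(0, 1) for _ in range(n)]
    assert func1(x, y) == func4(x, y)  # both variants must agree
    for f in (func1, func4):
        # Best of `repeat` runs, reported per call in microseconds.
        best = min(timeit.repeat(lambda: f(x, y), number=number, repeat=repeat))
        print("%s: %.2f us per loop" % (f.__name__, best / number * 1e6))


if __name__ == "__main__":
    main()
```

Running it under both CPython and PyPy shows whether the ranking holds on the interpreter the question actually uses; PyPy's JIT may change the gap between the variants.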