我描述了我的程序,超过80%的时间花在了这个单行函数上!我该如何优化它?我正在运行PyPy,所以我宁愿不使用NumPy,但由于我的程序几乎所有的时间花在那里,我认为放弃PyPy for NumPy可能是值得的。但是,我更喜欢使用CFFI,因为它与PyPy更兼容。
#x, y, are lists of 1s and 0s. c_out is a positive int. bit is 1 or 0.
def findCarryIn(x, y, c_out, bit):
return (2 * c_out +
bit -
sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))) #note this is basically a dot product.
答案 0 :(得分:1)
使用numpy肯定会加速很多。你可以定义你的函数:
def find_carry_numpy(x, y, c_out, bit):
return 2 * c_out + bit - np.sum(x & y[::-1])
创建一些随机数据:
In [36]: n = 100; c = 15; bit = 1
In [37]: x_arr = np.random.rand(n) > 0.5
In [38]: y_arr = np.random.rand(n) > 0.5
In [39]: x_list = list(x_arr)
In [40]: y_list = list(y_arr)
检查结果是否相同:
In [42]: find_carry_numpy(x_arr, y_arr, c, bit)
Out[42]: 10
In [43]: findCarryIn(x_list, y_list, c, bit)
Out[43]: 10
快速测试:
In [44]: timeit find_carry_numpy(x_arr, y_arr, c, bit)
10000 loops, best of 3: 19.6 µs per loop
In [45]: timeit findCarryIn(x_list, y_list, c, bit)
1000 loops, best of 3: 409 µs per loop
所以你的速度提高了20倍!将Python代码转换为Numpy时,这是一个非常典型的加速。
答案 1 :(得分:1)
在不使用Numpy的情况下,在使用timeit
进行测试后,最快的求和方法(你正在做的)似乎是使用简单的for循环和求和元素,例如 -
def findCarryIn(x, y, c_out, bit):
s = 0
for i,j in zip(x, reversed(y)):
s += i & j
return (2 * c_out + bit - s)
虽然这并没有大幅增加表现(可能是20%左右)。
时序测试的结果(使用不同的方法,func4
包含上述方法) -
def func1(x,y):
return sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))
def func2(x,y):
return sum([i & j for i,j in zip(x,reversed(y))])
def func3(x,y):
return sum(x[i] & y[-1-i] for i in range(min(len(x),len(y))))
def func4(x,y):
s = 0
for i,j in zip(x, reversed(y)):
s += i & j
return s
In [125]: %timeit func1(x,y)
100000 loops, best of 3: 3.02 µs per loop
In [126]: %timeit func2(x,y)
The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.9 µs per loop
In [127]: %timeit func3(x,y)
100000 loops, best of 3: 4.31 µs per loop
In [128]: %timeit func4(x,y)
100000 loops, best of 3: 2.2 µs per loop