Question

results is a 2D NumPy array with 300,000 rows.

count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1

This loop takes 0.7 seconds in Python, but the same loop in C++ takes less than 0.07 seconds.
So how can I make this Python code as fast as possible?

Answer 1

Try:

first_col = results[:, 0]
res = len(first_col[first_col >= 0.7])

Depending on the shape of the matrix, this can be 2-10 times faster than your approach.
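A related variant, which is not part of the answer above: the boolean mask can be counted directly with np.count_nonzero, so the filtered copy of the column never has to be built. This is only a minimal sketch; the small results array below is made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the asker's `results` array.
results = np.array([[0.9, 0.1],
                    [0.7, 0.5],
                    [0.2, 0.8],
                    [0.75, 0.3]])

mask = results[:, 0] >= 0.7       # boolean array, one entry per row
count = np.count_nonzero(mask)    # number of True entries in the mask
```

Whether this beats the len(...) version on a given machine would need to be measured; it mainly saves one temporary array.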

Answer 2

You could try the following:

np.bincount(results[:,0]>=.7)[1]

I'm not sure it will be faster, but it should give the correct answer.
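One edge case worth noting: if no value reaches the threshold, np.bincount returns an array of length 1 and indexing [1] raises an IndexError. A possible guard, sketched here on made-up data, is to pass minlength=2 so that both bins always exist.

```python
import numpy as np

# Hypothetical data in which no first-column value reaches the threshold.
results = np.array([[0.1, 0.9],
                    [0.2, 0.8]])

mask = results[:, 0] >= 0.7
# minlength=2 guarantees a slot for False (index 0) and for True (index 1).
counts = np.bincount(mask, minlength=2)
true_count = counts[1]
```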

Answer 3

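When doing numerical computation where speed matters, especially in Python, you should avoid for loops whenever possible. NumPy is optimized for "vectorized" computation, so you want to hand the work you would normally do in a for loop to NumPy's indexing and to functions such as where.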
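To make "vectorized" concrete, here is a small sketch that computes the same count both ways. The array size and the seed are arbitrary choices for illustration, not the data from the question.

```python
import numpy as np

rng = np.random.default_rng(0)     # seed chosen arbitrarily
results = rng.random((1000, 3))    # small hypothetical array

# Explicit Python loop: one interpreter-level comparison per row.
loop_count = 0
for i in range(results.shape[0]):
    if results[i, 0] >= 0.7:
        loop_count += 1

# Vectorized: one comparison over the whole column, then a sum of booleans.
vec_count = int((results[:, 0] >= 0.7).sum())
```

Both counts agree; the vectorized form does the per-element work inside NumPy's compiled code instead of in the Python interpreter.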

I ran a quick test on a 300,000 x 600 array of random values between 0 and 1 and found the following.

Your code, not vectorized, with one for loop:
226 ms per run

%%timeit
count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1

emilaz's solution:
8.36 ms per run

%%timeit
first_col = results[:,0]
x = len(first_col[first_col>.7])

Ethan's solution:
7.84 ms per run

%%timeit
np.bincount(results[:,0]>=.7)[1]

The best I came up with:
6.92 ms per run

%%timeit
len(np.where(results[:,0] > 0.7)[0])

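All 4 methods give the same answer, 90,134 for my data (the difference between > and >= does not matter here, since a random float is essentially never exactly 0.7). Hope this helps!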
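For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module instead of %%timeit. The function names are my own, and the array has only 4 columns rather than 600 to keep memory use low, so the absolute timings will not match the numbers above.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)        # seed chosen arbitrarily
results = rng.random((300_000, 4))    # fewer columns than the test above

def masked_len():
    first_col = results[:, 0]
    return len(first_col[first_col > 0.7])

def bincount_count():
    return int(np.bincount(results[:, 0] >= 0.7)[1])

def where_len():
    return len(np.where(results[:, 0] > 0.7)[0])

candidates = [masked_len, bincount_count, where_len]
counts = [f() for f in candidates]    # sanity check: all should agree

for f in candidates:
    seconds = timeit.timeit(f, number=20) / 20
    print(f"{f.__name__}: {seconds * 1e3:.3f} ms per run")
```

Timings depend on hardware and NumPy version, so treat any ranking from a run like this as indicative only.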