Question

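I am trying to understand how best to exploit the C order of NumPy arrays to write high-performance code. My expectation was that operations that traverse rows should be faster than operations that traverse columns. Indeed, the first example I tried behaved exactly that way: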

import numpy as np

X = np.ones((10000,10000),dtype='int64')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)

%timeit np.sum(X,axis=1)

This produces the output:

int64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
10 loops, best of 3: 79.6 ms per loop
10 loops, best of 3: 61.1 ms per loop

This is what I expected, because summing along the rows should be faster than summing along the columns.
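In a C-ordered (row-major) array the elements of one row sit next to each other in memory, so stepping along axis 1 moves by `itemsize` bytes while stepping along axis 0 jumps over a whole row. A small sketch, using 1000×1000 arrays only to keep it cheap, makes this visible through the `strides` attribute:

```python
import numpy as np

# Smaller than the question's 10000x10000 arrays; the layout is the same.
X = np.ones((1000, 1000), dtype='int64')
Y = np.ones((1000, 1000), dtype='float64')

# strides[i] is the number of bytes to skip to reach the next element along axis i.
print(X.strides)   # (8000, 8): next row is 1000 * 8 bytes away, next column only 8
print(Y.strides)   # (8000, 8): float64 is also 8 bytes, so the layout is identical

# Transposing only swaps the strides; no data is copied.
print(X.T.strides)                # (8, 8000)
print(X.T.flags['F_CONTIGUOUS'])  # True
```

Since both dtypes have the same itemsize and the same strides, memory layout alone does not distinguish the int64 and float64 cases below.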

Here is where I am very confused. If I change the dtype to float64, the column operation becomes nearly twice as fast as the row operation:

X = np.ones((10000,10000),dtype='float')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)

%timeit np.sum(X,axis=1)

This produces the output:

float64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
10 loops, best of 3: 67.7 ms per loop
10 loops, best of 3: 123 ms per loop

Can someone clarify why this happens?
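It may help to notice that a column sum (`axis=0`) does not have to walk down each column element by element: it can be computed by adding whole rows, each of which is contiguous, into a running accumulator. The sketch below contrasts that row-accumulation strategy with a per-row reduction for `axis=1`. It only illustrates the two access patterns; it is not a claim about how NumPy's own reduction loops are implemented, and both helper names are hypothetical.

```python
import numpy as np

def column_sums_by_row_accumulation(X):
    """Hypothetical helper, equivalent to np.sum(X, axis=0).

    For a C-contiguous X each row is contiguous, so every iteration reads
    memory sequentially and `acc += row` is an element-wise ufunc add.
    """
    acc = np.zeros(X.shape[1], dtype=X.dtype)
    for row in X:
        acc += row
    return acc

def row_sums_by_row_reduction(X):
    """Hypothetical helper, equivalent to np.sum(X, axis=1).

    Each contiguous row is collapsed to a single scalar, so the work is
    one reduction per row instead of one element-wise add per row.
    """
    out = np.empty(X.shape[0], dtype=X.dtype)
    for i, row in enumerate(X):
        out[i] = row.sum()
    return out

X = np.arange(12, dtype='float64').reshape(3, 4)
print(column_sums_by_row_accumulation(X))  # [12. 15. 18. 21.]
print(row_sums_by_row_reduction(X))        # [ 6. 22. 38.]
```

In a C-ordered array both strategies read memory sequentially, which may be one reason the `axis=0` sum is not automatically the slower one.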

EDIT: It was suggested in the comments that I try again with a smaller (1000, 1000) matrix. When I run:

import time
import numpy as np

X = np.ones((1000,1000),dtype='float')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)

X = np.ones((1000,1000),dtype='int64')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)

I get the output:

float64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
1000 loops, best of 3: 598 µs per loop
1000 loops, best of 3: 1.06 ms per loop
int64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
1000 loops, best of 3: 788 µs per loop
1000 loops, best of 3: 632 µs per loop

So the effect persists.
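To repeat the comparison outside IPython, where `%timeit` is unavailable, a plain script along these lines can time both axes for both dtypes with the standard-library `timeit` module. The `repeat`/`number` values are arbitrary choices, and taking the minimum mirrors the "best of N" figure that `%timeit` reports. Only the 1000×1000 case runs by default, because a 10000×10000 array of 8-byte elements needs about 800 MB.

```python
import timeit
import numpy as np

def best_time(X, axis, repeat=3, number=10):
    """Best (minimum) seconds per call of np.sum(X, axis=axis)."""
    times = timeit.repeat(lambda: np.sum(X, axis=axis), repeat=repeat, number=number)
    return min(times) / number

def compare(shape, dtypes=('int64', 'float64')):
    """Time axis=0 and axis=1 sums of an all-ones array for each dtype."""
    results = {}
    for dtype in dtypes:
        X = np.ones(shape, dtype=dtype)
        for axis in (0, 1):
            results[(dtype, axis)] = best_time(X, axis)
    return results

# Add (10000, 10000) to match the first experiment (about 800 MB per array).
SHAPES = [(1000, 1000)]

if __name__ == '__main__':
    for shape in SHAPES:
        for (dtype, axis), t in compare(shape).items():
            print(f"{shape} {dtype:>7} axis={axis}: {t * 1e3:.3f} ms per call")
```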

Answer 1

I cannot confirm your second result on OSX (various Python versions); it is similar to your first result:


EDIT: I repeated all of the calculations directly:

With these timings:

Finally, on my Android phone:

In [27]: X = np.ones((10000,10000),dtype='float64')
    ...: print(X.dtype)
    ...: print(X.flags)
    ...: 
    ...: %timeit np.sum(X,axis=0)
    ...: 
    ...: %timeit np.sum(X,axis=1)
    ...: 
float64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
10 loops, best of 3: 67.6 ms per loop
10 loops, best of 3: 62 ms per loop

And on a Windows system (Python 3.4, 32-bit):

timeit.repeat()
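Because these timings differ across OSX, Android and a 32-bit Windows build, it can help to record the environment next to each benchmark. A minimal sketch using only standard attributes (`np.__version__`, the `platform`, `sys` and `struct` modules) might look like this; which of these details actually explains a given difference is left open here.

```python
import platform
import struct
import sys

import numpy as np

def environment_summary():
    """Collect details that commonly differ between benchmark machines."""
    return {
        'python': sys.version.split()[0],
        'numpy': np.__version__,
        'platform': platform.platform(),
        'machine': platform.machine(),
        # Pointer size in bits: 32 on a 32-bit Python build, 64 on a 64-bit one.
        'pointer_bits': struct.calcsize('P') * 8,
    }

for key, value in environment_summary().items():
    print(f'{key:>12}: {value}')
```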
