I found something interesting in Python's numpy: ma.average is a lot slower than arr.mean (where arr is an array):
>>> arr = np.full((3, 3), -9999, dtype=float)
>>> arr
array([[-9999., -9999., -9999.],
       [-9999., -9999., -9999.],
       [-9999., -9999., -9999.]])
%timeit np.ma.average(arr, axis=0)
The slowest run took 49.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 191 µs per loop
%timeit arr.mean(axis=0)
The slowest run took 6.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.41 µs per loop
With random numbers:
arr = np.random.random((3,3))
%timeit arr.mean(axis=0)
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.78 µs per loop
%timeit np.ma.average(arr, axis=0)
1000 loops, best of 3: 186 µs per loop
-> That's a difference of almost 24x.
numpy.ma.average(a, axis=None, weights=None, returned=False)
    Return the weighted average of array over the given axis.

numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
    Compute the arithmetic mean along the specified axis.
Why is ma.average so much slower than arr.mean? Mathematically they are the same (correct me if I'm wrong). My guess is that it has something to do with the weights option of ma.average, but shouldn't there be a fast path when no weights are passed?
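As a sanity check (my own addition, not part of the original question): without a mask or weights the two calls agree numerically, so any gap is pure overhead.

```python
# Sanity check (added for illustration): on an unmasked array with no
# weights, np.ma.average and ndarray.mean return the same values.
import numpy as np

arr = np.random.random((3, 3))
print(np.allclose(np.ma.average(arr, axis=0), arr.mean(axis=0)))  # True
```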
Answer 0 (score: 2)
A good way to find out why something is slower is to profile it. I'll use the third-party library line_profiler and the IPython command %lprun here (see for example this blog):
%load_ext line_profiler
import numpy as np
arr = np.full((3, 3), -9999, dtype=float)
%lprun -f np.ma.average np.ma.average(arr, axis=0)
Line # Hits Time Per Hit % Time Line Contents
==============================================================
519 def average(a, axis=None, weights=None, returned=False):
...
570 1 1810 1810.0 30.5 a = asarray(a)
571 1 15 15.0 0.3 m = getmask(a)
572
573 # inspired by 'average' in numpy/lib/function_base.py
574
575 1 5 5.0 0.1 if weights is None:
576 1 3500 3500.0 59.0 avg = a.mean(axis)
577 1 591 591.0 10.0 scl = avg.dtype.type(a.count(axis))
578 else:
...
608
609 1 7 7.0 0.1 if returned:
610 if scl.shape != avg.shape:
611 scl = np.broadcast_to(scl, avg.shape).copy()
612 return avg, scl
613 else:
614 1 5 5.0 0.1 return avg
I removed some irrelevant lines.
So 30% of the time is actually spent in np.ma.asarray (something arr.mean doesn't have to do!).
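That conversion cost can be measured on its own. A minimal sketch (my own timing helper, using time.perf_counter instead of %timeit so it runs outside IPython; exact numbers will differ per machine):

```python
# Time np.ma.asarray (the MaskedArray wrapping step) against a plain
# ndarray mean, to see the conversion overhead in isolation.
import time
import numpy as np

arr = np.full((3, 3), -9999, dtype=float)

def per_call(func, n=10_000):
    """Average wall-clock seconds per call of func."""
    start = time.perf_counter()
    for _ in range(n):
        func()
    return (time.perf_counter() - start) / n

conv = per_call(lambda: np.ma.asarray(arr))   # wrap arr in a MaskedArray
mean = per_call(lambda: arr.mean(axis=0))     # plain ndarray mean

print(f"ma.asarray: {conv * 1e6:.2f} µs/call, ndarray.mean: {mean * 1e6:.2f} µs/call")
```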
However, the relative times change drastically if you use a bigger array:
arr = np.full((1000, 1000), -9999, dtype=float)
%lprun -f np.ma.average np.ma.average(arr, axis=0)
Line # Hits Time Per Hit % Time Line Contents
==============================================================
519 def average(a, axis=None, weights=None, returned=False):
...
570 1 609 609.0 7.6 a = asarray(a)
571 1 14 14.0 0.2 m = getmask(a)
572
573 # inspired by 'average' in numpy/lib/function_base.py
574
575 1 7 7.0 0.1 if weights is None:
576 1 6924 6924.0 86.9 avg = a.mean(axis)
577 1 404 404.0 5.1 scl = avg.dtype.type(a.count(axis))
578 else:
...
609 1 6 6.0 0.1 if returned:
610 if scl.shape != avg.shape:
611 scl = np.broadcast_to(scl, avg.shape).copy()
612 return avg, scl
613 else:
614 1 6 6.0 0.1 return avg
This time the np.ma.MaskedArray.mean function accounts for almost 90% of the time.
Note: you could also dig deeper and inspect the line profiles of np.ma.asarray, np.ma.MaskedArray.count, or np.ma.MaskedArray.mean. But I just wanted to show that there are a lot of called functions that add overhead.
The next question is: does the relative time between np.ndarray.mean and np.ma.average also change? At least on my computer the difference is much lower now:
%timeit np.ma.average(arr, axis=0)
# 2.96 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit arr.mean(axis=0)
# 1.84 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This time it isn't even 2 times slower. I assume the difference gets even smaller for bigger arrays.
This is actually quite common with NumPy:
The constant factors are high even for plain numpy functions (see for example my answer to the question "Performance in different vectorization method in numpy"). For np.ma these constant factors are even bigger, especially if you don't use an np.ma.MaskedArray as input. But even though the constant factors may be high, these functions excel with big arrays.
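That scaling claim can be checked directly. A rough sketch (my own script, not from the answer; ratios will vary per machine) that prints the ma.average-to-mean slowdown for growing arrays:

```python
# Compare np.ma.average against ndarray.mean for several array sizes;
# the slowdown ratio should shrink as the array grows.
import timeit
import numpy as np

for n in (3, 100, 1000):
    arr = np.full((n, n), -9999, dtype=float)
    t_ma = timeit.timeit(lambda: np.ma.average(arr, axis=0), number=100)
    t_nd = timeit.timeit(lambda: arr.mean(axis=0), number=100)
    print(f"{n:>4} x {n:<4} ma.average/mean ratio: {t_ma / t_nd:.1f}x")
```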
Answer 1 (score: 0)
Thanks to @WillemVanOnsem and @sascha in the comments above.
Edit: this applies to small arrays; see the accepted answer for more information.
Masking operations are slow; to avoid them:
mask = self.local_pos_history[:, 0] > -9
local_pos_hist_masked = self.local_pos_history[mask]
avg = local_pos_hist_masked.mean(axis=0)
The old masked version (note that np.ma.masked_where masks the entries where the condition is true, so it is the sentinel values that must match; the masked mean then skips them):

mask = np.ma.masked_where(self.local_pos_history <= -9, self.local_pos_history)
avg_pos = mask.mean(axis=0)
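A self-contained sketch of the boolean-indexing approach above (local_pos_history here is my stand-in array for the poster's attribute, with -9999 as the sentinel value):

```python
# Filter out sentinel rows with a plain boolean index instead of a
# MaskedArray, then average the remaining rows.
import numpy as np

local_pos_history = np.array([
    [1.0, 2.0, 3.0],
    [-9999.0, -9999.0, -9999.0],   # sentinel row to be skipped
    [4.0, 5.0, 6.0],
])

valid = local_pos_history[:, 0] > -9          # True for real data rows
avg = local_pos_history[valid].mean(axis=0)
print(avg)   # [2.5 3.5 4.5]
```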
np.average is nearly as fast as arr.mean:
%timeit np.average(arr, axis=0)
The slowest run took 5.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 µs per loop
%timeit np.mean(arr, axis=0)
The slowest run took 6.44 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.74 µs per loop
Just to clarify: this test was on a small array.
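For completeness, a small sketch (my own check) of why the two are so close: with weights=None, np.average reduces to a plain arithmetic mean, and only with weights does it do extra work.

```python
import numpy as np

arr = np.random.random((3, 3))

# Without weights, np.average is just the arithmetic mean.
print(np.allclose(np.average(arr, axis=0), np.mean(arr, axis=0)))   # True

# With weights it computes sum(w * a) / sum(w), which np.mean cannot do.
w = np.array([1.0, 2.0, 3.0])
weighted = np.average(arr, axis=0, weights=w)
print(np.allclose(weighted, (arr * w[:, None]).sum(axis=0) / w.sum()))  # True
```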