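I am trying to reduce a numpy array to a smaller one by averaging groups of elements. For example, take the average of each 5x5 sub-array in a 100x100 array to create a 20x20 array. I have a huge amount of data to process, so is there an efficient way to do this?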
Answer 0 (score: 26)
I have only tried this on smaller arrays, so please test it with yours:
import numpy as np
nbig = 100
nsmall = 20
big = np.arange(nbig * nbig).reshape([nbig, nbig]) # 100x100
small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1)
Example for 6x6 -> 3x3:
nbig = 6
nsmall = 3
big = np.arange(36).reshape([6,6])
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5],
       [ 27.5,  29.5,  31.5]])
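The reshape trick above can be wrapped in a small helper. This is only a sketch under stated assumptions: the name `block_mean` and its parameters are hypothetical (they are not part of the answer), and both dimensions are assumed to be exact multiples of the block size. It averages over both block axes in one call with a tuple `axis`, which for unmasked arrays should give the same result as `.mean(3).mean(1)`, and it also accepts rectangular arrays.

```python
import numpy as np

def block_mean(arr, block_rows, block_cols):
    """Average non-overlapping blocks of a 2D array (hypothetical helper)."""
    rows, cols = arr.shape
    if rows % block_rows or cols % block_cols:
        raise ValueError("array shape must be a multiple of the block shape")
    # Split each axis into (number of blocks, block size), then average
    # over the two block-size axes.
    blocks = arr.reshape(rows // block_rows, block_rows,
                         cols // block_cols, block_cols)
    return blocks.mean(axis=(1, 3))

big = np.arange(36).reshape(6, 6)
small = block_mean(big, 2, 2)  # 2x2 blocks of a 6x6 array give a 3x3 result
print(small)
```

The reshape itself does not copy a contiguous array, so the cost is dominated by the single reduction.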
Answer 1 (score: 4)
This is pretty straightforward, although I feel like it could be faster:
from __future__ import division
import numpy as np
Norig = 100
Ndown = 20
step = Norig//Ndown
assert step == Norig/Ndown # ensure Ndown is an integer factor of Norig
x = np.arange(Norig*Norig).reshape((Norig,Norig)) #for testing
y = np.empty((Ndown,Ndown)) # for testing
for yr,xr in enumerate(np.arange(0,Norig,step)):
    for yc,xc in enumerate(np.arange(0,Norig,step)):
        y[yr,yc] = np.mean(x[xr:xr+step,xc:xc+step])
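You may also find scipy.signal.decimate interesting. It applies a more sophisticated low-pass filter than simple averaging before downsampling the data, although you would have to decimate along one axis, then the other.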
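A minimal sketch of that two-pass decimation, assuming SciPy is installed. The variable names are illustrative, and `decimate` is called with its default anti-aliasing filter, so the result is a filtered downsample and not a block mean; values near the edges in particular may differ from simple averaging.

```python
import numpy as np
from scipy.signal import decimate

Norig = 100
step = 5  # downsampling factor along each axis

# Float test data, same layout as the loop example
x = np.arange(Norig * Norig, dtype=float).reshape(Norig, Norig)

# decimate works along one axis at a time: it low-pass filters and then
# keeps every step-th sample. Apply it along axis 0, then along axis 1.
y = decimate(decimate(x, step, axis=0), step, axis=1)
print(y.shape)
```

Whether the extra filtering is worth the cost depends on whether aliasing matters for the data at hand.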
Answer 2 (score: 2)
To average a 2D array over sub-arrays of size NxN:
from numpy import average, split  # data is a 2D array; N must divide both dimensions
height, width = data.shape
data = average(split(average(split(data, width // N, axis=1), axis=-1), height // N, axis=1), axis=-1)
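As a quick sanity check, the one-liner can be unpacked into two steps and compared against a plain reshape-based block mean. This is a sketch only: the wrapper name `split_average` and the test array are hypothetical, and N is assumed to divide both dimensions.

```python
import numpy as np

def split_average(data, N):
    """Block-average a 2D array with split/average (hypothetical wrapper)."""
    height, width = data.shape
    # Step 1: split into column blocks and average within each block,
    # giving an array of shape (width // N, height).
    cols_done = np.average(np.split(data, width // N, axis=1), axis=-1)
    # Step 2: split those results into row blocks and average again,
    # giving an array of shape (height // N, width // N).
    return np.average(np.split(cols_done, height // N, axis=1), axis=-1)

data = np.arange(24, dtype=float).reshape(4, 6)
N = 2
result = split_average(data, N)

# Reference: reshape into blocks and average over both block axes.
reference = data.reshape(4 // N, N, 6 // N, N).mean(axis=(1, 3))
print(np.allclose(result, reference))
```

Note that `split` builds a Python list of sub-arrays, so for very large inputs the reshape form is likely to be the cheaper of the two.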
Answer 3 (score: 0)
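Note that eumiro's approach (the reshape-based answer above) does not work for masked arrays, because .mean(3).mean(1) assumes that every mean along axis 3 was computed from the same number of values. If the array contains masked elements, this assumption no longer holds. In that case you have to keep track of the number of values used to compute .mean(3) and replace .mean(1) with a weighted mean. The weights are the counts of unmasked values that went into each .mean(3), normalized by their sum.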
Here is an example:
import numpy as np
def gridbox_mean_masked(data, Nbig, Nsmall):
    # Reshape data
    rshp = data.reshape([Nsmall, Nbig//Nsmall, Nsmall, Nbig//Nsmall])
    # Compute mean along axis 3 and remember the number of values each mean
    # was computed from
    mean3 = rshp.mean(3)
    count3 = rshp.count(3)
    # Compute weighted mean along axis 1
    mean1 = (count3*mean3).sum(1)/count3.sum(1)
    return mean1
# Define test data
big = np.ma.array([[1, 1, 2],
[1, 1, 1],
[1, 1, 1]])
big.mask = [[0, 0, 0],
[0, 0, 1],
[0, 0, 0]]
Nbig = 3
Nsmall = 1
# Compute gridbox mean
print(gridbox_mean_masked(big, Nbig, Nsmall))
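An alternative that avoids the explicit weights is to flatten each block into a single axis and take one masked mean over it. This is a sketch and not part of the answer above: the function name `gridbox_mean_masked_flat` is hypothetical, and it assumes a square masked array whose side Nbig is a multiple of Nsmall.

```python
import numpy as np

def gridbox_mean_masked_flat(data, Nbig, Nsmall):
    """Masked block mean via flattening each block (hypothetical alternative)."""
    step = Nbig // Nsmall
    # Bring the two within-block axes next to each other, then merge them
    # so each block becomes one axis of length step*step.
    blocks = data.reshape(Nsmall, step, Nsmall, step).transpose(0, 2, 1, 3)
    flat = blocks.reshape(Nsmall, Nsmall, step * step)
    # The masked mean skips masked entries, so each block is averaged
    # over its valid values only.
    return flat.mean(axis=-1)

big = np.ma.array([[1, 1, 2],
                   [1, 1, 1],
                   [1, 1, 1]],
                  mask=[[0, 0, 0],
                        [0, 0, 1],
                        [0, 0, 0]])
result = gridbox_mean_masked_flat(big, 3, 1)
print(result)
```

For this test data the single block has eight unmasked values summing to 9, so both approaches should yield 9/8. The transpose forces a copy on reshape, which the weighted-mean version avoids.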