I have a list as follows:
data = [1,5,9,13,
2,6,10,14,
3,7,11,15,
4,8,12,16]
I want to build the following list of tuples and compute the mean of each tuple.
[(1,5,2,6), (3,7,4,8), (9,13,10,14), (11,15,12,16)]
The expected result is:
[3.5, 5.5, 11.5, 13.5]
What is the simplest way to do this?
Answer 0 (score: 2)
Here is one approach:
In [29]: a = np.array(data)
In [30]: a2 = a.reshape(4,4)
In [31]: a3 = np.vstack((a2[:, :2], a2[:, 2:]))
In [32]: a4 = a3.reshape(4,4)
In [33]: np.mean(a4, axis=1)
Out[33]: array([ 3.5, 5.5, 11.5, 13.5])
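The same steps can be collected into a plain script outside IPython. This is a minimal sketch that hard-codes the 4x4 layout and 2x2 blocks from the question; the names `grid`, `stacked`, and `groups` are illustrative and not part of the original answer.

```python
import numpy as np

data = [1, 5, 9, 13,
        2, 6, 10, 14,
        3, 7, 11, 15,
        4, 8, 12, 16]

grid = np.array(data).reshape(4, 4)   # 4x4 view of the flat list
# Stack the left two columns on top of the right two columns (8 rows x 2 columns).
stacked = np.vstack((grid[:, :2], grid[:, 2:]))
# Every 4 consecutive values now form one 2x2 block of the grid.
groups = stacked.reshape(4, 4)
means = groups.mean(axis=1)

print(groups.tolist())
print(means.tolist())
```

Printing `groups` first makes it easy to confirm that the rows match the tuples requested in the question before trusting the means.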
Answer 1 (score: 2)
Put the data into a 4-dimensional numpy array with shape (2, 2, 2, 2), then take the mean of that array over axes 1 and 3:
In [25]: data
Out[25]: [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
In [26]: a = np.array(data).reshape(2, 2, 2, 2)
In [27]: a
Out[27]:
array([[[[ 1, 5],
[ 9, 13]],
[[ 2, 6],
[10, 14]]],
[[[ 3, 7],
[11, 15]],
[[ 4, 8],
[12, 16]]]])
In [28]: a.mean(axis=(1, 3))
Out[28]:
array([[ 3.5, 11.5],
[ 5.5, 13.5]])
If you need the final result as a 1-D array, you can use the ravel() method:
In [31]: a.mean(axis=(1, 3)).ravel()
Out[31]: array([ 3.5, 11.5, 5.5, 13.5])
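Note that `ravel()` on this result lists the block means row by row, which differs from the order shown in the question. A minimal sketch, assuming the column-by-column block order from the question is the one wanted, transposes before flattening. The axis comment describes how `reshape(2, 2, 2, 2)` splits the 4x4 grid; the names `blocks`, `row_major`, and `col_major` are illustrative.

```python
import numpy as np

data = [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]

# Axes: (block row, row within block, block column, column within block).
blocks = np.array(data).reshape(2, 2, 2, 2)
block_means = blocks.mean(axis=(1, 3))   # average inside each 2x2 block

row_major = block_means.ravel()      # blocks read left to right, then down
col_major = block_means.T.ravel()    # blocks read top to bottom, then right

print(row_major.tolist())
print(col_major.tolist())
```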
For a similar question, see "How can I vectorize the averaging of 2x2 sub-arrays of numpy array?".
Answer 2 (score: 1)
A few suggested solutions are listed in this post:
def grouped_mean(data, M2, N1, N2):
    # Parameters:
    # M2 = Columns in input data
    # N1, N2 = Blocksize into which data is to be divided and averaged
    # Get grouped mean values; M2//N2 keeps the reshape argument an integer on Python 3
    grouped_mean = np.array(data).reshape(-1,N2).sum(1).reshape(-1,N1,M2//N2).sum(1)/(N1*N2)
    # Return transposed and flattened version as output (as per OP)
    return grouped_mean.T.ravel()
Now, grouped_mean can be computed using np.einsum instead of np.sum:
stage1_sum = np.einsum('ij->i',np.array(data).reshape(-1,N2))
grouped_mean = np.einsum('ijk->ik',stage1_sum.reshape(-1,N1,M2//N2))/(N1*N2)
Alternatively, as suggested in @Warren Weckesser's solution, the 2D input array can be split into a 4D array and np.einsum applied like so:
split_data = np.array(data).reshape(-1, N1, M2//N2, N2)
grouped_mean = np.einsum('ijkl->ik',split_data)/(N1*N2)
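To check that the three variants agree, they can be run side by side on the small input from the question. This is a minimal sketch; the names `v1`, `v2`, `v3`, and `arr` are illustrative, and `//` is used so the reshape arguments stay integers under Python 3.

```python
import numpy as np

data = [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
M2, N1, N2 = 4, 2, 2   # columns in the input, block height, block width
arr = np.array(data)

# Variant 1: two-stage np.sum
v1 = arr.reshape(-1, N2).sum(1).reshape(-1, N1, M2 // N2).sum(1) / (N1 * N2)

# Variant 2: two-stage np.einsum
stage1 = np.einsum('ij->i', arr.reshape(-1, N2))
v2 = np.einsum('ijk->ik', stage1.reshape(-1, N1, M2 // N2)) / (N1 * N2)

# Variant 3: single np.einsum over a 4D view
v3 = np.einsum('ijkl->ik', arr.reshape(-1, N1, M2 // N2, N2)) / (N1 * N2)

assert np.allclose(v1, v2) and np.allclose(v1, v3)
print(v1.T.ravel().tolist())
```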
Sample run:
In [182]: data = np.array([[1,5,9,13],
...: [2,6,10,14],
...: [3,7,11,15],
...: [4,8,12,16]])
In [183]: grouped_mean(data,4,2,2)
Out[183]: array([ 3.5, 5.5, 11.5, 13.5])
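Runtime tests
Computing grouped_mean appears to be the most computationally intensive part of the code, so here are some runtime tests that compute it with each of the three approaches: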
In [174]: import numpy as np
...: # Setup parameters and input list
...: M2 = 4000
...: N1 = 2
...: N2 = 2
...: data = np.random.randint(0,9,(16000000)).tolist()
...:
In [175]: %timeit np.array(data).reshape(-1,N2).sum(1).reshape(-1,N1,M2//N2).sum(1)/(N1*N2)
...: %timeit np.einsum('ijk->ik',np.einsum('ij->i',np.array(data).reshape(-1,N2)).reshape(-1,N1,M2//N2))/(N1*N2)
...: %timeit np.einsum('ijkl->ik',np.array(data).reshape(-1, N1, M2//N2, N2))/(N1*N2)
...:
1 loops, best of 3: 2.2 s per loop
1 loops, best of 3: 2.12 s per loop
1 loops, best of 3: 2.1 s per loop
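Each timed expression above includes `np.array(data)`, so the figures cover the list-to-array conversion as well as the reduction. A rough sketch for timing the two parts separately is shown below, assuming a smaller input than the original test so it finishes quickly; the helper names `convert`, `reduce_sum`, and `reduce_einsum` are illustrative, and the numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

M2, N1, N2 = 400, 2, 2
data = np.random.randint(0, 9, 160000).tolist()   # smaller than the original test

def convert():
    # List-to-array conversion only
    return np.array(data)

arr = convert()

def reduce_sum():
    # Two-stage np.sum on an array that already exists
    return arr.reshape(-1, N2).sum(1).reshape(-1, N1, M2 // N2).sum(1) / (N1 * N2)

def reduce_einsum():
    # Single np.einsum over a 4D view of the same array
    return np.einsum('ijkl->ik', arr.reshape(-1, N1, M2 // N2, N2)) / (N1 * N2)

for fn in (convert, reduce_sum, reduce_einsum):
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best:.6f} s per call")
```

If the conversion turns out to take most of the time on your machine, keeping the data in a NumPy array from the start may matter more than the choice between np.sum and np.einsum.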