Question

I have a numpy array:

[[1 3 1]
 [8 9 0]
 [1 3 1]
 [8 4 1]
 [5 1 0]]

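For each value in the third column, I would like to produce subtotals (count, sum, mean) of column 0. Can this be done directly in numpy, or do I have to loop over the whole array?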

Answer 1

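If the third column doesn't contain many distinct values, you can do this for each value in turn (assuming your array is called data and numpy is imported as np):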

np.mean(data[data[:,2] == 1], axis = 0)
np.sum(data[data[:,2] == 1], axis = 0)

Otherwise, you can loop over the distinct values in the third column.
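A minimal sketch of such a loop, assuming the array from the question is stored in data. The dictionary name subtotals is only illustrative; np.unique supplies the distinct values of the third column, so only that short list is looped over, not every row.

```python
import numpy as np

data = np.array([[1, 3, 1],
                 [8, 9, 0],
                 [1, 3, 1],
                 [8, 4, 1],
                 [5, 1, 0]])

# Map each distinct value of column 2 to (count, sum, mean) of column 0.
subtotals = {}
for key in np.unique(data[:, 2]):
    # Column-0 entries of the rows whose third column equals this key.
    values = data[data[:, 2] == key, 0]
    subtotals[int(key)] = (values.size, int(values.sum()), float(values.mean()))

for key, (count, total, mean) in subtotals.items():
    print(key, count, total, mean)
```

The boolean mask is rebuilt once per distinct value, so this is cheap when there are few groups and slower when there are many.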

Answer 2

This can be done with pandas (http://pandas.sourceforge.net/):

In [35]: from pandas import DataMatrix

In [36]: dm = DataMatrix(a)

In [37]: dm
Out[37]: 
     0           1           2           
0    1           3           1          
1    8           9           0          
2    1           3           1          
3    8           4           1          
4    5           1           0          

In [38]: dm.groupby(dm[2]).sum()
Out[38]: 
     0           1           2           
0    13          10          0          
1    10          10          3          


In [39]: dm.groupby(dm[2]).mean()
Out[39]: 
     0           1           2           
0    6.5         5           0          
1    3.333       3.333       1          

In [48]: dm[2].groupby(dm[2]).agg(len)
Out[48]: 
0    2
1    3

But this may be overkill =) (more on groupby: http://pandas.sourceforge.net/groupby.html)
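DataMatrix is no longer part of current pandas releases. The following is a rough equivalent using DataFrame, offered as a sketch and not as the original author's code. It assumes the array is named a, as in the session above, and it restricts the aggregation to column 0, which is what the question asks for.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 3, 1],
              [8, 9, 0],
              [1, 3, 1],
              [8, 4, 1],
              [5, 1, 0]])

df = pd.DataFrame(a)  # columns are labelled 0, 1, 2

# Group the rows by column 2, then aggregate column 0 three ways at once.
subtotals = df.groupby(2)[0].agg(["count", "sum", "mean"])
print(subtotals)
```

The result has one row per distinct value of column 2 and one column per aggregate.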

Answer 3

You can use numpy.histogram():

counts = numpy.histogram(data[:,2], bins=range(3))[0]
sums0 = numpy.histogram(data[:,2], bins=range(3), weights=data[:,0])[0]
sums1 = numpy.histogram(data[:,2], bins=range(3), weights=data[:,1])[0]

bins must be adjusted to reflect the values that actually occur in the third column.
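One way to derive the bins from the data instead of hard-coding them is sketched below. It assumes the third column holds integer labels; the names labels and edges are illustrative. One unit-wide bin is created per integer between the smallest and largest label, and the mean is obtained by dividing the weighted histogram by the counts.

```python
import numpy as np

data = np.array([[1, 3, 1],
                 [8, 9, 0],
                 [1, 3, 1],
                 [8, 4, 1],
                 [5, 1, 0]])

labels = data[:, 2]

# Bin edges min, min+1, ..., max+1: one bin per integer label.
edges = np.arange(labels.min(), labels.max() + 2)

counts, _ = np.histogram(labels, bins=edges)
sums0, _ = np.histogram(labels, bins=edges, weights=data[:, 0])

# Labels that never occur have a count of 0; report NaN there instead of dividing.
means0 = np.full(counts.shape, np.nan)
np.divide(sums0, counts, out=means0, where=counts > 0)

print(counts, sums0, means0)
```

If the labels are sparse (say 0 and 1000), this creates many empty bins, and looping over np.unique may be the better fit.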

Answer 4

First, compute the difference between the elements of the first two columns and the last column:

check = data[:,:2]-data[:,2].reshape((-1,1))

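Then, for each row, count how many elements of check are equal to zero:

np.sum(check==0, axis=1)

This returns an array with one entry per row, containing the number of times the last column's value occurs in the first two columns of that row.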
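Put together as a runnable sketch on the array from the question (the variable name occurrences is illustrative):

```python
import numpy as np

data = np.array([[1, 3, 1],
                 [8, 9, 0],
                 [1, 3, 1],
                 [8, 4, 1],
                 [5, 1, 0]])

# Subtract the last column from each of the first two columns via broadcasting.
check = data[:, :2] - data[:, 2].reshape((-1, 1))

# Per row: how many of the first two entries equal the last entry.
occurrences = np.sum(check == 0, axis=1)
print(occurrences)  # [1 0 1 0 0]
```

Only rows 0 and 2 contain their own third-column value (1) among the first two columns, hence the two ones.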

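However, it is not entirely clear what you want.

If you just want to sum and average the data, the sum, mean and std methods of ndarrays let you do so along a given axis: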

data[:,:2].mean(axis=1)
data[:,:2].sum(axis=1)
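To illustrate what the axis argument selects, here is a small sketch on the array from the question: axis=1 reduces across the columns and yields one value per row, while axis=0 reduces down the rows and yields one value per column. Neither groups rows by the third column.

```python
import numpy as np

data = np.array([[1, 3, 1],
                 [8, 9, 0],
                 [1, 3, 1],
                 [8, 4, 1],
                 [5, 1, 0]])

first_two = data[:, :2]

row_sums = first_two.sum(axis=1)    # one value per row
row_means = first_two.mean(axis=1)  # one value per row
col_sums = first_two.sum(axis=0)    # one value per column

print(row_sums)   # [ 4 17  4 12  6]
print(row_means)  # [2.  8.5 2.  6.  3. ]
print(col_sums)   # [23 20]
```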

Subtotals in numpy

4 answers: