pandas: grouping and aggregating

Date: 2014-08-31 21:17:26

Tags: python pandas

I have grouped a DataFrame:

rwp_initial.df.loc[rwp_initial.df.sample_name=='sma_initial'].groupby(by=['sample_name','pH','salt','column'])['concentration'].plot(marker = 'o', rot=30)

and get the following output:

sample_name  pH   salt  column
sma_initial  5.7  50    5         Axes(0.125,0.125;0.775x0.775)
                        6         Axes(0.125,0.125;0.775x0.775)
                  100   7         Axes(0.125,0.125;0.775x0.775)
                        8         Axes(0.125,0.125;0.775x0.775)
                  200   9         Axes(0.125,0.125;0.775x0.775)
                        10        Axes(0.125,0.125;0.775x0.775)
                  400   11        Axes(0.125,0.125;0.775x0.775)
                        12        Axes(0.125,0.125;0.775x0.775)


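I would like to take the mean within each combination of pH and salt concentration. The columns are simply the same sample measured twice. If I use aggregate(np.mean), I get the mean of all data points in a column rather than a mean per row.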
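To make the problem concrete, here is a minimal sketch on made-up data. The values, the row labels "A" and "B", and the assumption that the index level is named `row` are all hypothetical; only the column names come from the question. It uses `.mean()`, which gives the same result as `aggregate(np.mean)` here, and shows that grouping by sample, pH, and salt alone collapses every data point in a group into a single number.

```python
import pandas as pd

# Hypothetical data: columns 5 and 6 are duplicate measurements at salt 50,
# columns 7 and 8 are duplicate measurements at salt 100.
# The index level "row" (an assumed name) marks the position along each column.
df = pd.DataFrame(
    {
        "sample_name": ["sma_initial"] * 8,
        "pH": [5.7] * 8,
        "salt": [50, 50, 50, 50, 100, 100, 100, 100],
        "column": [5, 5, 6, 6, 7, 7, 8, 8],
        "concentration": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    },
    index=pd.Index(["A", "B", "A", "B", "A", "B", "A", "B"], name="row"),
)

# One mean per (sample_name, pH, salt) group: the per-row structure is lost.
collapsed = df.groupby(["sample_name", "pH", "salt"])["concentration"].mean()
print(collapsed)
```

Each group yields one value, whereas the goal is one value per row within each group.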

This figure may highlight which data points I want to average (I want to average along the rows):

rwp_initial.df.loc[rwp_initial.df.sample_name=='sma_initial'].groupby(by=['sample_name','pH','salt'])['concentration'].plot(marker = 'o', rot=30)

1 answer:

Answer 0: (score: 0)

OK, I found the answer:

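import numpy as np

# Select the sma_initial sample and group its concentration by sample, pH, and salt.
grp_initial = rwp_initial.df.loc[rwp_initial.df.sample_name=='sma_initial'].groupby(by=['sample_name','pH','salt']).concentration

# Within each group, average the duplicate columns row by row (index level 'row').
for grp, val in grp_initial:
    print(val.groupby(level='row').aggregate(np.mean))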

This works.
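The same per-row averaging can probably be done in a single groupby instead of a loop. The following is a minimal sketch on made-up data, not the original `rwp_initial.df`; the values and the row labels "A" and "B" are hypothetical, and it assumes, as in the loop above, that the index level is named `row`. Moving that level into a regular column lets it serve as a fourth grouping key.

```python
import pandas as pd

# Hypothetical data: columns 5 and 6 are duplicate measurements at salt 50,
# columns 7 and 8 are duplicate measurements at salt 100.
df = pd.DataFrame(
    {
        "sample_name": ["sma_initial"] * 8,
        "pH": [5.7] * 8,
        "salt": [50, 50, 50, 50, 100, 100, 100, 100],
        "column": [5, 5, 6, 6, 7, 7, 8, 8],
        "concentration": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    },
    index=pd.Index(["A", "B", "A", "B", "A", "B", "A", "B"], name="row"),
)

# Turn the "row" index level into a column, then group by it together with
# the other keys; "column" is left out so the duplicates are averaged.
row_means = (
    df.reset_index()
    .groupby(["sample_name", "pH", "salt", "row"])["concentration"]
    .mean()
)
print(row_means)
```

The result is one Series indexed by sample_name, pH, salt, and row, which may be easier to plot or process further than values printed inside a loop.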