I have a pandas DataFrame like this:
import numpy as np
import pandas as pd
df=pd.DataFrame(data=np.random.rand(10,5),columns=['blue','white','red','green','purple'])
df['group_labels']=['a','a','b','c','b','c','a','c','b','b']
I want to group by 'group_labels', compute the mean, and then show (mean +- standard error of the mean) in a new DataFrame. So basically I want to have:
mean_df=df.groupby('group_labels').mean().reset_index()
However, in each cell I also need to show
+- std deviation of the group / sqrt(size of the group)
Is that possible?
Answer 0 (score: 3)
I believe you need DataFrameGroupBy.agg with a custom function built on std, whose default ddof is 1:
np.random.seed(2019)
df=pd.DataFrame(data=np.random.rand(10,5),columns=['blue','white','red','green','purple'])
df['group_labels']=['a','a','b','c','b','c','a','c','b','b']
def func(x):
    # standard deviation of the group (ddof=1) divided by sqrt(group size)
    return x.std() / len(x)**(1/2)
Alternative:
def func(x):
    return x.std() / np.sqrt(len(x))
df1 = df.groupby('group_labels').agg(['mean', func])
print (df1)
                  blue               white                 red            \
                  mean      func      mean      func      mean      func
group_labels
a             0.450134  0.174723  0.401106  0.214163  0.417548  0.009156
b             0.532030  0.185240  0.595667  0.174218  0.496617  0.150546
c             0.552874  0.247173  0.382590  0.099883  0.571595  0.222161

                 green              purple
                  mean      func      mean      func
group_labels
a             0.786139  0.156584  0.525661  0.234515
b             0.505838  0.215673  0.653970  0.114664
c             0.653841  0.132705  0.587994  0.111854
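As a side note, the quantity computed by func is the standard error of the mean, and pandas groupby objects expose it directly as sem (which also defaults to ddof=1). The sketch below is only a sanity check, assuming the data has no missing values so that len(x) equals the count sem uses; the names custom, builtin and same are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame(data=np.random.rand(10, 5),
                  columns=['blue', 'white', 'red', 'green', 'purple'])
df['group_labels'] = ['a', 'a', 'b', 'c', 'b', 'c', 'a', 'c', 'b', 'b']

def func(x):
    # std (ddof=1) of the group divided by sqrt(group size)
    return x.std() / np.sqrt(len(x))

# Custom aggregation from the answer, applied to every column
custom = df.groupby('group_labels').agg(func)

# Built-in standard error of the mean (ddof=1 by default)
builtin = df.groupby('group_labels').sem()

# The two should agree up to floating-point error when there are no NaNs
same = np.allclose(custom, builtin)
print(same)
```

If that holds for your data, df.groupby('group_labels').agg(['mean', 'sem']) should give the same table with a sem column in place of func.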
To remove the MultiIndex in the columns, use:
df1.columns = df1.columns.map('_'.join)
print (df1)
              blue_mean  blue_func  white_mean  white_func  red_mean  \
group_labels
a              0.702381   0.201604    0.679590    0.159292  0.743523
b              0.386550   0.057390    0.418805    0.126278  0.306843
c              0.636310   0.269986    0.385225    0.240675  0.451133

              red_func  green_mean  green_func  purple_mean  purple_func
group_labels
a             0.083068    0.788519    0.075999     0.738081      0.16673
b             0.093714    0.792748    0.071369     0.465246      0.15333
c             0.217406    0.293735    0.108021     0.549472      0.17632
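If the goal is literally one cell per group and column that reads "mean ± error", as the question describes, one possible approach is to compute the two tables separately and join them as strings. This is a minimal sketch rather than part of the original answer: the three-decimal format, the ' ± ' separator and the names mean, sem and mean_pm_sem are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame(data=np.random.rand(10, 5),
                  columns=['blue', 'white', 'red', 'green', 'purple'])
df['group_labels'] = ['a', 'a', 'b', 'c', 'b', 'c', 'a', 'c', 'b', 'b']

grouped = df.groupby('group_labels')
mean = grouped.mean()   # per-group mean of every column
sem = grouped.sem()     # per-group std (ddof=1) / sqrt(group size)

# Build one formatted string per cell; both frames share index and columns
mean_pm_sem = pd.DataFrame(
    {col: [f'{m:.3f} ± {s:.3f}' for m, s in zip(mean[col], sem[col])]
     for col in mean.columns},
    index=mean.index,
).reset_index()

print(mean_pm_sem)
```

The result holds strings, so it is suited to display or export; keep the numeric mean and sem frames around for any further computation.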