I'm trying to create a summary table using pandas' .describe() in Python.
I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({'Group':['Group1', 'Group1', 'Group1', 'Group2', 'Group2', 'Group2', 'Group3', 'Group3', 'Group4'],
'Cat':['Cat1', 'Cat2', 'Cat3', 'Cat4', 'Cat5', 'Cat', 'Cat7', 'Cat8', 'Cat9'],
'Value':[1230,4019,9491,9588,6402,1923,492,8589,8582]})
df
Group Cat Value
0 Group1 Cat1 1230
1 Group1 Cat2 4019
2 Group1 Cat3 9491
3 Group2 Cat4 9588
4 Group2 Cat5 6402
5 Group2 Cat 1923
6 Group3 Cat7 492
7 Group3 Cat8 8589
8 Group4 Cat9 8582
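I want to produce a summary table grouped by Group and Cat in which every Cat that does not occur in a group still appears under that group in the same way, with all of its values set to 0.
I tried: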
df.groupby(['Group', 'Cat']).describe()
# That has the following output:
Value
count mean std min 25% 50% 75% max
Group Cat
Group1 Cat1 1.0 1230.0 NaN 1230.0 1230.0 1230.0 1230.0 1230.0
Cat2 1.0 4019.0 NaN 4019.0 4019.0 4019.0 4019.0 4019.0
Cat3 1.0 9491.0 NaN 9491.0 9491.0 9491.0 9491.0 9491.0
Group2 Cat 1.0 1923.0 NaN 1923.0 1923.0 1923.0 1923.0 1923.0
Cat4 1.0 9588.0 NaN 9588.0 9588.0 9588.0 9588.0 9588.0
Cat5 1.0 6402.0 NaN 6402.0 6402.0 6402.0 6402.0 6402.0
Group3 Cat7 1.0 492.0 NaN 492.0 492.0 492.0 492.0 492.0
Cat8 1.0 8589.0 NaN 8589.0 8589.0 8589.0 8589.0 8589.0
Group4 Cat9 1.0 8582.0 NaN 8582.0 8582.0 8582.0 8582.0 8582.0
But the output I want is:
Value
count mean std min 25% 50% 75% max
Group Cat
Group1 Cat1 1.0 1230.0 NaN 1230.0 1230.0 1230.0 1230.0 1230.0
Cat2 1.0 4019.0 NaN 4019.0 4019.0 4019.0 4019.0 4019.0
Cat3 1.0 9491.0 NaN 9491.0 9491.0 9491.0 9491.0 9491.0
Cat4 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat5 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat6 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat7 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat8 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat9 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Group2 Cat 1.0 1923.0 NaN 1923.0 1923.0 1923.0 1923.0 1923.0
Cat1 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat2 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat3 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat4 1.0 9588.0 NaN 9588.0 9588.0 9588.0 9588.0 9588.0
Cat5 1.0 6402.0 NaN 6402.0 6402.0 6402.0 6402.0 6402.0
Cat6 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat7 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat8 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat9 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Group3 Cat1 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat2 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat3 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat4 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat5 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat6 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat7 1.0 492.0 NaN 492.0 492.0 492.0 492.0 492.0
Cat8 1.0 8589.0 NaN 8589.0 8589.0 8589.0 8589.0 8589.0
Cat9 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Group4 Cat1 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat2 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat3 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat4 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat5 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat6 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat7 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat8 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
Cat9 1.0 8582.0 NaN 8582.0 8582.0 8582.0 8582.0 8582.0
How can I get this output?
Answer 0 (score: 3)
Check with unstack + stack. Note that I would also recommend leaving those rows as NaN rather than filling them with 0:
out = df.groupby(['Group', 'Cat']).describe().unstack().stack(dropna=False)
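The unstack/stack round trip depends on stack(dropna=False). Recent pandas releases deprecate the dropna argument together with the legacy stack implementation, so depending on your version that line may emit a warning or be rejected. Below is a minimal sketch that produces the same NaN-filled rows without it, assuming the df from the question: reindex against the full Group x Cat product and set only count to 0, since an absent combination really has zero observations. The variable name full_idx is just illustrative.

```python
import pandas as pd

# Sample data from the question (the sixth category is spelled 'Cat' there)
df = pd.DataFrame({'Group': ['Group1', 'Group1', 'Group1', 'Group2', 'Group2', 'Group2',
                             'Group3', 'Group3', 'Group4'],
                   'Cat': ['Cat1', 'Cat2', 'Cat3', 'Cat4', 'Cat5', 'Cat', 'Cat7', 'Cat8', 'Cat9'],
                   'Value': [1230, 4019, 9491, 9588, 6402, 1923, 492, 8589, 8582]})

out = df.groupby(['Group', 'Cat']).describe()

# Every (Group, Cat) pair, built from the levels of the describe() index;
# passing names keeps the 'Group' and 'Cat' index labels
full_idx = pd.MultiIndex.from_product(out.index.levels, names=out.index.names)

# No fill_value: the combinations that were missing become all-NaN rows
out = out.reindex(full_idx)

# A missing combination has zero observations, so count = 0 is exact;
# mean, std, quartiles etc. stay NaN because they are undefined for no data
out[('Value', 'count')] = out[('Value', 'count')].fillna(0)
print(out)
```

This keeps the answer's recommendation (undefined statistics stay NaN) while still making the zero counts explicit.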
Answer 1 (score: 3)
You could also build a Cartesian-product index from the levels of the resulting index and reindex:
out = df.groupby(['Group', 'Cat']).describe()
idx = pd.MultiIndex.from_product((out.index.levels[0],out.index.levels[1]))
out = out.reindex(idx,fill_value=0)
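With fill_value=0, the added rows also get std = 0, whereas the desired output in the question keeps std as NaN. Below is a minimal sketch of a variant that reproduces that layout exactly, assuming the df from the question. It reindexes without a fill value, then fills every statistic except std with 0. It also passes the index names to from_product so the Group and Cat labels survive (the answer's version drops them, which is why its printed output has no index header).

```python
import pandas as pd

# Sample data from the question (the sixth category is spelled 'Cat' there)
df = pd.DataFrame({'Group': ['Group1', 'Group1', 'Group1', 'Group2', 'Group2', 'Group2',
                             'Group3', 'Group3', 'Group4'],
                   'Cat': ['Cat1', 'Cat2', 'Cat3', 'Cat4', 'Cat5', 'Cat', 'Cat7', 'Cat8', 'Cat9'],
                   'Value': [1230, 4019, 9491, 9588, 6402, 1923, 492, 8589, 8582]})

out = df.groupby(['Group', 'Cat']).describe()
idx = pd.MultiIndex.from_product(out.index.levels, names=out.index.names)

# Missing combinations come back as all-NaN rows
out = out.reindex(idx)

# Fill every statistic except std with 0. In this data each observed group has
# exactly one value, so only std is NaN on observed rows and this fill only
# touches the rows added by reindex.
for col in out.columns:
    if col[1] != 'std':
        out[col] = out[col].fillna(0)
print(out)
```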
Value
count mean std min 25% 50% 75% max
Group1 Cat 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat1 1.0 1230.0 NaN 1230.0 1230.0 1230.0 1230.0 1230.0
Cat2 1.0 4019.0 NaN 4019.0 4019.0 4019.0 4019.0 4019.0
Cat3 1.0 9491.0 NaN 9491.0 9491.0 9491.0 9491.0 9491.0
Cat4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Group2 Cat 1.0 1923.0 NaN 1923.0 1923.0 1923.0 1923.0 1923.0
Cat1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat4 1.0 9588.0 NaN 9588.0 9588.0 9588.0 9588.0 9588.0
Cat5 1.0 6402.0 NaN 6402.0 6402.0 6402.0 6402.0 6402.0
Cat7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Group3 Cat 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Cat1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
...