I have a pandas DataFrame:
age gender criticality acknowledged
10  Male   High        Yes
10  Male   High        Yes
10  Male   High        Yes
10  Male   Low         Yes
11  Female Medium      No
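To make the example reproducible, the sample frame above can be built like this (a minimal sketch; the values are copied from the table):

```python
import pandas as pd

# Rebuild the sample data shown in the table above
df = pd.DataFrame({
    "age": [10, 10, 10, 10, 11],
    "gender": ["Male", "Male", "Male", "Male", "Female"],
    "criticality": ["High", "High", "High", "Low", "Medium"],
    "acknowledged": ["Yes", "Yes", "Yes", "Yes", "No"],
})
print(df)
```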
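I want to group by age and gender, then turn the values of "criticality" and "acknowledged" into new columns and get their counts.
For example, the output I want is: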
                criticality          acknowledged
age  gender     High  Medium  Low    Yes  No
10   Male       3     0       1      4    0
11   Female     0     1       0      0    1
I considered using df.groupby(['age','gender'])['criticality','acknowledged'].stack(),
but it doesn't work.
Is there a better way to get the result in this format?
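The .stack() attempt fails because stack is a DataFrame method, not a GroupBy one; recent pandas versions (2.x) also reject selecting several GroupBy columns with a bare tuple like ['criticality','acknowledged'] (a list, [['criticality', 'acknowledged']], is required). A sketch that keeps the groupby idea counts each column separately with value_counts(), moves the values into columns with unstack(), and joins the pieces with concat; keys= supplies the top-level labels and fill_value=0 fills combinations that never occur. The names cols and parts are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [10, 10, 10, 10, 11],
    "gender": ["Male", "Male", "Male", "Male", "Female"],
    "criticality": ["High", "High", "High", "Low", "Medium"],
    "acknowledged": ["Yes", "Yes", "Yes", "Yes", "No"],
})

cols = ["criticality", "acknowledged"]
grouped = df.groupby(["age", "gender"])
# Count each column's values per (age, gender) group, then pivot the values into columns
parts = [grouped[col].value_counts().unstack(fill_value=0) for col in cols]
# keys= adds "criticality" / "acknowledged" as the top column level
result = pd.concat(parts, axis=1, keys=cols)
print(result)
```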
Answer 0 (score: 1)
Since you want to count the two columns separately, concat is a simple solution:
In [13]: pd.concat([df.pivot_table(index=['age', 'gender'], columns=col, aggfunc=len)
    ...:            for col in ['criticality', 'acknowledged']], axis=1).fillna(0)
Out[13]:
              acknowledged    criticality
criticality   High  Low Medium   No  Yes
age gender
10  Male       3.0  1.0    0.0  0.0  4.0
11  Female     0.0  0.0    1.0  1.0  0.0
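In the output above the top-level labels look swapped: with no values argument, pivot_table aggregates the remaining column, so acknowledged ends up over High/Low/Medium. If you want labels that match the desired layout, a variant of the same concat idea (a sketch that swaps pivot_table for pd.crosstab) passes keys= to concat; crosstab fills missing combinations with 0, so the counts stay integers and no fillna is needed.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [10, 10, 10, 10, 11],
    "gender": ["Male", "Male", "Male", "Male", "Female"],
    "criticality": ["High", "High", "High", "Low", "Medium"],
    "acknowledged": ["Yes", "Yes", "Yes", "Yes", "No"],
})

cols = ["criticality", "acknowledged"]
# crosstab counts each (age, gender) x value combination;
# keys= makes the counted column's name the top-level label
result = pd.concat(
    [pd.crosstab([df["age"], df["gender"]], df[col]) for col in cols],
    axis=1,
    keys=cols,
)
print(result)
```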
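Answer 1 (score: 1)
Another approach: one-hot encode the columns with get_dummies(), attach the dummy columns to age and gender with assign(), sum them per group with groupby(), and finally split the column names with str.split('_', expand=True) to get MultiIndex columns: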
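import pandas as pd
cols = ['criticality', 'acknowledged']
# One-hot encode (names like criticality_High), attach to age/gender, and sum per group
final = df[['age', 'gender']].assign(**pd.get_dummies(df[cols])).groupby(['age', 'gender']).sum()
# Split "criticality_High" into ("criticality", "High") to build MultiIndex columns
final.columns = final.columns.str.split('_', expand=True)
print(final)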
             criticality    acknowledged
             High Low Medium   No Yes
age gender
10  Male        3   1      0    0   4
11  Female      0   0      1    1   0
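One caveat with the str.split('_') step: it assumes neither the column names nor the values contain an underscore, otherwise the split produces extra levels. A sketch of the same get_dummies + groupby idea that avoids string splitting builds the MultiIndex columns directly by passing a dict to pd.concat (the dict keys become the top column level):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [10, 10, 10, 10, 11],
    "gender": ["Male", "Male", "Male", "Male", "Female"],
    "criticality": ["High", "High", "High", "Low", "Medium"],
    "acknowledged": ["Yes", "Yes", "Yes", "Yes", "No"],
})

cols = ["criticality", "acknowledged"]
# One-hot encode each column separately; the dict keys become the top column level
dummies = pd.concat({col: pd.get_dummies(df[col]) for col in cols}, axis=1)
# Sum the indicator columns within each (age, gender) group to get the counts
final = dummies.groupby([df["age"], df["gender"]]).sum().astype(int)
print(final)
```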