How to aggregate a count in a pandas column based on 2 other columns

Time: 2019-11-07 18:35:58

Tags: python pandas

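I have a DataFrame that looks like the one below. I currently have 2 columns: one shows the injury someone has suffered, and the column next to it shows whether that person missed a game (1 if they missed a game through injury, "No Injury" if they did not miss a game through injury). In a third column, I want to aggregate the number of games the player has missed through that injury, rather than NaN. For example, the player had a concussion in week 1 and missed a game, but did not miss one in week 2, so he has missed 1 game through concussion. I would like the rows to look like this: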

Injury       Game Missed    Games Missed Due To Injury
Concussion       1                (Concussion,1)
Concussion       0                (Concussion,1)
No Injury        No Injury        NaN
Shoulder         1                (Shoulder,1)
Shoulder         No Injury        (Shoulder,1)
Shoulder         1                (Shoulder,2)
Shoulder         1                (Shoulder,3)

How can I achieve this in pandas?

Thanks!

1 Answer:

Answer 0: (score: 1)

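Use pd.to_numeric with Series.fillna before applying Series.groupby and cumsum. With errors='coerce', "No Injury" becomes NaN, which fillna replaces with 0 so that the column can be converted to integers (Series.astype) and a cumulative sum computed per injury. Once the sum is calculated, convert it to str and join it to the "Injury" column with Series.str.cat: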

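import pandas as pd
# Coerce 'No Injury' to NaN, count it as 0 games missed, then take a running total per injury
missed = (pd.to_numeric(df['Game Missed'], errors='coerce')
            .fillna(0).astype(int)
            .groupby(df['Injury']).cumsum())
# Join count and injury name as 'count,Injury'; rows whose injury contains 'No' become NaN
df['Games Missed Due To Injury'] = (missed.astype(str)
                                          .str.cat(df['Injury'], sep=',')
                                          .mask(df['Injury'].str.contains('No')))
print(df)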

       Injury Game Missed Games Missed Due To Injury
0  Concussion           1               1,Concussion
1  Concussion           0               1,Concussion
2   No Injury   No Injury                        NaN
3    Shoulder           1                 1,Shoulder
4    Shoulder   No Injury                 1,Shoulder
5    Shoulder           1                 2,Shoulder
6    Shoulder           1                 3,Shoulder
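
The output above joins the values as "count,Injury", while the question asked for the form "(Injury,count)". The following is a minimal, self-contained sketch of one way to get that layout. It rebuilds the sample data from the question's table (the DataFrame construction is an assumption about how the data is stored, with "No Injury" held as a string next to integer values) and uses plain string concatenation instead of Series.str.cat:

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed storage format).
df = pd.DataFrame({
    "Injury": ["Concussion", "Concussion", "No Injury",
               "Shoulder", "Shoulder", "Shoulder", "Shoulder"],
    "Game Missed": [1, 0, "No Injury", 1, "No Injury", 1, 1],
})

# Non-numeric entries such as "No Injury" become NaN, then 0,
# so the running total per injury ignores them.
missed = (
    pd.to_numeric(df["Game Missed"], errors="coerce")
    .fillna(0)
    .astype(int)
    .groupby(df["Injury"])
    .cumsum()
)

# Build "(Injury,count)" strings, then replace the "No Injury" rows with NaN.
label = "(" + df["Injury"] + "," + missed.astype(str) + ")"
df["Games Missed Due To Injury"] = label.mask(df["Injury"].eq("No Injury"))
print(df)
```

The exact-match mask eq("No Injury") is used here so that only rows explicitly marked as uninjured are set to NaN.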

Note that you can build the mask in whichever way suits your data:

df['Injury'].str.contains('No')
df['Injury'].eq('No Injury')
df['Injury'].str.contains('No Injury',case=False)
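
These three masks are not interchangeable on every dataset. The sketch below compares them on a small hypothetical Series (the values "no injury" and "Nose" are made up for illustration and do not appear in the question) to show where they disagree:

```python
import pandas as pd

# Hypothetical values chosen only to show where the masks differ.
injury = pd.Series(["No Injury", "no injury", "Nose", "Shoulder"])

# Case-sensitive substring match: also flags "Nose".
substring = injury.str.contains("No")
# Exact match: flags only the literal "No Injury".
exact = injury.eq("No Injury")
# Case-insensitive substring match: also flags "no injury".
ignore_case = injury.str.contains("No Injury", case=False)

print(pd.DataFrame({
    "Injury": injury,
    "contains('No')": substring,
    "eq('No Injury')": exact,
    "contains(..., case=False)": ignore_case,
}))
```

On the question's data all three give the same result. The exact match is the strictest choice, and the case-insensitive variant may help if the labels are typed inconsistently.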