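I have a dataframe like the one shown below. It currently has two columns: one shows the injury a player sustained, and the one next to it shows whether the player missed the game (1 if they missed the game because of the injury, "No Injury" if they did not miss a game because of an injury). In a third column, I want a running total of the games the player has missed due to the injury, rather than NaN. For example, the player had a concussion in week 1 and missed the game, but did not miss the game in week 2, so he has missed 1 game due to the concussion. I would like the rows to look like this: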
Injury      Game Missed  Games Missed Due To Injury
Concussion  1            (Concussion,1)
Concussion  0            (Concussion,1)
No Injury   No Injury    NaN
Shoulder    1            (Shoulder,1)
Shoulder    No Injury    (Shoulder,1)
Shoulder    1            (Shoulder,2)
Shoulder    1            (Shoulder,3)
How can I achieve this in pandas?
Thanks!
Answer 0 (score: 1)
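Use pd.to_numeric together with Series.fillna before applying Series.groupby and cumsum.
pd.to_numeric with errors='coerce' turns "No Injury" into NaN, and fillna then replaces it with 0, so the column can be converted to integers (Series.astype) and a cumulative sum computed for each injury.
Once the sum is computed, convert it to str and join it to the Injury column with Series.str.cat: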
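# Coerce "No Injury" to NaN, fill with 0, then take a running total per injury
missed = (pd.to_numeric(df['Game Missed'], errors='coerce')
            .fillna(0).astype(int)
            .groupby(df['Injury']).cumsum())
# Join the count with the injury name; leave NaN where there was no injury
df['Games Missed Due To Injury'] = (missed.astype(str)
                                          .str.cat(df['Injury'], sep=',')
                                          .mask(df['Injury'].str.contains('No')))
print(df)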
       Injury Game Missed Games Missed Due To Injury
0  Concussion           1               1,Concussion
1  Concussion           0               1,Concussion
2   No Injury   No Injury                        NaN
3    Shoulder           1                 1,Shoulder
4    Shoulder   No Injury                 1,Shoulder
5    Shoulder           1                 2,Shoulder
6    Shoulder           1                 3,Shoulder
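The output above puts the count before the injury name, whereas the question asked for the form (Injury,count). A minimal self-contained sketch of the same steps with that formatting might look like the following. The sample frame is reconstructed from the question's table, the assumption that Game Missed holds strings is mine, and the helper name add_missed_summary is hypothetical.

```python
import pandas as pd

# Sample data reconstructed from the question (values assumed to be strings).
df = pd.DataFrame({
    "Injury": ["Concussion", "Concussion", "No Injury",
               "Shoulder", "Shoulder", "Shoulder", "Shoulder"],
    "Game Missed": ["1", "0", "No Injury", "1", "No Injury", "1", "1"],
})


def add_missed_summary(frame):
    """Hypothetical helper: return a copy with a running per-injury total."""
    out = frame.copy()
    # Non-numeric entries such as "No Injury" become NaN, then 0.
    missed = (
        pd.to_numeric(out["Game Missed"], errors="coerce")
        .fillna(0)
        .astype(int)
        .groupby(out["Injury"])
        .cumsum()
    )
    # Build "(Injury,count)" and leave NaN on rows with no injury.
    label = "(" + out["Injury"] + "," + missed.astype(str) + ")"
    out["Games Missed Due To Injury"] = label.mask(out["Injury"].eq("No Injury"))
    return out


result = add_missed_summary(df)
print(result)
```

Returning a copy rather than modifying the frame in place is only a design choice for the sketch; assigning the column directly on df, as in the answer, works equally well.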
Note that you can use whichever mask you prefer:
df['Injury'].str.contains('No')
df['Injury'].eq('No Injury')
df['Injury'].str.contains('No Injury',case=False)
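These three masks give the same result on the sample data, but they are not equivalent in general. The sketch below compares them on a small made-up Series; the values "Nose" and "no injury" are hypothetical and are there only to show where the masks diverge.

```python
import pandas as pd

# Made-up values chosen to expose the differences between the masks.
injury = pd.Series(["Concussion", "No Injury", "Nose", "no injury"])

loose = injury.str.contains("No")                         # any value containing "No"
exact = injury.eq("No Injury")                            # exact, case-sensitive match
any_case = injury.str.contains("No Injury", case=False)   # substring, ignoring case

print(pd.DataFrame({"Injury": injury, "loose": loose,
                    "exact": exact, "any_case": any_case}))
```

If injury names that merely contain "No" are possible in the real data, the exact comparison with eq is the safest of the three.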