Currently, I have a table like this:
ID Previous_Injuries Currently_Injured Injury_Type
1 Nan 0 Nan
1 Nan 1 Ankle
1 Nan 0 Nan
1 Nan 1 Wrist
1 Nan 0 Nan
1 Nan 1 Leg
1 Nan 0 Nan
2 Nan 1 Leg
2 Nan 0 Nan
I want to fill in the Previous_Injuries column so that my table looks like this:
ID Previous_Injuries Currently_Injured Injury_Type
1 Nan 0 Nan
1 Nan 1 Ankle
1 [Ankle] 0 Nan
1 [Ankle] 1 Wrist
1 [Ankle,Wrist] 0 Nan
1 [Ankle,Wrist] 1 Leg
1 [Ankle,Wrist,Leg] 0 Nan
2 Nan 1 Leg
2 [Leg] 0 Nan
How can I build such a column in pandas, preferably with the values stored as lists?
Thanks!
Answer 0 (score: 4)
We can do a cumsum after shift, then split the resulting string. Note that here you are using 'Nan' (a string), not np.nan.
(Updated because the question changed again.)
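# Shift by one row so a row never sees its own injury, then build a running comma-separated string and split it.
s = df.Injury_Type.shift().fillna('Nan').add(',').cumsum().str[:-1].str.split(',')
# Drop the 'Nan' placeholders from every list.
df['new'] = [[y for y in x if y != 'Nan'] for x in s]
df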
Out[322]:
ID Previous_Injuries Currently_Injured Injury_Type new
0 1 Nan 0 Nan []
1 1 Nan 1 Ankle []
2 1 Nan 0 Nan [Ankle]
3 1 Nan 1 Wrist [Ankle]
4 1 Nan 0 Nan [Ankle, Wrist]
5 1 Nan 1 Leg [Ankle, Wrist]
6 1 Nan 0 Nan [Ankle, Wrist, Leg]
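The output above covers only the rows for ID 1. As a minimal sketch, assuming the sample data from the question (with 'Nan' stored as a literal string), the same shift/cumsum/split idea can be applied to each ID separately. The helper name `previous_injuries` is hypothetical and only used for illustration.

```python
import pandas as pd

# Sample data from the question; 'Nan' is a literal string here, not np.nan.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 1, 2, 2],
    "Currently_Injured": [0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Injury_Type": ["Nan", "Ankle", "Nan", "Wrist", "Nan", "Leg", "Nan", "Leg", "Nan"],
})


def previous_injuries(injuries):
    """Hypothetical helper: injuries seen before each row of a single ID."""
    # Shift down one row so a row never counts its own injury.
    shifted = injuries.shift().fillna("Nan")
    # Build a growing comma-separated string, then split it back into pieces.
    parts = shifted.add(",").cumsum().str[:-1].str.split(",")
    # Drop the 'Nan' placeholders and keep the original row labels.
    cleaned = [[name for name in row if name != "Nan"] for row in parts]
    return pd.Series(cleaned, index=injuries.index)


# Run the helper once per ID and stitch the pieces together by index.
df["Previous_Injuries"] = pd.concat(
    [previous_injuries(group) for _, group in df.groupby("ID")["Injury_Type"]]
)
print(df)
```

Because each group is processed on its own, the history for ID 2 starts from an empty list instead of inheriting the injuries of ID 1. Note that splitting on ',' assumes no injury name contains a comma.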
Answer 1 (score: 3)
Use:
df['Previous_Injuries']=( df['Injury_Type'].replace('Nan',np.nan).fillna(' ')
.cumsum().shift(fill_value='')
.str.split() )
print(df)
If the missing values are real NaN rather than the string 'Nan', you can omit replace('Nan', np.nan).
ID Previous_Injuries Currently_Injured Injury_Type
0 1 [] 0 Nan
1 1 [] 1 Ankle
2 1 [Ankle] 0 Nan
3 1 [Ankle] 1 Wrist
4 1 [Ankle, Wrist] 0 Nan
5 1 [Ankle, Wrist] 1 Leg
6 1 [Ankle, Wrist, Leg] 0 Nan
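To handle each ID separately, use DataFrame.groupby (group_keys=False keeps the original index on pandas 2.0 and later, so the result aligns with df):
df['Previous_Injuries']=( df.groupby('ID', group_keys=False)['Injury_Type']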
.apply(lambda x: x.replace('Nan',np.nan).fillna(' ')
.cumsum().shift(fill_value='')
.str.split()) )
print(df)
ID Previous_Injuries Currently_Injured Injury_Type
0 1 [] 0 Nan
1 1 [] 1 Ankle
2 1 [Ankle] 0 Nan
3 1 [Ankle] 1 Wrist
4 1 [Ankle, Wrist] 0 Nan
5 1 [Ankle, Wrist] 1 Leg
6 1 [Ankle, Wrist, Leg] 0 Nan
7 2 [] 1 Leg
8 2 [Leg] 0 Nan
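Both answers build one long string and split it again, which relies on the separator never appearing inside an injury name; the whitespace-based version also merges two injuries that occur in consecutive rows. As an alternative sketch (not the method from either answer), the lists can be accumulated directly with itertools.accumulate. The helper name `history` and the sample values below are assumptions made for illustration.

```python
from itertools import accumulate

import numpy as np
import pandas as pd

# Illustrative data: two consecutive injuries and a name containing a space.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2],
    "Injury_Type": [np.nan, "Ankle", "Lower Back", np.nan, "Leg", np.nan],
})


def history(injuries):
    """Hypothetical helper: injuries recorded strictly before each row."""
    # Wrap each real injury in a one-element list; missing values become [].
    steps = [[] if pd.isna(x) or x == "Nan" else [x] for x in injuries]
    # Running list concatenation; the initial [] shifts the result by one row.
    running = list(accumulate(steps, lambda seen, new: seen + new, initial=[]))
    # Drop the last element, which would include the final row's own injury.
    return pd.Series(running[:-1], index=injuries.index)


# Compute the history per ID and align the pieces back on the original index.
df["Previous_Injuries"] = pd.concat(
    [history(group) for _, group in df.groupby("ID")["Injury_Type"]]
)
print(df)
```

This version accepts both real NaN and the string 'Nan' as "no injury", and it requires Python 3.8 or later for the `initial` argument of accumulate.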