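I want to perform a groupby operation in pandas. For example, I want to group by the patient column and, where the treatment column == X, carry the corresponding doctor value over into a new column named nurse.
For example, given this df: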
import pandas as pd
import numpy as np
df = pd.DataFrame({'patient': ['a','a','a','b','b','b'],
                   'treatment': ['X','Y','Y','X','Z','Z'],
                   'doctor': ['1','2','2','2','3','3']})
  patient treatment doctor
0       a         X      1
1       a         Y      2
2       a         Y      2
3       b         X      2
4       b         Z      3
5       b         Z      3
I tried:
df=df.assign(nurse=np.where(df.['treatment'].str.contains('X'),df.groupby('patient')['doctor'], np.nan))
but I get an error:
SyntaxError: invalid syntax
Expected output:
  patient treatment doctor nurse
0       a         X      1     1
1       a         Y      2     1
2       a         Y      2     1
3       b         X      2     2
4       b         Z      3     2
5       b         Z      3     2
How can I get this output?
Thanks
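Answer 0 (score: 3)
Use GroupBy.apply + Series.where, then fill the gaps with ffill. Series.where keeps the doctor value only on rows where treatment is X (NaN elsewhere), and ffill carries that value down within each patient group: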
df['nurse'] = (df.groupby('patient', sort=False)
                 .apply(lambda x: x['doctor'].where(x['treatment'].eq('X')).ffill())
                 .reset_index(drop=True))
print(df)
  patient treatment doctor nurse
0       a         X      1     1
1       a         Y      2     1
2       a         Y      2     1
3       b         X      2     2
4       b         Z      3     2
5       b         Z      3     2
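As a possible variation on the same idea (not part of the answer above), the mask can be applied first and the forward fill done per group, which skips apply and reset_index and keeps the original index. This is a minimal sketch that assumes each patient's X row comes before the rows that should inherit its doctor value, as in the sample data:

```python
import pandas as pd

df = pd.DataFrame({'patient': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'treatment': ['X', 'Y', 'Y', 'X', 'Z', 'Z'],
                   'doctor': ['1', '2', '2', '2', '3', '3']})

# Keep 'doctor' only where treatment is 'X'; every other row becomes NaN.
masked = df['doctor'].where(df['treatment'].eq('X'))

# Forward-fill within each patient, so a value never leaks across patients.
df['nurse'] = masked.groupby(df['patient']).ffill()

print(df)
```

Because the result stays aligned to the original index, this version should also work when the rows are not sorted by patient or the index is not a default 0..n-1 range. A patient with no X row, or rows that come before that patient's first X row, would be left as NaN.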