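Hi, I need to count how many of each medication a patient takes per day. A patient takes several medications each day, in varying quantities. The initial data looks like this: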
import numpy as np
import pandas as pd

df_data = {'med1': ['Prednisolone', 'Prednisolone', 'Folic acid', 'Folic acid', 'Prednisolone', 'Enbrel', 'Prednisolone'],
           'med2': [np.nan, np.nan, 'Folic acid', 'Folic acid', np.nan, 'Methotrexate pill', np.nan],
           'med3': [np.nan, np.nan, 'Prednisolone', 'Prednisolone', np.nan, 'Prednisolone', np.nan]}
df_data = pd.DataFrame(df_data)
df_data
   med1          med2               med3
------------------------------------------------
0  Prednisolone  NaN                NaN
1  Prednisolone  NaN                NaN
2  Folic acid    Folic acid         Prednisolone
3  Folic acid    Folic acid         Prednisolone
4  Prednisolone  NaN                NaN
5  Enbrel        Methotrexate pill  Prednisolone
6  Prednisolone  NaN                NaN
What I want is a new column for each medication holding its count in that row. I would like the result to look like this:
   med1          med2               med3          Prednisolone  Folic acid  Enbrel  Methotrexate pill
-----------------------------------------------------------------------------------------------------
0  Prednisolone  NaN                NaN           1             0           0       0
1  Prednisolone  NaN                NaN           1             0           0       0
2  Folic acid    Folic acid         Prednisolone  1             2           0       0
3  Folic acid    Folic acid         Prednisolone  1             2           0       0
4  Prednisolone  NaN                NaN           1             0           0       0
5  Enbrel        Methotrexate pill  Prednisolone  1             0           1       1
6  Prednisolone  NaN                NaN           1             0           0       0
I don't know how to proceed. One-hot encode each column and then sum them? Is there a simpler approach?
Answer 0 (score: 1)
We can use stack + str.get_dummies, then sum per original row:
s = df_data.stack().str.get_dummies().groupby(level=0).sum()
   Enbrel  Folic acid  Methotrexate pill  Prednisolone
0  0       0           0                  1
1  0       0           0                  1
2  0       2           0                  1
3  0       2           0                  1
4  0       0           0                  1
5  1       0           1                  1
6  0       0           0                  1
df_data = df_data.join(s)
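
For reference, here is the same stack + str.get_dummies idea as a self-contained script. It is only a sketch, assuming a recent pandas release in which the `level=` argument of `sum` is no longer available, so the per-row total is taken with `groupby(level=0).sum()`. The names `counts`, `wanted_order`, and `result` are illustrative and do not come from the original post. The explicit `dropna()` is a precaution so that missing entries are skipped regardless of how `stack()` treats NaN in the installed version.

```python
import numpy as np
import pandas as pd

df_data = pd.DataFrame({
    'med1': ['Prednisolone', 'Prednisolone', 'Folic acid', 'Folic acid',
             'Prednisolone', 'Enbrel', 'Prednisolone'],
    'med2': [np.nan, np.nan, 'Folic acid', 'Folic acid',
             np.nan, 'Methotrexate pill', np.nan],
    'med3': [np.nan, np.nan, 'Prednisolone', 'Prednisolone',
             np.nan, 'Prednisolone', np.nan],
})

counts = (
    df_data.stack()            # long Series indexed by (row label, column name)
    .dropna()                  # skip missing medications explicitly
    .str.get_dummies()         # one indicator column per distinct medication
    .groupby(level=0).sum()    # add the indicators back up per original row
)

# Optional: put the count columns in the order shown in the question.
# (Hypothetical helper list; adjust to the medications present in your data.)
wanted_order = ['Prednisolone', 'Folic acid', 'Enbrel', 'Methotrexate pill']
counts = counts[wanted_order]

# Attach the counts to the original columns, aligned on the row index.
result = df_data.join(counts)
print(result)
```

Because `join` aligns on the row index, this relies on `df_data` having unique index labels, as it does here with the default 0 to 6 range.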