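Counting elements per row across multiple columns in a pandas DataFrame

Time: 2020-07-08 21:09:07

Tags: python pandas counting one-hot-encoding

Hi, I need to count how many of each medication a patient takes per day. A patient takes several medications each day, in varying quantities. The initial data looks like this: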

import numpy as np
import pandas as pd

df_data = {'med1': ['Prednisolone', 'Prednisolone', 'Folic acid', 'Folic acid', 'Prednisolone', 'Enbrel', 'Prednisolone'],
           'med2': [np.nan, np.nan, 'Folic acid', 'Folic acid', np.nan, 'Methotrexate pill', np.nan],
           'med3': [np.nan, np.nan, 'Prednisolone', 'Prednisolone', np.nan, 'Prednisolone', np.nan]}

df_data = pd.DataFrame(df_data)
df_data

    med1            med2                med3
------------------------------------------------
0   Prednisolone    NaN                 NaN
1   Prednisolone    NaN                 NaN
2   Folic acid      Folic acid          Prednisolone
3   Folic acid      Folic acid          Prednisolone
4   Prednisolone    NaN                 NaN
5   Enbrel          Methotrexate pill   Prednisolone
6   Prednisolone    NaN                 NaN

What I want is a new column for each medication that holds its count in that row. I would like it to look like this:

    med1            med2                med3            Prednisolone  Folic acid  Enbrel  Methotrexate pill
-----------------------------------------------------------------------------------------------------------
0   Prednisolone    NaN                 NaN             1             0           0       0
1   Prednisolone    NaN                 NaN             1             0           0       0
2   Folic acid      Folic acid          Prednisolone    1             2           0       0
3   Folic acid      Folic acid          Prednisolone    1             2           0       0
4   Prednisolone    NaN                 NaN             1             0           0       0
5   Enbrel          Methotrexate pill   Prednisolone    1             0           1       1
6   Prednisolone    NaN                 NaN             1             0           0       0

I'm not sure how to proceed. Should I one-hot encode each column and then sum the results? Is there a simpler approach?

1 Answer:

Answer 0: (score: 1)

We can use stack + str.get_dummies, then sum the indicator rows for each original row:

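# groupby(level=0) sums per original row; the older sum(level=0) only works in pandas < 2.0
s = df_data.stack().str.get_dummies().groupby(level=0).sum()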
   Enbrel  Folic acid  Methotrexate pill  Prednisolone
0       0           0                  0             1
1       0           0                  0             1
2       0           2                  0             1
3       0           2                  0             1
4       0           0                  0             1
5       1           0                  1             1
6       0           0                  0             1
df_data = df_data.join(s)
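
For anyone who wants to run this from start to finish, here is a minimal self-contained sketch of the same stack + str.get_dummies idea. The names `stacked`, `counts` and `result` are my own. The explicit `dropna()` and `reindex(...)` calls are extra safeguards I added, on the assumption that some pandas versions keep NaN when stacking and that a row could be entirely NaN. They are not part of the one-liner above.

```python
import numpy as np
import pandas as pd

df_data = pd.DataFrame({
    'med1': ['Prednisolone', 'Prednisolone', 'Folic acid', 'Folic acid',
             'Prednisolone', 'Enbrel', 'Prednisolone'],
    'med2': [np.nan, np.nan, 'Folic acid', 'Folic acid',
             np.nan, 'Methotrexate pill', np.nan],
    'med3': [np.nan, np.nan, 'Prednisolone', 'Prednisolone',
             np.nan, 'Prednisolone', np.nan],
})

# Long format: one entry per (row, column) pair; drop missing values explicitly
stacked = df_data.stack().dropna()

# One 0/1 indicator column per medication, summed back per original row (index level 0)
counts = stacked.str.get_dummies().groupby(level=0).sum()

# Safeguard (assumption): a row with no medication at all would vanish above,
# so restore it with zero counts before joining
counts = counts.reindex(df_data.index, fill_value=0)

result = df_data.join(counts)
print(result)
```

Note that str.get_dummies splits on '|' by default, so this only behaves as expected while no medication name contains that character.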