Pandas groupby: frequency of values

Time: 2020-10-28 00:17:55

Tags: python pandas pandas-groupby

I have this sample data:

STATE   CAPSULES     LIQUID         TABLETS  
Alabama NaN          Prescription   OTC
Georgia Prescription NaN            OTC
Texas   OTC          OTC            NaN
Texas   Prescription NaN            NaN
Florida NaN          Prescription   OTC
Georgia OTC          Prescription   Prescription
Texas   Prescription NaN            OTC
Alabama NaN          OTC            OTC
Georgia OTC          NaN            NaN

I have tried several groupby configurations to get the following desired result:

State   capsules_OTC    capsules_prescription   liquid_OTC  liquid_prescription tablets_OTC tablets_prescription
Alabama    0             0                         0              0               0           0
Florida    0             0                         0              0               0           0
Georgia    1             1                         1              1               1           1
Texas      1             2                         2              2               2           2

For example, I tried this:

df.groupby(['STATE','CAPSULES'])

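I was trying to at least get the first column wrangled, but no dice. Maybe there isn't a simple answer, but I suspect I'm missing something simple with groupby, perhaps count() or some other apply function?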

1 answer:

Answer 0: (score: 4)

Use pd.get_dummies together with groupby and sum:

pd.get_dummies(df, columns=['CAPSULES', 'LIQUID', 'TABLETS'])\
  .groupby('STATE', as_index=False).sum()

Output:

     STATE  CAPSULES_OTC  CAPSULES_Prescription  LIQUID_OTC  LIQUID_Prescription  TABLETS_OTC  TABLETS_Prescription
0  Alabama             0                      0           1                    1            2                     0
1  Florida             0                      0           0                    1            1                     0
2  Georgia             2                      1           0                    1            1                     1
3    Texas             1                      2           1                    0            1                     0
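
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand and applies the same get_dummies / groupby / sum chain. The `dtype=int` argument and the variable names `dummies` and `result` are additions for illustration, not part of the answer above; `dtype=int` is only there so the indicator columns are integers regardless of the default dummy dtype in your pandas version.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks a missing entry.
df = pd.DataFrame({
    "STATE": ["Alabama", "Georgia", "Texas", "Texas", "Florida",
              "Georgia", "Texas", "Alabama", "Georgia"],
    "CAPSULES": [np.nan, "Prescription", "OTC", "Prescription", np.nan,
                 "OTC", "Prescription", np.nan, "OTC"],
    "LIQUID": ["Prescription", np.nan, "OTC", np.nan, "Prescription",
               "Prescription", np.nan, "OTC", np.nan],
    "TABLETS": ["OTC", "OTC", np.nan, np.nan, "OTC",
                "Prescription", "OTC", "OTC", np.nan],
})

# One indicator column per (column, value) pair. By default get_dummies
# creates no column for NaN, so a missing entry contributes 0 everywhere.
# dtype=int is an assumption added here to force integer indicators.
dummies = pd.get_dummies(df, columns=["CAPSULES", "LIQUID", "TABLETS"], dtype=int)

# Summing the indicators within each state yields the per-state frequencies.
result = dummies.groupby("STATE", as_index=False).sum()
print(result)
```

The reason this works is that each dummy column is 1 exactly where the original column held that value, so a per-state sum is a per-state count. If you need the lowercase headers from the question, renaming the columns afterwards should be enough.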