I have this sample data:
STATE CAPSULES LIQUID TABLETS
Alabama NaN Prescription OTC
Georgia Prescription NaN OTC
Texas OTC OTC NaN
Texas Prescription NaN NaN
Florida NaN Prescription OTC
Georgia OTC Prescription Prescription
Texas Prescription NaN OTC
Alabama NaN OTC OTC
Georgia OTC NaN NaN
I have tried several groupby configurations to get the following desired result:
State capsules_OTC capsules_prescription liquid_OTC liquid_prescription tablets_OTC tablets_prescription
Alabama 0 0 0 0 0 0
Florida 0 0 0 0 0 0
Georgia 1 1 1 1 1 1
Texas 1 2 2 2 2 2
For example, I tried this:
df.groupby(['STATE','CAPSULES'])
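to at least get the first column wrangled, but no dice. Maybe there is no simple answer, but I suspect I am missing something simple with groupby, perhaps count() or some other apply function?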
Answer 0 (score: 4)
Use pd.get_dummies together with groupby and sum:
pd.get_dummies(df, columns=['CAPSULES', 'LIQUID', 'TABLETS'])\
.groupby('STATE', as_index=False).sum()
Output:
STATE CAPSULES_OTC CAPSULES_Prescription LIQUID_OTC LIQUID_Prescription TABLETS_OTC TABLETS_Prescription
0 Alabama 0 0 1 1 2 0
1 Florida 0 0 0 1 1 0
2 Georgia 2 1 0 1 1 1
3 Texas 1 2 1 0 1 0
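For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the `rows` list and the variable names `dummies` and `counts` are my own, not from the original post) and passes `dtype=int` to `get_dummies`, which is an addition on my part so the indicator columns are integers regardless of pandas version. NaN cells get no indicator column because `dummy_na` defaults to False, so they simply contribute zero to every count.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks a missing entry.
rows = [
    ("Alabama", np.nan, "Prescription", "OTC"),
    ("Georgia", "Prescription", np.nan, "OTC"),
    ("Texas", "OTC", "OTC", np.nan),
    ("Texas", "Prescription", np.nan, np.nan),
    ("Florida", np.nan, "Prescription", "OTC"),
    ("Georgia", "OTC", "Prescription", "Prescription"),
    ("Texas", "Prescription", np.nan, "OTC"),
    ("Alabama", np.nan, "OTC", "OTC"),
    ("Georgia", "OTC", np.nan, np.nan),
]
df = pd.DataFrame(rows, columns=["STATE", "CAPSULES", "LIQUID", "TABLETS"])

# One 0/1 indicator column per (product, category) pair.
# dtype=int is an assumption added here to keep the columns numeric.
dummies = pd.get_dummies(df, columns=["CAPSULES", "LIQUID", "TABLETS"], dtype=int)

# Summing the indicators per state gives the per-category counts.
counts = dummies.groupby("STATE", as_index=False).sum()
print(counts)
```

Keeping `as_index=False` leaves STATE as an ordinary column, which matches the layout of the desired result in the question.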