I'm trying to figure out why my code can't access specific column values using dummy values, given the following sample data:
df
shop category subcategory season
date
2013-09-04 abc weddings shoes winter
2013-09-04 def jewelry watches summer
2013-09-05 ghi sports sneakers spring
2013-09-05 jkl jewelry necklaces fall
Here is my basic code:
wedding_df = df[["weddings","winter","summer","spring","fall"]]
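I'm using Python 2 on my laptop, so this may well be a version issue, or I may need get_dummies(), but some guidance would help. The idea is to create a dummy DataFrame that uses binary data to indicate whether a row has the weddings category and which season it falls in.
Here is an example of what I'm trying to achieve: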
weddings winter summer spring fall
71654 1.0 0.0 1.0 0.0 0.0
72168 1.0 0.0 1.0 0.0 0.0
72080 1.0 0.0 1.0 0.0 0.0
With corr():
weddings fall spring summer winter
weddings NaN NaN NaN NaN NaN
fall NaN 1.000000 0.054019 -0.331866 -0.012122
spring NaN 0.054019 1.000000 -0.857205 0.072420
summer NaN -0.331866 -0.857205 1.000000 -0.484578
winter NaN -0.012122 0.072420 -0.484578 1.000000
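A note on the NaN row and column above: in the target example every row has weddings equal to 1.0, and the correlation of a constant column is undefined, so corr() reports NaN for it. This is a likely explanation rather than a confirmed one, since the full data is not shown. A minimal sketch with a hypothetical toy frame (the values below are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: "weddings" is constant, the other columns vary.
toy = pd.DataFrame(
    {
        "weddings": [1.0, 1.0, 1.0, 1.0],
        "summer": [1.0, 0.0, 1.0, 0.0],
        "winter": [0.0, 1.0, 0.0, 1.0],
    }
)

# A constant column has zero variance, so its correlations come out as NaN.
corr = toy.corr()
print(corr)
```

If correlations involving weddings are needed, the frame has to include rows where weddings is 0 as well.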
Answer 0 (score: 1)
You can try setting prefix and prefix_sep to empty strings; after that you can select df[["weddings","winter","summer","spring","fall"]]:
df = pd.get_dummies(df, prefix='', prefix_sep='')
df
abc def ghi jkl jewelry sports weddings necklaces shoes \
date
2013-09-04 1 0 0 0 0 0 1 0 1
2013-09-04 0 1 0 0 1 0 0 0 0
2013-09-05 0 0 1 0 0 1 0 0 0
2013-09-05 0 0 0 1 1 0 0 1 0
sneakers watches fall spring summer winter
date
2013-09-04 0 0 0 0 0 1
2013-09-04 0 1 0 0 1 0
2013-09-05 1 0 0 1 0 0
2013-09-05 0 0 1 0 0 0
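For reference, the whole approach can be sketched end to end. The original selection raises a KeyError because weddings, winter, and the other names are cell values in the category and season columns, not column labels; they only become columns after one-hot encoding. The sample frame below is rebuilt by hand from the question's data, and dtype=int is an added assumption so that the result holds 0/1 on newer pandas versions, where get_dummies returns booleans by default.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame(
    {
        "shop": ["abc", "def", "ghi", "jkl"],
        "category": ["weddings", "jewelry", "sports", "jewelry"],
        "subcategory": ["shoes", "watches", "sneakers", "necklaces"],
        "season": ["winter", "summer", "spring", "fall"],
    },
    index=pd.to_datetime(
        ["2013-09-04", "2013-09-04", "2013-09-05", "2013-09-05"]
    ),
)
df.index.name = "date"

# One-hot encode every text column; the empty prefix and separator make
# the new column labels equal to the original cell values.
dummies = pd.get_dummies(df, prefix="", prefix_sep="", dtype=int)

# The values are now column labels, so the selection works.
wedding_df = dummies[["weddings", "winter", "summer", "spring", "fall"]]
print(wedding_df)
```

One caveat with an empty prefix: if the same value appeared in two of the encoded columns, the result would contain duplicate column labels.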
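Update: to keep only the wedding rows, filter the original df (before the get_dummies call above) and then encode: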
pd.get_dummies(df.loc[df['category']=='weddings',['category','season']],prefix = '', prefix_sep = '' )
Out[820]:
weddings winter
date
2013-09-04 1 1
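Note that the filtered result only has columns for values that actually occur in the selected rows (here weddings and winter). If all four season columns are wanted, as in the question's target layout, one option is to reindex the columns afterwards. This is a sketch under that assumption; the seasons list and the dtype=int argument are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame(
    {
        "shop": ["abc", "def", "ghi", "jkl"],
        "category": ["weddings", "jewelry", "sports", "jewelry"],
        "subcategory": ["shoes", "watches", "sneakers", "necklaces"],
        "season": ["winter", "summer", "spring", "fall"],
    },
    index=pd.to_datetime(
        ["2013-09-04", "2013-09-04", "2013-09-05", "2013-09-05"]
    ),
)
df.index.name = "date"

# Assumed full list of seasons, so absent ones still get a column.
seasons = ["winter", "summer", "spring", "fall"]

# Keep only wedding rows and the two columns of interest, then encode.
subset = df.loc[df["category"] == "weddings", ["category", "season"]]
dummies = pd.get_dummies(subset, prefix="", prefix_sep="", dtype=int)

# Add the missing season columns, filled with 0.
dummies = dummies.reindex(columns=["weddings"] + seasons, fill_value=0)
print(dummies)
```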