I have a DataFrame in the following format:
id amenities ...
1 "TV,Internet,Shower,..." ...
2 "TV,Hot tub,Internet,..." ...
3 "Internet,Heating,Shower..." ...
...
I want to split the strings on commas and create a dummy column for each category, giving a result like this:
id TV Internet Shower Hot tub Heating ...
1 1 1 1 0 0 ...
2 1 1 0 1 0 ...
3 0 1 1 0 1 ...
...
How can I do this?
Thanks
Answer 0 (score: 2)
You can use str.get_dummies with join or concat:
df = df[['id']].join(df['amenities'].str.get_dummies(','))
print(df)
id Heating Hot tub Internet Shower TV
0 1 0 0 1 1 1
1 2 0 1 1 0 1
2 3 1 0 1 1 0
Alternatively:
df = pd.concat([df['id'], df['amenities'].str.get_dummies(',')], axis=1)
print(df)
id Heating Hot tub Internet Shower TV
0 1 0 0 1 1 1
1 2 0 1 1 0 1
2 3 1 0 1 1 0
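
For reference, here is a minimal self-contained sketch of the join approach. The sample rows are an assumption modeled on the question (the real amenity lists are longer and were elided there), and the variable names dummies and result are only illustrative.

```python
import pandas as pd

# Hypothetical sample data modeled on the question; the real lists are longer.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "amenities": [
        "TV,Internet,Shower",
        "TV,Hot tub,Internet",
        "Internet,Heating,Shower",
    ],
})

# Split each string on "," and build one 0/1 indicator column per distinct value.
dummies = df["amenities"].str.get_dummies(sep=",")

# Attach the indicator columns to the id column; both share the same index.
result = df[["id"]].join(dummies)
print(result)
```

Note that str.get_dummies splits on "|" by default, so the separator has to be passed explicitly here. The split is also literal: if the strings had a space after each comma, the resulting column names would keep that leading space.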