I have a DataFrame like this:
| `id` | `text` |
|---|---|
| 1 | Hello world how are you |
| 2 | Hello people I am fine |
| 3 | Good Morning |
| 4 | Good Evening |
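I want to take each word and create a separate column for it. Each column will contain only one of two values, 1 or 0 (1 if the word occurs in the text, 0 if it does not).
Expected output: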
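| `id` | `text` | Hello | world | how | are | you | people | I | am | fine | Good | Morning | Evening |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Hello world how are you | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Hello people I am fine | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 3 | Good Morning | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 4 | Good Evening | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |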
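To make the 1/0 rule concrete, here is a minimal sketch that builds the indicator columns by hand, without `get_dummies`. It assumes words are separated by single spaces and that matching is case-sensitive. It keeps the word columns in order of first appearance, which is the column order used in the expected output.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you",
        "Hello people I am fine",
        "Good Morning",
        "Good Evening",
    ],
})

# Split each text into words (assumes words are separated by single spaces)
tokens = df["text"].str.split(" ")

# Vocabulary in order of first appearance; dict.fromkeys keeps insertion order
# and drops repeated words such as the second "Hello" and the second "Good"
vocab = list(dict.fromkeys(word for words in tokens for word in words))

# One row of 0/1 flags per text: 1 if the word occurs in that text, else 0
word_sets = [set(words) for words in tokens]
flags = [[int(word in ws) for word in vocab] for ws in word_sets]

indicators = pd.DataFrame(flags, columns=vocab, index=df.index)
result = pd.concat([df, indicators], axis=1)
print(result)
```

Checking membership against a `set` per row keeps each lookup cheap; for a large vocabulary the vectorized `str.get_dummies` approaches in the answers are the more practical choice.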
Answer 0 (score: 3)
This is what `get_dummies` is for:
pd.concat([df, df.text.str.get_dummies(' ')], axis=1)
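A minimal runnable sketch of the `concat` approach on the sample data follows. `sep=" "` is just the keyword form of the `' '` argument above. One thing worth knowing: `str.get_dummies` returns the word columns in sorted order (uppercase before lowercase, so `Evening` comes before `Hello`), not in order of appearance. If the column order of the expected output matters, you can reselect the columns explicitly; the `order` list below is written out by hand for this sample only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you",
        "Hello people I am fine",
        "Good Morning",
        "Good Evening",
    ],
})

# Split each text on " " and get one 0/1 column per distinct word
dummies = df["text"].str.get_dummies(sep=" ")

# Put the indicator columns next to the original columns (rows align on the index)
out = pd.concat([df, dummies], axis=1)

# get_dummies sorts the word columns; reorder them to match the expected output
# (this list is hand-written for the sample data)
order = ["Hello", "world", "how", "are", "you", "people", "I", "am",
         "fine", "Good", "Morning", "Evening"]
out = out[["id", "text"] + order]
print(out)
```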
Answer 1 (score: 2)
Use `DataFrame.join` together with `Series.str.get_dummies`:
df1 = df.join(df.text.str.get_dummies(sep=' '))
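A minimal runnable sketch of the `join` version on the sample data. `join` aligns rows on the index by default, and the frame returned by `str.get_dummies` keeps the index of `df`, so the result should match the `concat(..., axis=1)` version; the sketch checks this with `pd.testing.assert_frame_equal`.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you",
        "Hello people I am fine",
        "Good Morning",
        "Good Evening",
    ],
})

# The dummies frame shares df's index, so join lines the rows up correctly
df1 = df.join(df.text.str.get_dummies(sep=" "))

# Same frame as the concat-based answer; raises AssertionError if they differ
via_concat = pd.concat([df, df.text.str.get_dummies(sep=" ")], axis=1)
pd.testing.assert_frame_equal(df1, via_concat)
print(df1)
```

One difference in behavior: if a word in the text happened to be `id` or `text`, `join` would raise a `ValueError` about overlapping columns unless you pass `lsuffix`/`rsuffix`, whereas `concat` would silently produce duplicate column names.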