Question

I have a DataFrame like this:

`id` `text`
1     Hello world how are you
2     Hello people I am fine
3     Good Morning
4     Good Evening

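I want to take each word and create a separate column for it. Each column should contain only two values, 1 or 0 (1 means the word is present in the text, 0 means it is not).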

Expected output:

`id` `text`                     Hello  world  how  are  you  people  I  am  fine  Good  Morning  Evening
1     Hello world how are you   1      1      1    1    1    0       0  0   0     0     0        0
2     Hello people I am fine    1      0      0    0    0    1       1  1   1     0     0        0
3     Good Morning              0      0      0    0    0    0       0  0   0     1     1        0
4     Good Evening              0      0      0    0    0    0       0  0   0     1     0        1

Answer 1

This is what get_dummies is for:

pd.concat([df, df.text.str.get_dummies(' ')], axis=1)
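
For a self-contained check, the sketch below rebuilds the sample frame from the question and applies the same call. The names dummies and out are only illustrative. Note that str.get_dummies returns its columns in sorted order, so they will not appear in the order shown in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you",
        "Hello people I am fine",
        "Good Morning",
        "Good Evening",
    ],
})

# Split each string on single spaces and build one 0/1 indicator column per word.
dummies = df["text"].str.get_dummies(sep=" ")

# Attach the indicator columns to the original frame, side by side.
out = pd.concat([df, dummies], axis=1)
print(out)
```

Matching is case-sensitive and based on exact tokens, so "Good" and "good" would end up in separate columns.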

Answer 2

Use DataFrame.join together with Series.str.get_dummies:

df1 = df.join(df.text.str.get_dummies(sep=' '))
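
If the column order from the expected output matters, one possible approach (an addition here, not part of the original answer) is to reorder the dummy columns by the order in which each word first appears. The helper name word_order is hypothetical, and the sketch assumes words are separated by exactly one space.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you",
        "Hello people I am fine",
        "Good Morning",
        "Good Evening",
    ],
})

# One 0/1 indicator column per word (columns come back sorted).
dummies = df["text"].str.get_dummies(sep=" ")

# Words in order of first appearance: split, flatten to one word per row, drop repeats.
word_order = df["text"].str.split(" ").explode().drop_duplicates().tolist()

# Reorder the indicator columns, then join them onto the original frame by index.
df1 = df.join(dummies[word_order])
print(df1.columns.tolist())
```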

How to create a column for each word of a string in pandas