Splitting a column into separate columns by genre

Time: 2018-05-17 14:32:21

Tags: python pandas scikit-learn

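I have a movie recommendation dataset and want to split the genres feature into one column per genre. The genres column holds all of a movie's genres in a single string, separated by '|'.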

What is the best way to do this?

   movieId  title                               genres
0  1        Toy Story (1995)                    Adventure|Animation|Children|Comedy|Fantasy
1  2        Jumanji (1995)                      Adventure|Children|Fantasy
2  3        Grumpier Old Men (1995)             Comedy|Romance
3  4        Waiting to Exhale (1995)            Comedy|Drama|Romance
4  5        Father of the Bride Part II (1995)  Comedy

Thanks

1 answer:

Answer 0 (score: 4)

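Use str.get_dummies, which splits each string on the separator and returns one 0/1 indicator column per distinct value: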

# keep the result in its own variable so the original df stays intact
genres = df['genres'].str.get_dummies('|')
print(genres)
   Adventure  Animation  Children  Comedy  Drama  Fantasy  Romance
0          1          1         1       1      0        1        0
1          1          0         1       0      0        1        0
2          0          0         0       1      0        0        1
3          0          0         0       1      1        0        1
4          0          0         0       1      0        0        0
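
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the five sample rows in the question, and the variable name genre_dummies is only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; a real dataset would be loaded from a file.
df = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
    ],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

# Split each string on '|' and build one 0/1 indicator column per distinct genre.
genre_dummies = df["genres"].str.get_dummies(sep="|")
print(genre_dummies)
```

The resulting columns come out in sorted order, which is why Drama appears between Comedy and Fantasy even though no early row contains it.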

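If the new columns should be added to the original DataFrame, use join; pop removes the genres column from df and returns it in one step: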

df = df.join(df.pop('genres').str.get_dummies('|'))
print (df)
   movieId                               title   ...     Fantasy  Romance
0        1                    Toy Story (1995)   ...           1        0
1        2                      Jumanji (1995)   ...           1        0
2        3             Grumpier Old Men (1995)   ...           0        1
3        4            Waiting to Exhale (1995)   ...           0        1
4        5  Father of the Bride Part II (1995)   ...           0        0
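
Note that df.pop('genres') modifies df in place. If the original frame must keep its genres column, one possible variant (a sketch, not from the original answer) is to drop the column on a copy and join the indicator columns to that. The names movies and encoded, and the three-row sample, are assumptions made for illustration.

```python
import pandas as pd

# Three sample rows from the question, enough to show the shape of the result.
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# drop() returns a new DataFrame, so `movies` itself is left unchanged.
encoded = movies.drop(columns="genres").join(
    movies["genres"].str.get_dummies(sep="|")
)
print(encoded)
```

Both versions produce the same joined table; the only difference is whether the source DataFrame still has its genres column afterwards.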