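I have a movie recommendation dataset and want to split the genres feature into separate genre columns. The column holds all of a movie's genres in one string, separated by '|'.
What is the best way to do this?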
   movieId                               title                                       genres
0        1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2                      Jumanji (1995)                   Adventure|Children|Fantasy
2        3             Grumpier Old Men (1995)                               Comedy|Romance
3        4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)                                       Comedy
Thanks.
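To make the question reproducible, the five sample rows above can be rebuilt as a DataFrame. This is a minimal sketch assuming pandas is installed; it contains only the rows shown in the question, not the full dataset.

```python
import pandas as pd

# Rebuild only the five sample rows shown in the question (not the full dataset)
df = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
    ],
    # All genres of a movie live in one '|'-separated string
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

print(df)
```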
Answer 0 (score: 4)
df = df['genres'].str.get_dummies('|')
print (df)
   Adventure  Animation  Children  Comedy  Drama  Fantasy  Romance
0          1          1         1       1      0        1        0
1          1          0         1       0      0        1        0
2          0          0         0       1      0        0        1
3          0          0         0       1      1        0        1
4          0          0         0       1      0        0        0
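A self-contained sketch of the same call on the question's sample genres, assuming pandas is installed. str.get_dummies(sep="|") creates one 0/1 column per distinct genre, with the columns sorted alphabetically. As a cross-check, the sketch also uses an alternative technique that is not part of this answer: split each string, explode to one row per (movie, genre) pair, and count the pairs with pd.crosstab. The names pairs and via_crosstab are illustrative only.

```python
import pandas as pd

# Genre strings from the question's five sample rows
genres = pd.Series([
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Comedy|Romance",
    "Comedy|Drama|Romance",
    "Comedy",
], name="genres")

# The answer's approach: one indicator column per distinct genre
dummies = genres.str.get_dummies(sep="|")

# Alternative cross-check (not from the answer): one row per (movie, genre) pair,
# then count how often each pair occurs; missing pairs are filled with 0
exploded = genres.str.split("|").explode()
pairs = pd.DataFrame({
    "row": exploded.index.to_numpy(),
    "genre": exploded.to_numpy(),
})
via_crosstab = pd.crosstab(pairs["row"], pairs["genre"])

print(dummies)
```

Both routes give the same 0/1 matrix here; get_dummies is the shorter one when the separator is a single fixed string.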
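If you want to add the new columns to the original DataFrame (replacing the genres column), use join together with pop: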
df = df.join(df.pop('genres').str.get_dummies('|'))
print (df)
   movieId                               title  ...  Fantasy  Romance
0        1                    Toy Story (1995)  ...        1        0
1        2                      Jumanji (1995)  ...        1        0
2        3             Grumpier Old Men (1995)  ...        0        1
3        4            Waiting to Exhale (1995)  ...        0        1
4        5  Father of the Bride Part II (1995)  ...        0        0
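A self-contained sketch of the join/pop variant on the question's sample rows, assuming pandas is installed. df.pop("genres") removes the column from df in place and returns it as a Series, so the raw text column does not remain next to the new indicator columns; join then attaches the indicator columns by aligning on the row index.

```python
import pandas as pd

# Only the five sample rows from the question
df = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
    ],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

# pop() drops 'genres' from df and returns it; get_dummies() expands it into
# 0/1 columns; join() appends those columns to the remaining frame by index
df = df.join(df.pop("genres").str.get_dummies(sep="|"))

print(df.columns.tolist())
```

The "..." in the printed output above is pandas truncating a wide frame for display, not missing data; pd.set_option("display.max_columns", None) prints every column.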