I have a Python pandas DataFrame like this:
movie       unknown  action  adventure  animation  fantasy  horror  romance  sci-fi
Toy Story         0       1          1          0        1       0        0       1
Golden Eye        0       1          0          0        0       0        1       0
Four Rooms        1       0          0          0        0       0        0       0
Get Shorty        0       0          0          1        1       0        1       0
Copy Cat          0       0          1          0        0       1        0       0
I want to merge the genre columns into a single column. The output should look like this:
movie       genre
Toy Story   action, adventure, fantasy, sci-fi
Golden Eye  action, romance
Four Rooms  unknown
Get Shorty  animation, fantasy, romance
Copy Cat    adventure, horror
Answer 0 (score: 2)
You can do it this way:
In [171]: df['genre'] = df.iloc[:, 1:].apply(lambda row: row.index[row.astype(bool)].tolist(), axis=1)
In [172]: df
Out[172]:
        movie  unknown  action  adventure  animation  fantasy  horror  romance  sci-fi                                 genre
0   Toy Story        0       1          1          0        1       0        0       1  [action, adventure, fantasy, sci-fi]
1  Golden Eye        0       1          0          0        0       0        1       0                     [action, romance]
2  Four Rooms        1       0          0          0        0       0        0       0                             [unknown]
3  Get Shorty        0       0          0          1        1       0        1       0         [animation, fantasy, romance]
4    Copy Cat        0       0          1          0        0       1        0       0                   [adventure, horror]
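Note that the result above holds a Python list per row, whereas the question asks for a comma-separated string. If a string is what you need, something like the following might work. It is only a minimal sketch: it rebuilds the sample frame from the question, and it assumes that every column after `movie` is a 0/1 genre flag. The names `genre_cols`, `flags` and `result` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "movie": ["Toy Story", "Golden Eye", "Four Rooms", "Get Shorty", "Copy Cat"],
    "unknown":   [0, 0, 1, 0, 0],
    "action":    [1, 1, 0, 0, 0],
    "adventure": [1, 0, 0, 0, 1],
    "animation": [0, 0, 0, 1, 0],
    "fantasy":   [1, 0, 0, 1, 0],
    "horror":    [0, 0, 0, 0, 1],
    "romance":   [0, 1, 0, 1, 0],
    "sci-fi":    [1, 0, 0, 0, 0],
})

# Assumption: every column after "movie" is a 0/1 genre indicator.
genre_cols = df.columns[1:]
flags = df[genre_cols].astype(bool)

# For each row, keep the column names whose flag is set and join them.
df["genre"] = flags.apply(
    lambda row: ", ".join(row.index[row.to_numpy()]), axis=1
)

# Keep only the two columns shown in the desired output.
result = df[["movie", "genre"]]
print(result)
```

Selecting `genre_cols` before adding `genre` keeps the new column out of the flag set, so the snippet can be adapted without the position-based `iloc[:, 1:]` picking up `genre` on a second run.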
PS: I don't see how this helps you, though. I don't see any advantage over the one-hot encoded matrix.