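I have read the Stack Overflow solutions for this problem, but none of them covers the case where there is more than one column to combine. For example:
Output: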
movieId genres
1 Adventure|Animation|Children|Comedy|Fantasy
2 Adventure|Children|Fantasy
3 Comedy|Romance
4 Comedy|Drama|Romance
5 Comedy
6 Action|Crime|Thriller
7 Comedy|Romance
How can I do this with pandas?
Answer 0 (score: 4)
Use dot with the column names with | appended to each, then remove the trailing | with rstrip:
print (df1)
movieId Action Adventure Animation Children Comedy Crime Drama \
0 1 0 1 1 1 1 0 0
1 2 0 1 0 1 0 0 0
2 3 0 0 0 0 1 0 0
3 4 0 0 0 0 1 0 1
4 5 0 0 0 0 1 0 0
5 6 1 0 0 0 0 1 0
6 7 0 0 0 0 1 0 0
Fantasy Romance Thriller
0 1 0 0
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 0
5 0 0 1
6 0 1 0
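# Move movieId into the index so that only the 0/1 genre columns enter the product
df = df1.set_index('movieId')
# 0 * 'Name|' gives '' and 1 * 'Name|' gives 'Name|'; dot sums (concatenates) them per row
df2 = df.dot(df.columns + '|').str.rstrip('|').reset_index(name='genres')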
print (df2)
movieId genres
0 1 Adventure|Animation|Children|Comedy|Fantasy
1 2 Adventure|Children|Fantasy
2 3 Comedy|Romance
3 4 Comedy|Drama|Romance
4 5 Comedy
5 6 Action|Crime|Thriller
6 7 Comedy|Romance
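For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds df1 by hand from the values printed above (the rows list and the genre_cols name are only there for illustration and are not part of the original answer), then applies the same dot and rstrip steps.

```python
import pandas as pd

# Genre columns in the same order as the printed df1 (illustrative reconstruction)
genre_cols = ['Action', 'Adventure', 'Animation', 'Children', 'Comedy',
              'Crime', 'Drama', 'Fantasy', 'Romance', 'Thriller']

# One row of 0/1 indicators per movie, copied from the df1 output above
rows = [
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 0],  # movieId 1
    [0, 1, 0, 1, 0, 0, 0, 1, 0, 0],  # movieId 2
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 0],  # movieId 3
    [0, 0, 0, 0, 1, 0, 1, 0, 1, 0],  # movieId 4
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # movieId 5
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 1],  # movieId 6
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 0],  # movieId 7
]

df1 = pd.DataFrame(rows, columns=genre_cols)
df1.insert(0, 'movieId', range(1, 8))

# Keep movieId out of the product by moving it into the index
df = df1.set_index('movieId')

# Each indicator multiplies its 'Name|' string (0 -> '', 1 -> 'Name|');
# dot concatenates the pieces per row, and rstrip drops the trailing '|'
df2 = df.dot(df.columns + '|').str.rstrip('|').reset_index(name='genres')
print(df2)
```

Note that this relies on every value being 0 or 1 (or a boolean). A larger integer would repeat the column name that many times, so other data would need to be converted to indicators first.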