How do I rebuild a single column from dummy (indicator) columns?

Time: 2018-11-10 14:16:11

Tags: python pandas

I have read the Stack Overflow solutions for this problem, but none of them covers the case where more than one column is involved. For example:

Input (image not available):

Output:

movieId genres
1       Adventure|Animation|Children|Comedy|Fantasy
2       Adventure|Children|Fantasy
3       Comedy|Romance
4       Comedy|Drama|Romance
5       Comedy
6       Action|Crime|Thriller
7       Comedy|Romance

How can I do this with pandas?

1 answer:

Answer 0: (score: 4)

Use dot with the column names, each suffixed with |, and then remove the trailing | with rstrip:

print (df1)
   movieId  Action  Adventure  Animation  Children  Comedy  Crime  Drama  \
0        1       0          1          1         1       1      0      0   
1        2       0          1          0         1       0      0      0   
2        3       0          0          0         0       1      0      0   
3        4       0          0          0         0       1      0      1   
4        5       0          0          0         0       1      0      0   
5        6       1          0          0         0       0      1      0   
6        7       0          0          0         0       1      0      0   

   Fantasy  Romance  Thriller  
0        1        0         0  
1        1        0         0  
2        0        1         0  
3        0        1         0  
4        0        0         0  
5        0        0         1  
6        0        1         0  

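# Move movieId into the index so that only the 0/1 genre columns remain
df = df1.set_index('movieId')
# 1 * 'Genre|' keeps the label and 0 * 'Genre|' gives '', so dot concatenates
# the matching labels per row; rstrip then drops the trailing '|'
df2 = df.dot(df.columns + '|').str.rstrip('|').reset_index(name='genres')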

print (df2)
   movieId                                       genres
0        1  Adventure|Animation|Children|Comedy|Fantasy
1        2                   Adventure|Children|Fantasy
2        3                               Comedy|Romance
3        4                         Comedy|Drama|Romance
4        5                                       Comedy
5        6                        Action|Crime|Thriller
6        7                               Comedy|Romance
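
For anyone who wants to run the answer end to end, here is a minimal self-contained sketch. It assumes that df1 is built by hand from the values printed above (the original input was only posted as an image), and the names genre_cols and rows are illustrative, not part of the original answer.

```python
import pandas as pd

# Genre indicator columns, in the same order as the printed df1
genre_cols = ['Action', 'Adventure', 'Animation', 'Children', 'Comedy',
              'Crime', 'Drama', 'Fantasy', 'Romance', 'Thriller']

# One row per movie: movieId followed by the 0/1 flags (copied from the printout)
rows = [
    [1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0],
    [2, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0],
    [3, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [4, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0],
    [5, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [6, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [7, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
]
df1 = pd.DataFrame(rows, columns=['movieId'] + genre_cols)

# Keep only the indicator columns as data; movieId becomes the index
df = df1.set_index('movieId')

# Multiplying an int by a string repeats it (1 -> label, 0 -> ''), and the
# dot product sums (concatenates) those pieces across each row
df2 = df.dot(df.columns + '|').str.rstrip('|').reset_index(name='genres')

print(df2)
```

The trick relies on the indicator values being 0 or 1. If a column held larger counts, the label would be repeated that many times, so such data would need to be clipped to 0/1 first.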