Convert this word DataFrame into a zero-one matrix format DataFrame in Python pandas, removing the quotes

Time: 2017-05-30 11:22:13

Tags: python-2.7 pandas dataframe sklearn-pandas

I want to convert a DataFrame of user_Id and skills into a zero-one (binary) matrix DataFrame showing each user and their corresponding skills.

Input DataFrame

     user_Id                        skills

 0     user1               "java, hdfs, hadoop"
 1     user2               "python, c++, c"
 2     user3               "hadoop, java, hdfs"
 3     user4               "html, java, php"
 4     user5               "hadoop, php, hdfs"
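To reproduce the question, the sample input can be built directly. This is a minimal sketch that assumes the double quotes shown above are only how the values were displayed, not characters stored in the strings.

```python
import pandas as pd

# Reconstruction of the sample input; assumes the quotes in the question
# are display only, so each skills value is a plain comma-space separated string.
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": [
        "java, hdfs, hadoop",
        "python, c++, c",
        "hadoop, java, hdfs",
        "html, java, php",
        "hadoop, php, hdfs",
    ],
})
print(df)
```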

Desired output DataFrame

 user_Id       java  c   c++     hadoop  hdfs    python  html    php

 user1         1     0   0       1       1       0       0       0
 user2         0     1   1       0       0       1       0       0
 user3         1     0   0       1       1       0       0       0
 user4         1     0   0       0       0       0       1       1
 user5         0     0   0       1       1       0       0       1

1 Answer:

Answer 0 (score: 0)

For me, str.get_dummies + concat works:

import pandas as pd

df1 = df['skills'].str.get_dummies(', ')
print (df1)
   c  c++  hadoop  hdfs  html  java  php  python
0  0    0       1     1     0     1    0       0
1  1    1       0     0     0     0    0       1
2  0    0       1     1     0     1    0       0
3  0    0       0     0     1     1    1       0
4  0    0       1     1     0     0    1       0
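For context, str.get_dummies splits each string on the exact separator given, collects the distinct tokens, sorts them into columns, and marks each row with 1 where a token occurs. The small sketch below (with made-up two-row data) shows that the separator must match exactly: splitting "java, hdfs" on "," keeps the leading space, so " java" and "java" end up as different columns.

```python
import pandas as pd

s = pd.Series(["java, hdfs", "c++, java"])

# Separator matches the data: clean, sorted indicator columns
exact = s.str.get_dummies(", ")
print(exact)

# Separator without the space: tokens keep their leading space,
# producing separate columns such as " java" and "java"
loose = s.str.get_dummies(",")
print(list(loose.columns))
```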

df = pd.concat([df['user_Id'], df1], axis=1)
print (df)
  user_Id  c  c++  hadoop  hdfs  html  java  php  python
0   user1  0    0       1     1     0     1    0       0
1   user2  1    1       0     0     0     0    0       1
2   user3  0    0       1     1     0     1    0       0
3   user4  0    0       0     0     1     1    1       0
4   user5  0    0       1     1     0     0    1       0
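get_dummies orders the columns alphabetically, while the desired output in the question uses a different order. A possible follow-up step, sketched under the assumption that the column list is simply copied from the question, is to reindex the dummy columns before concatenating. Note that reindex drops any skill missing from the list and fills listed-but-absent skills with 0.

```python
import pandas as pd

df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": ["java, hdfs, hadoop", "python, c++, c", "hadoop, java, hdfs",
               "html, java, php", "hadoop, php, hdfs"],
})

dummies = df["skills"].str.get_dummies(", ")

# Column order taken from the question's desired output
order = ["java", "c", "c++", "hadoop", "hdfs", "python", "html", "php"]

result = pd.concat(
    [df["user_Id"], dummies.reindex(columns=order, fill_value=0)], axis=1
)
print(result)
```

If user_Id should be the row index instead of a column, result.set_index("user_Id") gives that layout.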

Edit:

If there is no space after the comma, use:

df1 = df['skills'].str.get_dummies(',')
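If the strings really do contain literal double quotes, or the spacing around commas is inconsistent, neither separator works cleanly. One option, sketched here with hypothetical messy data, is to normalize the strings first (strip quotes and outer spaces, then collapse whitespace around commas with a regex) and split on "," afterwards.

```python
import pandas as pd

# Hypothetical messy input: literal quotes and uneven spacing around commas
df = pd.DataFrame({
    "user_Id": ["user1", "user2"],
    "skills": ['"java,hdfs, hadoop"', ' "python ,c++, c"'],
})

cleaned = (
    df["skills"]
    .str.strip(' "')                             # remove outer spaces and quotes
    .str.replace(r"\s*,\s*", ",", regex=True)    # remove spaces around commas
)

dummies = cleaned.str.get_dummies(",")
result = pd.concat([df["user_Id"], dummies], axis=1)
print(result)
```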