I want to convert a DataFrame of user_Id and skills into a zero/one DataFrame matrix of users and their corresponding skills
Input DataFrame:
  user_Id                skills
0   user1  "java, hdfs, hadoop"
1   user2      "python, c++, c"
2   user3  "hadoop, java, hdfs"
3   user4     "html, java, php"
4   user5   "hadoop, php, hdfs"
Desired output DataFrame:
user_Id  java  c  c++  hadoop  hdfs  python  html  php
  user1     1  0    0       1     1       0     0    0
  user2     0  1    1       0     0       1     0    0
  user3     1  0    0       1     1       0     0    0
  user4     1  0    0       0     0       0     1    1
  user5     0  0    0       1     1       0     0    1
Answer 0 (score: 0)
For me, str.get_dummies + concat works:
df1 = df['skills'].str.get_dummies(', ')
print (df1)
   c  c++  hadoop  hdfs  html  java  php  python
0  0    0       1     1     0     1    0       0
1  1    1       0     0     0     0    0       1
2  0    0       1     1     0     1    0       0
3  0    0       0     0     1     1    1       0
4  0    0       1     1     0     0    1       0
df = pd.concat([df['user_Id'], df1], axis=1)
print (df)
  user_Id  c  c++  hadoop  hdfs  html  java  php  python
0   user1  0    0       1     1     0     1    0       0
1   user2  0    1       1     0     0     0    0       1
2   user3  0    0       1     1     0     1    0       0
3   user4  0    0       0     0     1     1    1       0
4   user5  0    0       1     1     0     0    1       0
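Note that str.get_dummies returns the skill columns in sorted order, which differs from the column order shown in the question. If that order matters, one possible sketch is to select the columns explicitly afterwards. The names out and wanted are hypothetical, and the column list is simply copied from the desired output above:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": ["java, hdfs, hadoop", "python, c++, c", "hadoop, java, hdfs",
               "html, java, php", "hadoop, php, hdfs"],
})

# One 0/1 indicator column per skill, joined back to the user ids
out = pd.concat([df["user_Id"], df["skills"].str.get_dummies(", ")], axis=1)

# Column order taken from the desired output in the question (assumed to be what is wanted)
wanted = ["user_Id", "java", "c", "c++", "hadoop", "hdfs", "python", "html", "php"]
out = out[wanted]
print(out)
```

Selecting with a list raises a KeyError if a listed skill never occurs in the data, so this only fits when the full set of skills is known in advance.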
Edit:
If there is no space after the comma, split on ',' alone:
df1 = df['skills'].str.get_dummies(',')
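If the data mixes both styles (some rows with a space after the comma, some without), neither separator alone gives clean columns. A minimal sketch, assuming it is acceptable to normalize the separators first: the second row below is deliberately altered from the question's data to show inconsistent spacing, and the names skills, dummies and result are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    # user2's string is altered on purpose to mix spacing styles
    "skills": ["java, hdfs, hadoop", "python,c++ , c", "hadoop, java, hdfs",
               "html, java, php", "hadoop, php, hdfs"],
})

# Trim the ends and collapse any whitespace around commas to a bare comma
skills = df["skills"].str.strip().str.replace(r"\s*,\s*", ",", regex=True)

# Now a single separator works for every row
dummies = skills.str.get_dummies(",")
result = pd.concat([df["user_Id"], dummies], axis=1)
print(result)
```

Without the normalization step, splitting on ',' alone would produce separate columns such as ' hdfs' and 'hdfs' for the same skill.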