Convert this word DataFrame into a zero-one matrix format DataFrame in Python pandas

Date: 2017-05-26 05:34:26

Tags: python-2.7 pandas dataframe sklearn-pandas

I want to convert a DataFrame of user_Id and skills into a zero-one matrix DataFrame, with one row per user and one column per skill.

Input DataFrame

     user_Id                        skills

0     user1               [java, hdfs, hadoop]
1     user2               [python, c++, c]
2     user3               [hadoop, java, hdfs]
3     user4               [html, java, php]
4     user5               [hadoop, php, hdfs]

Desired output DataFrame

user_Id       java  c   c++     hadoop  hdfs    python  html    php     

user1         1     0   0       1       1       0       0       0
user2         0     1   1       0       0       1       0       0
user3         1     0   0       1       1       0       0       0
user4         1     0   0       0       0       0       1       1
user5         0     0   0       1       1       0       0       1

1 answer:

Answer 0: (score: 1)

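You can use str.get_dummies together with join. If the lists need to be converted to str first, use astype (otherwise omit it), then remove the [] with strip:

df = df[['user_Id']].join(df['skills'].astype(str).str.strip('[]').str.get_dummies(', '))
print (df)
  user_Id  c  c++  hadoop  hdfs  html  java  php  python
0   user1  0    0       1     1     0     1    0       0
1   user2  1    1       0     0     0     0    0       1
2   user3  0    0       1     1     0     1    0       0
3   user4  0    0       0     0     1     1    1       0
4   user5  0    0       1     1     0     0    1       0

Alternatively, create a new DataFrame with get_dummies and combine it with the user_Id column using concat: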

df1 = df['skills'].astype(str).str.strip('[]').str.get_dummies(', ')
# if necessary, remove ' from the column names
df1.columns = df1.columns.str.strip("'")
df = pd.concat([df['user_Id'], df1], axis=1)
print (df)
  user_Id  c  c++  hadoop  hdfs  html  java  php  python
0   user1  0    0       1     1     0     1    0       0
1   user2  1    1       0     0     0     0    0       1
2   user3  0    0       1     1     0     1    0       0
3   user4  0    0       0     0     1     1    1       0
4   user5  0    0       1     1     0     0    1       0
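
For reference, here is a minimal self-contained sketch that rebuilds the question's input and produces the zero-one matrix. It assumes the skills column holds real Python lists, and it swaps the astype(str)/strip step for Series.str.join, which is a different technique from the answer above. It avoids the stray quote characters in the column names. The "|" delimiter is an assumption: it only works if no skill name contains that character.

```python
import pandas as pd

# Rebuild the input from the question; skills are assumed to be Python lists.
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": [
        ["java", "hdfs", "hadoop"],
        ["python", "c++", "c"],
        ["hadoop", "java", "hdfs"],
        ["html", "java", "php"],
        ["hadoop", "php", "hdfs"],
    ],
})

# Join each list into one "|"-delimited string, then one-hot encode it.
# Assumption: no skill name contains "|".
dummies = df["skills"].str.join("|").str.get_dummies(sep="|")

# Put the user_Id column back in front of the indicator columns.
result = pd.concat([df["user_Id"], dummies], axis=1)
print(result)
```

The indicator columns come out in sorted order (c, c++, hadoop, ...), so reorder them with result[[...]] if the column order in the question matters.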