I want to convert a DataFrame of user_Id and skills into a 0/1 DataFrame matrix that shows each user and the skills they have.
Input DataFrame
user_Id skills
0 user1 [java, hdfs, hadoop]
1 user2 [python, c++, c]
2 user3 [hadoop, java, hdfs]
3 user4 [html, java, php]
4 user5 [hadoop, php, hdfs]
Desired output DataFrame
user_Id java c c++ hadoop hdfs python html php
user1 1 0 0 1 1 0 0 0
user2 0 1 1 0 0 1 0 0
user3 1 0 0 1 1 0 0 0
user4 1 0 0 0 0 0 1 1
user5 0 0 0 1 1 0 0 1
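One way to get exactly this layout is to explode the lists into one row per (user, skill) pair and count the pairs with pd.crosstab. This is a minimal sketch that assumes skills holds real Python lists. It is an alternative technique, not the str.get_dummies approach used in the answer. The column list `order` is copied from the desired output above.

```python
import pandas as pd

# Sample data from the question; assumes `skills` holds Python lists.
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": [
        ["java", "hdfs", "hadoop"],
        ["python", "c++", "c"],
        ["hadoop", "java", "hdfs"],
        ["html", "java", "php"],
        ["hadoop", "php", "hdfs"],
    ],
})

# One row per (user, skill) pair.
long = df.explode("skills")

# Count how often each pair occurs; with no repeated skills per user this is 0/1.
out = pd.crosstab(long["user_Id"], long["skills"])

# Column order copied from the desired output in the question.
order = ["java", "c", "c++", "hadoop", "hdfs", "python", "html", "php"]
out = out.reindex(columns=order, fill_value=0)

# Drop the "skills" axis name and turn the user_Id index back into a column.
out.columns.name = None
out = out.reset_index()
print(out)
```

If a user's list could contain the same skill twice, crosstab would count it as 2, so you would clip the counts to 1 before resetting the index.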
Answer (score: 1)
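Use join with a new DataFrame created by str.get_dummies. If the skills column holds lists, first convert them to strings with astype(str) (omit this step if they are already strings), then strip the [] brackets:
df = df[['user_Id']].join(df['skills'].astype(str).str.strip('[]').str.get_dummies(', '))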
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0
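If skills holds actual Python lists rather than strings, you can skip the astype(str)/strip round-trip, which is what produces quoted column names. Join each list with "|" via Series.str.join and let str.get_dummies split on its default separator. This is a minimal sketch; the sample data is assumed from the question.

```python
import pandas as pd

# Sample data from the question; assumes `skills` holds Python lists.
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": [
        ["java", "hdfs", "hadoop"],
        ["python", "c++", "c"],
        ["hadoop", "java", "hdfs"],
        ["html", "java", "php"],
        ["hadoop", "php", "hdfs"],
    ],
})

# str.join turns each list into one string such as "java|hdfs|hadoop";
# str.get_dummies then splits on "|" (its default sep) into 0/1 columns.
dummies = df["skills"].str.join("|").str.get_dummies()

# Put the user ids back in front of the indicator columns.
result = df[["user_Id"]].join(dummies)
print(result)
```

The columns come out sorted alphabetically, the same as in the answer's output.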
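Alternatively, create the dummies separately with str.get_dummies and combine them with the user_Id column using pd.concat: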
import pandas as pd
df1 = df['skills'].astype(str).str.strip('[]').str.get_dummies(', ')
# if skills held lists, astype(str) keeps the quotes, so remove ' from the column names
df1.columns = df1.columns.str.strip("'")
df = pd.concat([df['user_Id'], df1], axis=1)
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0
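Both snippets return the skill columns in alphabetical order, because str.get_dummies sorts them, while the question lists them in a different order. The sketch below assumes list-valued skills and reuses the same strip steps. It selects the columns with an explicit list before concatenating. The list `order` is copied from the desired output in the question.

```python
import pandas as pd

# Sample data from the question; assumes `skills` holds Python lists.
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": [
        ["java", "hdfs", "hadoop"],
        ["python", "c++", "c"],
        ["hadoop", "java", "hdfs"],
        ["html", "java", "php"],
        ["hadoop", "php", "hdfs"],
    ],
})

# Stringify the lists, strip the brackets, split on ", " into 0/1 columns.
df1 = df["skills"].astype(str).str.strip("[]").str.get_dummies(", ")
print(df1.columns.tolist())  # names still carry quotes, e.g. "'java'"

# Remove the quotes left over from str(list).
df1.columns = df1.columns.str.strip("'")

# Select the columns in the question's order, then attach user_Id.
order = ["java", "c", "c++", "hadoop", "hdfs", "python", "html", "php"]
out = pd.concat([df["user_Id"], df1[order]], axis=1)
print(out)
```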