index 0 1 2 3 4 5 \
0 0 Action Adventure Fantasy Sci-Fi NaN NaN
1 1 Action Adventure Fantasy NaN NaN NaN
2 2 Action Adventure Thriller NaN NaN NaN
3 3 Action Thriller NaN NaN NaN NaN
4 4 Documentary NaN NaN NaN NaN NaN
5 5 Action Adventure Sci-Fi NaN NaN NaN
6 6 Action Adventure Romance NaN NaN NaN
7 7 Adventure Animation Comedy Family Fantasy Musical
8 8 Action Adventure Sci-Fi NaN NaN NaN
9 9 Adventure Family Fantasy Mystery NaN NaN
I have data like this......
But I don't know how to make dummy variables in pandas (Python) when the rows contain different numbers of values....
Action Adventure Fantasy Sci-Fi Thriller Documentary Romance Animation Comedy Family Fantasy Musical Mystery
0 1 1 1 1 0 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 1 0 0 0 0 0 0 0 0
3 1 0 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 0 0 1 0 0 0 0 0 0 0
5 1 1 0 1 0 0 0 0 0 0 0 0 0
6 1 1 0 0 0 0 1 0 0 0 0 0 0
7 0 1 0 0 0 0 0 1 1 1 1 1 0
8 1 1 0 1 0 0 0 0 0 0 0 0 0
9 0 1 0 0 0 0 0 0 0 1 1 0 1
like this....
Answer 0: (score: 3)
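I think you can use get_dummies, but first you need to remove the first column with drop or iloc, and then create a Series with stack.
The output has a duplicated index, so you need to groupby the index (level=0) and aggregate with max: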
df = pd.get_dummies(df.drop('index', axis=1).stack()).groupby(level=0).max()
#alternative solution
#df = pd.get_dummies(df.iloc[:, 1:].stack()).groupby(level=0).max()
print (df)
Action Adventure Animation Comedy Documentary Family Fantasy \
0 1 1 0 0 0 0 1
1 1 1 0 0 0 0 1
2 1 1 0 0 0 0 0
3 1 0 0 0 0 0 0
4 0 0 0 0 1 0 0
5 1 1 0 0 0 0 0
6 1 1 0 0 0 0 0
7 0 1 1 1 0 1 1
8 1 1 0 0 0 0 0
9 0 1 0 0 0 1 1
Musical Mystery Romance Sci-Fi Thriller
0 0 0 0 1 0
1 0 0 0 0 0
2 0 0 0 0 1
3 0 0 0 0 1
4 0 0 0 0 0
5 0 0 0 1 0
6 0 0 1 0 0
7 1 0 0 0 0
8 0 0 0 1 0
9 0 1 0 0 0
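To try the one-liner above without the original file, here is a minimal self-contained sketch. The small DataFrame is a hypothetical reconstruction of the first three rows of the question's data, not the asker's actual source. Passing dtype=int is an assumption on my part to get 0/1 output like the one shown; in pandas 2.0 and later get_dummies returns boolean columns by default.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the first three rows of the question's data.
df = pd.DataFrame({
    "index": [0, 1, 2],
    0: ["Action", "Action", "Action"],
    1: ["Adventure", "Adventure", "Adventure"],
    2: ["Fantasy", "Fantasy", "Thriller"],
    3: ["Sci-Fi", np.nan, np.nan],
})

# Drop the helper column, then reshape to one genre per entry.
# The resulting Series repeats each original row label once per genre.
# dropna() is called explicitly so the padding NaN values are removed
# regardless of how the installed pandas version handles them in stack().
stacked = df.drop("index", axis=1).stack().dropna()

# One indicator column per distinct genre; dtype=int requests 0/1 values.
# groupby(level=0).max() collapses the repeated row labels back to one
# row per movie, keeping a 1 wherever any of its genres matched.
dummies = pd.get_dummies(stacked, dtype=int).groupby(level=0).max()
print(dummies)
```

The groupby step is what handles rows of different lengths: each movie contributes as many stacked entries as it has genres, and max merges them into a single indicator row.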
Answer 1: (score: 1)