I have a DataFrame similar to this:
import pandas as pd

questions = ['What color?', 'What day?', 'How cold?', 'What color?', 'What color?']
answers = ['red', 'tuesday', '45', 'blue', 'red']
ids = [0, 1, 2, 0, 0]
df = pd.DataFrame({'id': ids, 'questions': questions, 'answers': answers})
id    questions  answers
 0  What color?      red
 1    What day?  tuesday
 2    How cold?       45
 0  What color?     blue
 0  What color?      red
I want this:
   How cold? What color? What day?
id
0       None         red      None
2       None        None   tuesday
3         45        None      None
4       None        blue      None
0       None         red      None
I tried:
df.pivot(values='answers', index='id', columns='questions')
However, pivot always raises an error because the index contains duplicate entries.
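For anyone reproducing this, here is a minimal sketch of the failure. It assumes the sample data above; `dup_mask` and `pivot_error` are names made up for the illustration. Rows with the same (id, question) pair compete for one cell of the wide table, which is what pivot refuses to resolve.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 0, 0],
    "questions": ["What color?", "What day?", "How cold?", "What color?", "What color?"],
    "answers": ["red", "tuesday", "45", "blue", "red"],
})

# Flag every row whose (id, questions) pair occurs more than once.
# Here three rows all target the cell (id=0, "What color?").
dup_mask = df.duplicated(subset=["id", "questions"], keep=False)
print(df[dup_mask])

# pivot cannot decide which answer goes into that cell, so it raises ValueError.
try:
    df.pivot(index="id", columns="questions", values="answers")
    pivot_error = None
except ValueError as exc:
    pivot_error = exc
    print(f"pivot failed: {exc}")
```

Checking `df.duplicated(...)` first is a cheap way to find out which rows need a tiebreaker before reshaping.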
Answer 0 (score: 5)
You can do this with pivot by leaving out the index argument, so that the existing row index is used:
df.pivot(columns="questions", values="answers")
Output
  How cold? What color? What day?
0       NaN         red       NaN
1       NaN         NaN   tuesday
2        45         NaN       NaN
3       NaN        blue       NaN
4       NaN         red       NaN
Edit: if you want to keep the index you have, you can do this:
new_df = df.pivot(columns="questions",values="answers")
new_df.index = df.index
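If "the index you have" is meant to be the id column rather than the default row index, a sketch along the following lines may be closer to what the question asks for. This is an assumption about the intent, not something the answer states, and it relies on the pivoted frame having the same row order as df (true here because the row index 0..4 is already sorted).

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 0, 0],
    "questions": ["What color?", "What day?", "How cold?", "What color?", "What color?"],
    "answers": ["red", "tuesday", "45", "blue", "red"],
})

# Without an index argument, pivot uses the row index (0..4), so every
# source row gets its own output row and no two answers share a cell.
new_df = df.pivot(columns="questions", values="answers")

# Assumption: relabel the rows with the id column. This only lines up
# because new_df and df have the same row order.
new_df.index = pd.Index(df["id"], name="id")
print(new_df)
```

The resulting index contains repeated labels (0, 1, 2, 0, 0), so label lookups such as `new_df.loc[0]` return several rows.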
Answer 1 (score: 3)
If you need to keep the duplicates:
# number the rows within each id so that (id, g, questions) is unique
df['g'] = df.groupby('id').cumcount()
df = df.set_index(['id', 'g', 'questions'])['answers'].unstack().reset_index(level=1, drop=True)
print(df)
questions How cold? What color? What day?
id
0              None         red      None
0              None        blue      None
0              None         red      None
1              None        None   tuesday
2                45        None      None
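To see why the helper column removes the collision, here is a self-contained sketch of the same idea. It assumes the sample data from the question and that the answers column is selected before unstacking; `g` and `wide` are names chosen for the illustration. `cumcount` numbers the rows inside each id group, so the triple (id, g, questions) is unique even when id repeats.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 0, 0],
    "questions": ["What color?", "What day?", "How cold?", "What color?", "What color?"],
    "answers": ["red", "tuesday", "45", "blue", "red"],
})

# Running counter per id: the three rows with id 0 get 0, 1, 2,
# while the single rows for id 1 and id 2 each get 0.
g = df.groupby("id").cumcount()
print(g.tolist())

wide = (
    df.assign(g=g)
      .set_index(["id", "g", "questions"])["answers"]  # unique 3-level index
      .unstack()                                       # questions become columns
      .reset_index(level="g", drop=True)               # drop the helper level
)
print(wide)
```

Missing cells may print as NaN or None depending on the pandas version, so `isna()` is the safer way to test for them.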