I have a df:
id step step_description stepA stepA_description date
1 1 Start 1 Beginning 8/6/2017
1 2 Continue 2 Middle 8/7/2017
1 3 Finish 3 End 8/7/2017
I want to reshape this data so that it looks like this:
id step1 step2 step3 stepA1 stepA2 stepA3 step1_date step2_date step3_date
1 Start Continue Finish Beginning Middle End 8/6/2017 8/7/2017 8/7/2017
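This means the step column has to be pivoted and then filled with the values of another pivoted column. Is there a way to achieve this with pandas? I have read the documentation on the available options (and have used several of them in the past for more straightforward cases), but I can't work out how to do this.
I can use a pivot table to get a multi-level index with the structure I want. Is there a way to essentially "drop" the index so that the bottom level of the hierarchy becomes the values of the df?
Thanks to everyone for your insight!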
Answer 0: (score: 2)
You can use 2 solutions - with pivot or unstack:
df1 = df.pivot(index='id', columns='step', values='step_description').add_prefix('step')
print (df1)
step step1 step2 step3
id
1 Start Continue Finish
df1 = df.set_index(['id', 'step'])['step_description'].unstack().add_prefix('step')
print (df1)
step step1 step2 step3
id
1 Start Continue Finish
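To make the two snippets above runnable on their own, here is a minimal sketch that rebuilds the sample data from the question (only the three columns needed here) and checks that both approaches give the same result. The variable names by_pivot, by_unstack and flat are made up for illustration. The last step addresses the "drop the index" part of the question, assuming the goal is a flat frame with id as an ordinary column.

```python
import pandas as pd

# Sample data reconstructed from the question (a single id with three steps).
df = pd.DataFrame({
    "id": [1, 1, 1],
    "step": [1, 2, 3],
    "step_description": ["Start", "Continue", "Finish"],
})

# pivot: one row per id, one column per distinct step value.
by_pivot = df.pivot(index="id", columns="step", values="step_description").add_prefix("step")

# set_index + unstack: the same reshape, done through the index.
by_unstack = df.set_index(["id", "step"])["step_description"].unstack().add_prefix("step")

# Remove the leftover "step" label on the column axis and
# turn id back into a regular column.
flat = by_pivot.rename_axis(None, axis=1).reset_index()
print(flat)
```

Both pivot and unstack require each (id, step) pair to be unique.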
But if there are duplicates, you need pivot_table, or groupby with apply and join:
print (df)
id step step_description
0 1 1 Start    <- same id=1, step=1
1 1 1 Start1   <- same id=1, step=1
2 1 2 Continue
3 1 3 Finish
df2=df.pivot_table(index='id',
columns='step',
values='step_description',
aggfunc=', '.join).add_prefix('step')
print (df2)
step step1 step2 step3
id
1 Start, Start1 Continue Finish
df2 = (df.groupby(['id', 'step'])['step_description'].apply(','.join)
         .unstack().add_prefix('step'))
print (df2)
step step1 step2 step3
id
1 Start,Start1 Continue Finish
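As a self-contained illustration of why duplicates matter, the sketch below rebuilds the duplicated sample data, shows that a plain pivot refuses to reshape it, and then aggregates the duplicates with a join. It uses agg rather than apply, which is just an interchangeable choice here and not a requirement; pivot_failed and joined are illustrative names.

```python
import pandas as pd

# Sample data with a duplicated (id, step) pair, as in the printout above.
df = pd.DataFrame({
    "id": [1, 1, 1, 1],
    "step": [1, 1, 2, 3],
    "step_description": ["Start", "Start1", "Continue", "Finish"],
})

# A plain pivot cannot decide which of the two step=1 values to keep.
try:
    df.pivot(index="id", columns="step", values="step_description")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Collapse duplicates into one string per (id, step), then reshape.
joined = (
    df.groupby(["id", "step"])["step_description"]
      .agg(", ".join)
      .unstack()
      .add_prefix("step")
)
print(joined)
```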
EDIT:
You need 2 DataFrames, then concat:
cols = ['id','step','step_description','date']
df1 = df[cols].set_index(['id', 'step']).unstack().rename(columns={'step_description':'des'})
df1.columns = ['step{}_{}'.format(x[1], x[0]) for x in df1.columns]
print (df1)
step1_des step2_des step3_des step1_date step2_date step3_date
id
1 Start Continue Finish 8/6/2017 8/7/2017 8/7/2017
df2 = df.set_index(['id', 'stepA'])['stepA_description'].unstack().add_prefix('stepA')
print (df2)
stepA stepA1 stepA2 stepA3
id
1 Beginning Middle End
df = pd.concat([df1, df2], axis=1).reset_index()
print (df)
id step1_des step2_des step3_des step1_date step2_date step3_date \
0 1 Start Continue Finish 8/6/2017 8/7/2017 8/7/2017
stepA1 stepA2 stepA3
0 Beginning Middle End
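If the exact column names and order from the question are wanted (step1, stepA1, step1_date and so on, rather than step1_des), one possible variation is to pivot each piece separately and concat the three results. This is only a sketch under that assumption, with the sample data rebuilt from the question; steps, steps_a, dates and wide are illustrative names.

```python
import pandas as pd

# Full sample data reconstructed from the question.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "step": [1, 2, 3],
    "step_description": ["Start", "Continue", "Finish"],
    "stepA": [1, 2, 3],
    "stepA_description": ["Beginning", "Middle", "End"],
    "date": ["8/6/2017", "8/7/2017", "8/7/2017"],
})

# One wide frame per group of output columns.
steps = df.pivot(index="id", columns="step", values="step_description").add_prefix("step")
steps_a = df.pivot(index="id", columns="stepA", values="stepA_description").add_prefix("stepA")
dates = df.pivot(index="id", columns="step", values="date")
dates.columns = ["step{}_date".format(c) for c in dates.columns]

# Glue the pieces side by side on the shared id index,
# then make id an ordinary column again.
wide = pd.concat([steps, steps_a, dates], axis=1).reset_index()
wide.columns.name = None
print(wide)
```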
Answer 1: (score: 0)
Besides the pivot and set_index methods, you can also use groupby:
df.groupby(['id', 'step'])['step_description'].sum().unstack().add_prefix('step')