Pandas: pivot one column and fill it with values from another column

Time: 2017-08-07 15:05:23

Tags: python python-3.x pandas dataframe pivot

I have a df:

id    step    step_description    stepA    stepA_description    date
1     1       Start               1        Beginning            8/6/2017
1     2       Continue            2        Middle               8/7/2017
1     3       Finish              3        End                  8/7/2017

I want to pivot this data so that it looks like this:

id    step1  step2    step3   stepA1    stepA2  stepA3  step1_date  step2_date  step3_date
1     Start  Continue Finish  Beginning Middle  End     8/6/2017    8/7/2017    8/7/2017

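This means the step columns have to be pivoted and then filled with the values of another column. Is there a way to do this with pandas? I have read the documentation for the available options (and have used several of them in the past for more straightforward cases), but I can't work out how to achieve this.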

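I can use a pivot table to get a multi-level index with the structure I want. Is there a way to essentially "drop" the index, so that the bottom level of the hierarchy becomes the values of the df?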

Thanks, everyone, for your insight!

2 answers:

Answer 0 (score: 2)

There are 2 possible solutions: pivot, or set_index with unstack:

# Solution 1: pivot step into columns, filled with step_description
df1 = df.pivot(index='id', columns='step', values='step_description').add_prefix('step')
print (df1)
step  step1     step2   step3
id                           
1     Start  Continue  Finish

# Solution 2: move id and step into the index, then unstack step
df1 = df.set_index(['id', 'step'])['step_description'].unstack().add_prefix('step')
print (df1)
step  step1     step2   step3
id                           
1     Start  Continue  Finish
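
For anyone who wants to run the two variants above, here is a minimal self-contained sketch. The frame is typed in by hand from the table in the question, and the names `by_pivot` and `by_unstack` are mine, not the answer's.

```python
import pandas as pd

# Sample frame rebuilt by hand from the question's table.
df = pd.DataFrame({
    'id': [1, 1, 1],
    'step': [1, 2, 3],
    'step_description': ['Start', 'Continue', 'Finish'],
    'stepA': [1, 2, 3],
    'stepA_description': ['Beginning', 'Middle', 'End'],
    'date': ['8/6/2017', '8/7/2017', '8/7/2017'],
})

# Variant 1: pivot step into columns.
by_pivot = df.pivot(index='id', columns='step',
                    values='step_description').add_prefix('step')

# Variant 2: set_index + unstack on the same columns.
by_unstack = (df.set_index(['id', 'step'])['step_description']
                .unstack()
                .add_prefix('step'))

print(by_pivot)
print(by_unstack)
```

Both frames should have one row per id and the columns step1, step2, step3.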

But if there are duplicates, you need pivot_table, or groupby with apply, to aggregate them:

print (df)
  id  step step_description
0   1     1            Start   <- same id=1, step=1
1   1     1           Start1   <- same id=1, step=1
2   1     2         Continue
3   1     3           Finish

df2=df.pivot_table(index='id', 
                   columns='step', 
                   values='step_description',
                   aggfunc=', '.join).add_prefix('step')
print (df2)
step          step1     step2   step3
id                                   
1     Start, Start1  Continue  Finish

# Parentheses are needed so the chained calls can span several lines
df2 = (df.groupby(['id', 'step'])['step_description']
         .apply(','.join)
         .unstack()
         .add_prefix('step'))
print (df2)
step         step1     step2   step3
id                                  
1     Start,Start1  Continue  Finish
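
To see why the aggregation is needed, the following sketch rebuilds the duplicated frame by hand and checks that plain `pivot` rejects it, then joins the duplicates. The names `dup`, `pivot_failed` and `joined` are illustrative only, and `agg` is used here in place of `apply` as an equivalent for this case.

```python
import pandas as pd

# Frame with a repeated (id, step) pair, typed in from the example above.
dup = pd.DataFrame({
    'id': [1, 1, 1, 1],
    'step': [1, 1, 2, 3],
    'step_description': ['Start', 'Start1', 'Continue', 'Finish'],
})

try:
    dup.pivot(index='id', columns='step', values='step_description')
    pivot_failed = False
except ValueError:
    # pivot cannot reshape when an (id, step) pair occurs more than once
    pivot_failed = True

# Join the duplicated descriptions per (id, step) pair, then widen.
joined = (dup.groupby(['id', 'step'])['step_description']
             .agg(', '.join)
             .unstack()
             .add_prefix('step'))

print(pivot_failed)
print(joined)
```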

EDIT:

You need to build 2 DataFrames and then concat them:

cols = ['id','step','step_description','date']
df1 = df[cols].set_index(['id', 'step']).unstack().rename(columns={'step_description':'des'})
df1.columns = ['step{}_{}'.format(x[1], x[0]) for x in df1.columns]
print (df1)
   step1_des step2_des step3_des step1_date step2_date step3_date
id                                                               
1      Start  Continue    Finish   8/6/2017   8/7/2017   8/7/2017

df2 = df.set_index(['id', 'stepA'])['stepA_description'].unstack().add_prefix('stepA')
print (df2)
stepA     stepA1  stepA2 stepA3
id                             
1      Beginning  Middle    End

df = pd.concat([df1, df2], axis=1).reset_index()
print (df)
   id step1_des step2_des step3_des step1_date step2_date step3_date  \
0   1     Start  Continue    Finish   8/6/2017   8/7/2017   8/7/2017   

      stepA1  stepA2 stepA3  
0  Beginning  Middle    End  
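
If the exact column names from the question are wanted (step1, stepA1, step1_date), one possible variation is to unstack each value column separately and flatten the names with a format template. This is only a sketch: `widen` is a hypothetical helper of my own, not a pandas function, and the frame is typed in from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1],
    'step': [1, 2, 3],
    'step_description': ['Start', 'Continue', 'Finish'],
    'stepA': [1, 2, 3],
    'stepA_description': ['Beginning', 'Middle', 'End'],
    'date': ['8/6/2017', '8/7/2017', '8/7/2017'],
})

def widen(frame, key, values, template):
    """Hypothetical helper: unstack `values` by `key`, one row per id,
    and name the resulting columns with `template`."""
    wide = frame.set_index(['id', key])[values].unstack()
    # Replace the numeric column labels with flat string names.
    wide.columns = [template.format(k) for k in wide.columns]
    return wide

result = pd.concat([
    widen(df, 'step', 'step_description', 'step{}'),
    widen(df, 'stepA', 'stepA_description', 'stepA{}'),
    widen(df, 'step', 'date', 'step{}_date'),
], axis=1).reset_index()

print(result)
```

Because the column labels are assigned as a flat list, no multi-level index remains, and `reset_index` turns id back into an ordinary column.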

Answer 1 (score: 0)

Besides the pivot and set_index methods, you can also use groupby:

df.groupby(['id', 'step'])['step_description'].sum().unstack().add_prefix('step')
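
Note that `sum` only gives a clean result here because each (id, step) pair occurs once; with repeated pairs the strings in a group would be concatenated without a separator. As a sketch of an alternative that states the "one value per pair" assumption more directly, `first()` can be swapped in. That substitution is mine, not part of this answer, and the frame is typed in from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1],
    'step': [1, 2, 3],
    'step_description': ['Start', 'Continue', 'Finish'],
})

# first() keeps the first description per (id, step) pair instead of
# concatenating; assumes duplicates, if any, may safely be ignored.
wide = (df.groupby(['id', 'step'])['step_description']
          .first()
          .unstack()
          .add_prefix('step'))

print(wide)
```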