Rearranging a DataFrame in pandas

Time: 2017-03-17 05:33:36

Tags: python python-2.7 pandas dataframe

  • Here is my DataFrame:
  • GS: General Shift
  • MS: Morning Shift
  • ES: Evening Shift
  • 11, 22, 33 are employee_ids

DataFrame:

         Day        Date  GS  MS  ES
0     Monday  20/02/2017  11  22  33
1    Tuesday  21/02/2017  22  11  33
2  Wednesday  22/02/2017  33  22  11
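To reproduce the answers below, the sample frame can be rebuilt directly. This is a sketch that assumes the employee ids are stored as integers; the question does not say which dtype they have.

```python
import pandas as pd

# Rebuild the sample frame from the question (employee ids assumed to be ints)
df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})
print(df)
```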

I want to convert that DataFrame into the following layout:

         20/02/2017  21/02/2017  22/02/2017
          Monday       Tuesday    Wednesday
0     11    GS           MS         ES
1     22    ES           GS         MS
2     33    MS           ES         GS

So I am trying to reshape the table around employee_id. I tried DataFrame.transpose(), but I could not get it to do what I need.
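For comparison, a quick sketch of what transpose() does to that frame (rebuilt here with integer ids, an assumption). It only swaps rows and columns, so the shift codes become row labels while the employee ids stay scattered in the cells instead of becoming the new index.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# transpose() swaps the axes: column labels become the row index,
# but employee ids remain cell values, not index labels
t = df.transpose()
print(t)
```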

3 answers:

Answer 0 (score: 1)

You can use melt with unstack:

df = pd.melt(df, id_vars=['Day', 'Date']).set_index(['Date','Day', 'value'])['variable'].unstack([0,1])
print (df)
Date  20/02/2017 21/02/2017 22/02/2017
Day       Monday    Tuesday  Wednesday
value                                 
11            GS         MS         ES
22            MS         GS         MS
33            ES         ES         GS
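To see what melt contributes, here is a sketch that prints the intermediate long-format frame before reshaping it (same sample data, integer ids assumed). Every row is one (Day, Date, shift, employee) combination, and because each (Date, Day, value) triple is unique here, it can serve as the index before unstacking.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# melt turns the shift columns into rows: 'variable' holds the shift code,
# 'value' holds the employee id
long_df = pd.melt(df, id_vars=['Day', 'Date'])
print(long_df)

# Unstack Date and Day into columns, leaving employee ids as the row index
wide = long_df.set_index(['Date', 'Day', 'value'])['variable'].unstack([0, 1])
print(wide)
```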

Another solution with stack and unstack:

df = (df.set_index(['Day','Date'])
        .stack()
        .reset_index(level=2, name='a')
        .set_index('a', append=True)['level_2']
        .unstack([1,0]))
print (df)
Date 20/02/2017 21/02/2017 22/02/2017
Day      Monday    Tuesday  Wednesday
a                                    
11           GS         MS         ES
22           MS         GS         MS
33           ES         ES         GS
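The stack version reaches the same long format by a different route. A minimal sketch of just the first step, on the same sample data with integer ids assumed: the shift columns move into the innermost index level, giving a Series keyed by (Day, Date, shift) whose values are employee ids.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# stack() pivots the GS/MS/ES columns into a third index level
s = df.set_index(['Day', 'Date']).stack()
print(s)
```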

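But if you get

ValueError: Index contains duplicate entries, cannot reshape

it is because the same employee appears more than once on one date, for example: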

print (df)
         Day        Date  GS  MS  ES
0     Monday  20/02/2017  11  22  33
1    Tuesday  21/02/2017  22  11  11 < 33 changed to 11
2  Wednesday  22/02/2017  33  22  11
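A sketch for locating the rows that trigger this error before reshaping, using the modified frame above (integer ids assumed). duplicated(..., keep=False) marks every row that shares a (Date, Day, value) key with another row.

```python
import pandas as pd

# Modified frame: on Tuesday employee 11 works both MS and ES
df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 11, 11],
})

long_df = pd.melt(df, id_vars=['Day', 'Date'])
# keep=False flags all members of each duplicated key group
dups = long_df[long_df.duplicated(['Date', 'Day', 'value'], keep=False)]
print(dups)
```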

Solution with melt and groupby, aggregating with join:

df = (pd.melt(df, id_vars=['Day', 'Date'])
        .groupby(['Date','Day', 'value'])['variable']
        .apply(','.join)
        .unstack([0,1]))
print (df)
Date  20/02/2017 21/02/2017 22/02/2017
Day       Monday    Tuesday  Wednesday
value                                 
11            GS      MS,ES         ES
22            MS         GS         MS
33            ES       None         GS

Solution with melt and pivot_table:

# aggregating by the first value can be dangerous: data is lost
df1 =  pd.melt(df, id_vars=['Day', 'Date']).pivot_table(index='value',
                           columns=['Day', 'Date'], values='variable', aggfunc='first')
print (df1)
Day       Monday    Tuesday  Wednesday
Date  20/02/2017 21/02/2017 22/02/2017
value                                 
11            GS         MS         ES
22            MS         GS         MS
33            ES       None         GS

# better to aggregate by sum or join, so no data is lost
df1 =  pd.melt(df, id_vars=['Day', 'Date']).pivot_table(index='value',
                           columns=['Day', 'Date'], values='variable', aggfunc=','.join)
print (df1)
Day       Monday    Tuesday  Wednesday
Date  20/02/2017 21/02/2017 22/02/2017
value                                 
11            GS      MS,ES         ES
22            MS         GS         MS
33            ES       None         GS

df1 =  pd.melt(df, id_vars=['Day', 'Date']).pivot_table(index='value',
                           columns=['Day', 'Date'], values='variable', aggfunc='sum')
print (df1)
Day       Monday    Tuesday  Wednesday
Date  20/02/2017 21/02/2017 22/02/2017
value                                 
11            GS       MSES         ES
22            MS         GS         MS
33            ES       None         GS
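If the missing cells should be blank rather than None/NaN, pivot_table also accepts a fill_value argument. A sketch on the modified frame with the duplicate, integer ids assumed:

```python
import pandas as pd

# Modified frame: employee 11 has two shifts on Tuesday, employee 33 has none
df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 11, 11],
})

long_df = pd.melt(df, id_vars=['Day', 'Date'])
# join keeps every shift; fill_value='' replaces the empty cells
table = long_df.pivot_table(index='value', columns=['Day', 'Date'],
                            values='variable', aggfunc=','.join, fill_value='')
print(table)
```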

Answer 1 (score: 1)

If you melt and then pivot, you get:

Code:

import pandas as pd

def by_employee(frame):
    melted = pd.melt(
        frame, value_vars=['GS', 'MS', 'ES'], id_vars=['Day', 'Date'])
    pivot = pd.pivot_table(melted, values='variable', index='value',
                           columns=['Day', 'Date'],
                           aggfunc=lambda x: x.values[0])
    return pivot

Test code:

data = [x.strip().split() for x in """
            Day        Date  GS  MS  ES
         Monday  20/02/2017  11  22  33
        Tuesday  21/02/2017  22  11  33
      Wednesday  22/02/2017  33  22  11
""".split('\n')[1:-1]]
df = pd.DataFrame(data[1:], columns=data[0])
print(df)

print(by_employee(df))

Results:

         Day        Date  GS  MS  ES
0     Monday  20/02/2017  11  22  33
1    Tuesday  21/02/2017  22  11  33
2  Wednesday  22/02/2017  33  22  11

Day       Monday    Tuesday  Wednesday
Date  20/02/2017 21/02/2017 22/02/2017
value                                 
11            GS         MS         ES
22            MS         GS         MS
33            ES         ES         GS
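Since the result is indexed by employee id, one employee's week is a single row. A sketch that repeats by_employee so it runs on its own, with the ids kept as strings because the whitespace-split test data above produces strings:

```python
import pandas as pd

def by_employee(frame):
    # Same melt + pivot_table as in the answer
    melted = pd.melt(frame, value_vars=['GS', 'MS', 'ES'], id_vars=['Day', 'Date'])
    return pd.pivot_table(melted, values='variable', index='value',
                          columns=['Day', 'Date'], aggfunc=lambda x: x.values[0])

# Ids as strings, matching the split() test data
df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': ['11', '22', '33'],
    'MS': ['22', '11', '22'],
    'ES': ['33', '33', '11'],
})
pivot = by_employee(df)

# One employee's schedule: a Series indexed by (Day, Date)
schedule = pivot.loc['11']
print(schedule)

# One employee's shift on one day
print(pivot.loc['22', ('Tuesday', '21/02/2017')])
```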

Answer 2 (score: 0)

You can iterate over the DataFrame and build a new one from the current data, joining the Day and Date columns into a single column:

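import pandas as pd

data = []
for row in df.itertuples():
    # row[0] is the index; join the Day and Date fields into one label,
    # then append the GS, MS, ES values (works in Python 2.7 and 3)
    data.append(['{} {}'.format(row.Day, row.Date)] + list(row[3:]))

new_df = pd.DataFrame(data)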
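A vectorised variant of the same idea, sketched under the assumption that a single combined label is wanted: build the "Day Date" string with column arithmetic instead of a loop, then melt and pivot on it, which gives one-level column headers. The column name DayDate is hypothetical, and integer ids are assumed.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# Hypothetical combined label column, built without a Python loop
df['DayDate'] = df['Day'] + ' ' + df['Date']
long_df = pd.melt(df, id_vars=['DayDate'], value_vars=['GS', 'MS', 'ES'])

# pivot works because each (employee, DayDate) pair is unique here
wide = long_df.pivot(index='value', columns='DayDate', values='variable')
print(wide)
```

Unlike pivot_table, pivot does no aggregation, so with the duplicated data from answer 0 it would raise the same duplicate-entries error.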