How to partially transpose a pandas DataFrame

Date: 2019-10-22 13:24:58

Tags: python pandas dataframe transpose

I have a question similar to this one here. I want to partially transpose a pandas DataFrame. I have a DataFrame like the following:

import pandas as pd

data = [{"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Plan",   "2009": 100, "2010": 100},
        {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Actual", "2009": 80,  "2010": 100},
        {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Delta",  "2009": -20, "2010": 0},
        {"Student": "lisa", "Subject": "Math", "Plan_Actual_Delta": "Plan",   "2009": 80,  "2010": 100},
        {"Student": "lisa", "Subject": "Math", "Plan_Actual_Delta": "Actual", "2009": 75,  "2010": 100},
        {"Student": "lisa", "Subject": "Math", "Plan_Actual_Delta": "Delta",  "2009": -5,  "2010": 0}]

df = pd.DataFrame(data)

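It shows students and their planned and actual performance (and the difference between the two) in a given subject for a given year. In this example, the years are the columns. Each row indicates whether its values are the plan, the actual result, or the delta of the student's performance.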

I want to transform it so that Plan, Actual and Delta become columns. So my goal is the following structure:

data = [{"Student": "john", "Subject": "Math", "Year": "2009", "Plan": 100, "Actual": 80,  "Delta": -20},
        {"Student": "john", "Subject": "Math", "Year": "2010", "Plan": 100, "Actual": 100, "Delta": 0},
        {"Student": "lisa", "Subject": "Math", "Year": "2009", "Plan": 80,  "Actual": 75,  "Delta": -5},
        {"Student": "lisa", "Subject": "Math", "Year": "2010", "Plan": 100, "Actual": 100, "Delta": 0}]

df = pd.DataFrame(data)

How would you do this? Thanks in advance / R

1 answer:

Answer 0 (score: 3)

Use DataFrame.set_index with DataFrame.stack, then reshape with Series.unstack on the third index level:

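df = (df.set_index(['Student','Subject','Plan_Actual_Delta'])  # identifier columns -> index
        .rename_axis('Year', axis=1)   # name the columns axis so the stacked level is called Year
        .stack()                       # year columns become a 4th index level
        .unstack(2)                    # move level 2 (Plan_Actual_Delta) out into columns
        .reset_index()                 # index levels back to ordinary columns
        .rename_axis(None, axis=1))    # drop the leftover columns-axis name
print(df)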
  Student Subject  Year  Actual  Delta  Plan
0    john    Math  2009      80    -20   100
1    john    Math  2010     100      0   100
2    lisa    Math  2009      75     -5    80
3    lisa    Math  2010     100      0   100
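To see what each step of the chain above does, it can help to split it up and look at the intermediate objects. The sketch below rebuilds the question's input, names the intermediates (`wide`, `stacked`, `result` and `target` are illustrative names only, not part of the answer), and checks the result against the target frame from the question with `pd.testing.assert_frame_equal`. The columns are reordered before comparing because `unstack` places the new columns in sorted order (Actual, Delta, Plan), as the printed output above shows.

```python
import pandas as pd

# Input frame from the question
data = [
    {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Plan", "2009": 100, "2010": 100},
    {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Actual", "2009": 80, "2010": 100},
    {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Delta", "2009": -20, "2010": 0},
    {"Student": "lisa", "Subject": "Math", "Plan_Actual_Delta": "Plan", "2009": 80, "2010": 100},
    {"Student": "lisa", "Subject": "Math", "Plan_Actual_Delta": "Actual", "2009": 75, "2010": 100},
    {"Student": "lisa", "Subject": "Math", "Plan_Actual_Delta": "Delta", "2009": -5, "2010": 0},
]
df = pd.DataFrame(data)

# Step 1: identifier columns go into the index; only the year columns remain
wide = df.set_index(["Student", "Subject", "Plan_Actual_Delta"]).rename_axis("Year", axis=1)

# Step 2: stack() turns the year columns into a 4th index level (one value per row)
stacked = wide.stack()
print(list(stacked.index.names))  # ['Student', 'Subject', 'Plan_Actual_Delta', 'Year']

# Step 3: unstack(2) moves index level 2 (Plan_Actual_Delta) out into columns
result = stacked.unstack(2).reset_index().rename_axis(None, axis=1)

# Target frame from the question
target = pd.DataFrame([
    {"Student": "john", "Subject": "Math", "Year": "2009", "Plan": 100, "Actual": 80, "Delta": -20},
    {"Student": "john", "Subject": "Math", "Year": "2010", "Plan": 100, "Actual": 100, "Delta": 0},
    {"Student": "lisa", "Subject": "Math", "Year": "2009", "Plan": 80, "Actual": 75, "Delta": -5},
    {"Student": "lisa", "Subject": "Math", "Year": "2010", "Plan": 100, "Actual": 100, "Delta": 0},
])

# Same content; only the column order differs, so align the columns before comparing
pd.testing.assert_frame_equal(result[target.columns], target)
print(result)
```

Recent pandas versions may emit a FutureWarning about the `stack` implementation; it does not change the result here.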

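Another solution, if the first one does not work (for example because of duplicate entries), is DataFrame.melt with DataFrame.pivot_table, which allows aggregation: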

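# df is the original frame from the question
df = (df.melt(['Student','Subject','Plan_Actual_Delta'], var_name='Year')  # years -> rows, numbers in 'value'
        .pivot_table(index=['Student','Subject','Year'],
                     columns='Plan_Actual_Delta',  # Plan/Actual/Delta become columns
                     values='value',
                     aggfunc='mean')               # duplicate entries are averaged instead of raising
        .reset_index()
        .rename_axis(None, axis=1)
)
print(df)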
  Student Subject  Year  Actual  Delta  Plan
0    john    Math  2009      80    -20   100
1    john    Math  2010     100      0   100
2    lisa    Math  2009      75     -5    80
3    lisa    Math  2010     100      0   100
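The difference between the two solutions only shows up when the identifier columns do not uniquely identify a row. The sketch below uses a hypothetical input (not from the question) in which john/Math/Plan appears twice with different 2009 values. The `set_index`/`stack`/`unstack` chain raises a ValueError because `unstack` cannot reshape an index with duplicate entries, while `pivot_table` averages the two Plan values for 2009, (100 + 90) / 2 = 95. Whether `mean` is the right aggregation depends on what the duplicates mean in your data; `aggfunc` also accepts other reductions such as `'sum'` or `'first'`.

```python
import pandas as pd

# Hypothetical input: john/Math/Plan appears twice with different 2009 values
data = [
    {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Plan", "2009": 100, "2010": 100},
    {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Plan", "2009": 90, "2010": 100},
    {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Actual", "2009": 80, "2010": 100},
    {"Student": "john", "Subject": "Math", "Plan_Actual_Delta": "Delta", "2009": -20, "2010": 0},
]
df = pd.DataFrame(data)

# First solution: unstack needs unique index entries, so this raises ValueError
try:
    (df.set_index(["Student", "Subject", "Plan_Actual_Delta"])
       .rename_axis("Year", axis=1)
       .stack()
       .unstack(2))
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print("unstack failed:", exc)

# Second solution: pivot_table aggregates the duplicates (mean of 100 and 90 for Plan/2009)
result = (df.melt(["Student", "Subject", "Plan_Actual_Delta"], var_name="Year")
            .pivot_table(index=["Student", "Subject", "Year"],
                         columns="Plan_Actual_Delta",
                         values="value",
                         aggfunc="mean")
            .reset_index()
            .rename_axis(None, axis=1))
print(result)
```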