How to split data from one column of a pandas DataFrame into multiple columns of a new DataFrame

Time: 2020-01-25 04:11:53

Tags: python python-3.x pandas

I want to split the data from this pandas DataFrame (let's call it df1):

YEAR   CODE   DIFF
2013   XXXX   5.50
2013   YYYY   8.50
2013   ZZZZ   6.50
2014   XXXX   4.50
2014   YYYY   2.50
2014   ZZZZ   3.50

so that I end up with a new DataFrame (call it df2) that looks like this:

YEAR   XXXX_DIFF   ZZZZ_DIFF
2013   5.50        6.50
2014   4.50        3.50

I think I need to group by year and split the single DIFF column into separate columns, one per matching CODE. I have tried something like this:

df2 = df1[['YEAR','CODE','DIFF']].query('CODE == "XXXX"')

I know I can rename the column and drop the extra ones, but I'm not sure how to get the ZZZZ DIFF values into df2.

3 Answers:

Answer 0: (score: 3)

Use pivot + filter + add_suffix


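# pivot takes keyword-only arguments in pandas 2.0+, so name them explicitly
out = (df1.pivot(index='YEAR', columns='CODE', values='DIFF')
          .filter(['XXXX','ZZZZ']).add_suffix('_DIFF')
          .reset_index().rename_axis(None,axis=1))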
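
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same pivot + filter + add_suffix chain. The df1 construction is copied from the question's sample table; the intermediate name `wide` is only an illustrative choice, not something from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "YEAR": [2013, 2013, 2013, 2014, 2014, 2014],
    "CODE": ["XXXX", "YYYY", "ZZZZ", "XXXX", "YYYY", "ZZZZ"],
    "DIFF": [5.5, 8.5, 6.5, 4.5, 2.5, 3.5],
})

# Long to wide: one row per YEAR, one column per CODE.
wide = df1.pivot(index="YEAR", columns="CODE", values="DIFF")

df2 = (
    wide.filter(["XXXX", "ZZZZ"])      # keep only the wanted CODE columns
    .add_suffix("_DIFF")               # XXXX -> XXXX_DIFF, ZZZZ -> ZZZZ_DIFF
    .reset_index()                     # turn YEAR back into a regular column
    .rename_axis(None, axis=1)         # drop the leftover "CODE" columns name
)
print(df2)
```

Filtering after the pivot keeps the chain short; filtering the rows before pivoting should give the same result on this data.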

Answer 1: (score: 2)

You can set the index first and unstack, then drop the unneeded column level and rename.

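(
    # drop the unwanted CODE before reshaping
    df1.loc[df1.CODE!='YYYY']
    .set_index(['YEAR', 'CODE'])
    .unstack()
    # set_axis returns a new object; the inplace argument was removed in pandas 2.0
    .pipe(lambda x: x.set_axis(x.columns.droplevel(0)+'_DIFF',
                               axis=1))
)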


CODE    XXXX_DIFF   ZZZZ_DIFF
YEAR        
2013    5.5         6.5
2014    4.5         3.5
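
A slightly shorter variation on the same idea, offered as a sketch rather than as the answer's exact code: selecting the DIFF Series before calling unstack yields single-level columns, so there is no level to drop afterwards. The isin filter is my assumption of an equivalent way to keep only XXXX and ZZZZ.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "YEAR": [2013, 2013, 2013, 2014, 2014, 2014],
    "CODE": ["XXXX", "YYYY", "ZZZZ", "XXXX", "YYYY", "ZZZZ"],
    "DIFF": [5.5, 8.5, 6.5, 4.5, 2.5, 3.5],
})

df2 = (
    df1[df1["CODE"].isin(["XXXX", "ZZZZ"])]   # keep only the wanted codes
    .set_index(["YEAR", "CODE"])["DIFF"]      # Series indexed by (YEAR, CODE)
    .unstack("CODE")                          # CODE values become columns
    .add_suffix("_DIFF")                      # XXXX -> XXXX_DIFF, etc.
)
print(df2)
```

Here YEAR stays as the index, as in the output above; chain `.reset_index()` if it should be a regular column.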

Answer 2: (score: 1)

IIUC,

df2 = (df1
       # groupby sorts the group keys (YEAR, CODE);
       # first() keeps the first row of each group,
       # so duplicate (YEAR, CODE) pairs cannot break the pivot
       .groupby(['YEAR','CODE'],as_index=False)['DIFF'].first()
       .query('CODE.isin(["XXXX","ZZZZ"])')
       .pivot(index='YEAR', columns = 'CODE'))

# flatten the MultiIndex columns into single strings;
# reversed() turns ('DIFF', 'XXXX') into 'XXXX_DIFF' to match your expected output
df2.columns = ['_'.join(reversed(i)) for i in df2.columns.to_flat_index()]

df2.reset_index()

    YEAR    XXXX_DIFF   ZZZZ_DIFF
0   2013        5.5       6.5
1   2014        4.5       3.5
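
To illustrate why the groupby + first step can matter, here is a small sketch using hypothetical data: the question's rows for XXXX and ZZZZ plus one made-up duplicate (2013, XXXX) row with DIFF 9.9. The question's own data has no duplicates, so this only applies if yours does.

```python
import pandas as pd

# Hypothetical data: (2013, "XXXX") appears twice; the 9.9 row is invented.
dup = pd.DataFrame({
    "YEAR": [2013, 2013, 2013, 2014, 2014],
    "CODE": ["XXXX", "XXXX", "ZZZZ", "XXXX", "ZZZZ"],
    "DIFF": [5.5, 9.9, 6.5, 4.5, 3.5],
})

# pivot cannot reshape when an (index, columns) pair occurs more than once.
try:
    dup.pivot(index="YEAR", columns="CODE", values="DIFF")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Keeping the first row per (YEAR, CODE) makes the pairs unique again.
deduped = dup.groupby(["YEAR", "CODE"], as_index=False)["DIFF"].first()

df2 = (
    deduped.pivot(index="YEAR", columns="CODE", values="DIFF")
    .add_suffix("_DIFF")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(pivot_failed)
print(df2)
```

Whether first() is the right way to resolve duplicates depends on the data; an aggregate such as mean or max may be more appropriate in other cases.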