I want to split data out of this pandas DataFrame (call it df1):
YEAR CODE DIFF
2013 XXXX 5.50
2013 YYYY 8.50
2013 ZZZZ 6.50
2014 XXXX 4.50
2014 YYYY 2.50
2014 ZZZZ 3.50
so that I end up with a new DataFrame (call it df2) that looks like this:
YEAR XXXX_DIFF ZZZZ_DIFF
2013 5.50 6.50
2014 4.50 3.50
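I think I need to group by year and split the single DIFF column into separate columns for specific CODE matches. I have tried something like this: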
df2 = df1[['YEAR','CODE','DIFF']].query('CODE == "XXXX"')
I know I can rename the column and drop the extra ones, but I am not sure how to get the ZZZZ DIFF values into df2.
Answer 0: (score: 3)
Use pivot + filter + add_suffix:
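# pivot takes keyword-only arguments in recent pandas, so name them explicitly
out = (df1.pivot(index='YEAR', columns='CODE', values='DIFF')
          .filter(['XXXX', 'ZZZZ']).add_suffix('_DIFF')
          .reset_index().rename_axis(None, axis=1))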
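For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes df1 is built by hand from the sample rows in the question, and the result name df2 is taken from the question rather than from this answer.

```python
import pandas as pd

# Sample data copied from the question (assumed to be how df1 looks)
df1 = pd.DataFrame({
    "YEAR": [2013, 2013, 2013, 2014, 2014, 2014],
    "CODE": ["XXXX", "YYYY", "ZZZZ", "XXXX", "YYYY", "ZZZZ"],
    "DIFF": [5.5, 8.5, 6.5, 4.5, 2.5, 3.5],
})

df2 = (
    df1.pivot(index="YEAR", columns="CODE", values="DIFF")  # one column per CODE
       .filter(["XXXX", "ZZZZ"])                            # keep only the wanted codes
       .add_suffix("_DIFF")                                 # XXXX -> XXXX_DIFF
       .reset_index()                                       # YEAR back to a regular column
       .rename_axis(None, axis=1)                           # drop the leftover "CODE" columns name
)
print(df2)
```

Note that pivot raises an error if a (YEAR, CODE) pair appears more than once, so this only works as-is when each pair is unique, as in the sample data.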
Answer 1: (score: 2)
You can set the index and unstack first, then drop the unneeded column level and rename.
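(
    df1.loc[df1.CODE != 'YYYY']
    .set_index(['YEAR', 'CODE'])
    .unstack()
    # drop the outer 'DIFF' level and append it as a suffix instead;
    # set_axis returns a new object, and its inplace argument was removed in pandas 2.0
    .pipe(lambda x: x.set_axis(x.columns.droplevel(0) + '_DIFF', axis=1))
)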
CODE XXXX_DIFF ZZZZ_DIFF
YEAR
2013 5.5 6.5
2014 4.5 3.5
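To make the "drop the column level" step easier to follow, here is a rough sketch that stops after unstack so the intermediate MultiIndex columns can be inspected. The names wide and flat are made up for illustration, df1 is rebuilt from the question's sample rows, and the renaming uses a plain list comprehension in place of the set_axis call above.

```python
import pandas as pd

# Sample data copied from the question (assumed to be how df1 looks)
df1 = pd.DataFrame({
    "YEAR": [2013, 2013, 2013, 2014, 2014, 2014],
    "CODE": ["XXXX", "YYYY", "ZZZZ", "XXXX", "YYYY", "ZZZZ"],
    "DIFF": [5.5, 8.5, 6.5, 4.5, 2.5, 3.5],
})

# After unstack, the columns are a two-level MultiIndex:
# ('DIFF', 'XXXX') and ('DIFF', 'ZZZZ')
wide = (
    df1.loc[df1["CODE"] != "YYYY"]
       .set_index(["YEAR", "CODE"])
       .unstack()
)
print(wide.columns.tolist())

# Drop the outer 'DIFF' level and rebuild flat names with the suffix
flat = wide.copy()
flat.columns = [f"{code}_DIFF" for code in wide.columns.droplevel(0)]
print(flat)
```

Here YEAR stays as the index; call reset_index() on flat if you want it as a regular column, as in the question's expected output.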
Answer 2: (score: 1)
IIUC,
df = (df1
      # I use first() because groupby sorts the groups automatically;
      # first() returns the first row of each (YEAR, CODE) group
      .groupby(['YEAR', 'CODE'], as_index=False)['DIFF'].first()
      .query('CODE in ["XXXX", "ZZZZ"]')
      .pivot(index='YEAR', columns='CODE'))
# this flattens the MultiIndex columns into single strings;
# reversed() flips ('DIFF', 'XXXX') into 'XXXX_DIFF' to match your expected output
df.columns = ['_'.join(reversed(i)) for i in df.columns.to_flat_index()]
df.reset_index()
YEAR XXXX_DIFF ZZZZ_DIFF
0 2013 5.5 6.5
1 2014 4.5 3.5