我有一个数据框如下:
In [14]: grouped_data
Out[14]:
monthyear Facility Date Yield
0 Dec 15 CCM1 2015-12-01 2550.000000
1 Feb 16 CCM1 2016-02-01 4250.000000
2 Jan 16 CCM2 2016-01-01 1540.000000
3 Jan 16 CCM3 2016-01-01 6800.000000
4 Nov 15 CCM1 2015-11-01 921.458157
5 Nov 15 CCM2 2015-11-01 1750.310038
6 Sep 15 CCM3 2015-09-01 5191.197065
现在我需要数据框看起来像这样:
monthyear CCM1 CCM2 CCM3 Date
0 Dec 15 2550.000000 0 0 2015-12-01
1 Feb 16 4250.000000 0 0 2016-02-01
2 Jan 16 0 1540.000000 6800.000000 2016-01-01
3 Nov 15 921.458157 1750.310038 0 2015-11-01
4 Sep 15 0 5191.197065 0 2015-09-01
我将如何与熊猫一起做这件事。请帮忙。提前谢谢。
答案 0 :(得分:1)
使用pivot_table
:
print (df.pivot_table(index=['monthyear','Date'],
columns='Facility',
values='Yield',
fill_value=0))
Facility CCM1 CCM2 CCM3
monthyear Date
Dec 15 2015-12-01 2550.000000 0.000000 0.000000
Feb 16 2016-02-01 4250.000000 0.000000 0.000000
Jan 16 2016-01-01 0.000000 1540.000000 6800.000000
Nov 15 2015-11-01 921.458157 1750.310038 0.000000
Sep 15 2015-09-01 0.000000 0.000000 5191.197065
如果您需要reset_index
并删除列名,请使用rename_axis
(pandas
0.18.0
中的新内容):
print (df.pivot_table(index=['monthyear','Date'],
columns='Facility',
values='Yield',
fill_value=0).reset_index().rename_axis(None, axis=1))
monthyear Date CCM1 CCM2 CCM3
0 Dec 15 2015-12-01 2550.000000 0.000000 0.000000
1 Feb 16 2016-02-01 4250.000000 0.000000 0.000000
2 Jan 16 2016-01-01 0.000000 1540.000000 6800.000000
3 Nov 15 2015-11-01 921.458157 1750.310038 0.000000
4 Sep 15 2015-09-01 0.000000 0.000000 5191.197065
但pivot_table
使用aggfunc
,如果aggfunc=np.mean
列montyear
重复,则默认为Date
。样本的更好解释是here和docs。