My DataFrame is grouped by customer, year, and month:
my_list = ['Customer','Year','Month']
g = df.groupby(my_list)['COST'].sum()
Customer  Year  Month    COST
1000061   2013  12     122.77
          2014  1      450.40
                2      249.61
                3      533.58
                4      337.32
                5      482.49
1000063   2013  12     875.67
          2014  1      376.95
                2      308.90
                3      469.76
                4      394.34
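But now I would like to add 2 new columns (lead columns of COST, looking one and two months ahead):
1. the expected cost for the next month
2. the expected cost for the month after that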
Customer  Year  Month    COST  COST_NextMonth  COST_2Months
1000061   2013  12     122.77          450.40        249.61
          2014  1      450.40          249.61        533.58
                2      249.61          533.58        337.32
                3      533.58          337.32        482.49
                4      337.32          482.49             0
                5      482.49               0             0
1000063   2013  12     875.67          376.95         308.9
          2014  1      376.95           308.9        469.76
                2      308.90          469.76        394.34
                3      469.76          394.34             0
                4      394.34               0             0
How can I achieve this?
Answer 0 (score: 1)
IIUC, you can use concat together with shift and fillna:
# Shift within each customer (index level 0) and fill the trailing gaps with 0
print(pd.concat([g,
                 g.groupby(level=0).shift(-1).fillna(0),
                 g.groupby(level=0).shift(-2).fillna(0)], axis=1,
                keys=['COST', 'COST_NextMonth', 'COST_2Months']))
                       COST  COST_NextMonth  COST_2Months
Customer Year Month
1000061  2013 12     122.77          450.40        249.61
         2014 1      450.40          249.61        533.58
              2      249.61          533.58        337.32
              3      533.58          337.32        482.49
              4      337.32          482.49          0.00
              5      482.49            0.00          0.00
1000063  2013 12     875.67          376.95        308.90
         2014 1      376.95          308.90        469.76
              2      308.90          469.76        394.34
              3      469.76          394.34          0.00
              4      394.34            0.00          0.00
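For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same concat/shift/fillna idea. The rows are rebuilt from the output in the question, so they are an assumption about what df looks like (the real df may well have several rows per month before the sum); by_customer and result are just illustrative names.

```python
import pandas as pd

# Sample rows rebuilt from the question's output (assumed: one row per
# customer and month; the real df may hold many rows per month).
rows = [
    (1000061, 2013, 12, 122.77), (1000061, 2014, 1, 450.40),
    (1000061, 2014, 2, 249.61), (1000061, 2014, 3, 533.58),
    (1000061, 2014, 4, 337.32), (1000061, 2014, 5, 482.49),
    (1000063, 2013, 12, 875.67), (1000063, 2014, 1, 376.95),
    (1000063, 2014, 2, 308.90), (1000063, 2014, 3, 469.76),
    (1000063, 2014, 4, 394.34),
]
df = pd.DataFrame(rows, columns=['Customer', 'Year', 'Month', 'COST'])

# Same aggregation as in the question: a Series with a 3-level index.
g = df.groupby(['Customer', 'Year', 'Month'])['COST'].sum()

# Group by index level 0 (Customer) so a shift never pulls in
# values that belong to the next customer.
by_customer = g.groupby(level=0)

result = pd.concat(
    [g, by_customer.shift(-1).fillna(0), by_customer.shift(-2).fillna(0)],
    axis=1,
    keys=['COST', 'COST_NextMonth', 'COST_2Months'],
)
print(result)
```

Grouping on level 0 before shifting is what keeps the last rows of customer 1000061 from picking up the first costs of customer 1000063; fillna(0) then turns the resulting NaN values into 0.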
Another solution, using reset_index:
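# Flatten the grouped Series so Customer, Year and Month become ordinary columns;
# the new columns then line up with this frame's 0..n-1 index
df = g.reset_index()
df['COST_NextMonth'] = df.groupby('Customer')['COST'].shift(-1).fillna(0)
df['COST_2Months'] = df.groupby('Customer')['COST'].shift(-2).fillna(0)
print(df)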
    Customer  Year  Month    COST  COST_NextMonth  COST_2Months
0    1000061  2013     12  122.77          450.40        249.61
1    1000061  2014      1  450.40          249.61        533.58
2    1000061  2014      2  249.61          533.58        337.32
3    1000061  2014      3  533.58          337.32        482.49
4    1000061  2014      4  337.32          482.49          0.00
5    1000061  2014      5  482.49            0.00          0.00
6    1000063  2013     12  875.67          376.95        308.90
7    1000063  2014      1  376.95          308.90        469.76
8    1000063  2014      2  308.90          469.76        394.34
9    1000063  2014      3  469.76          394.34          0.00
10   1000063  2014      4  394.34            0.00          0.00
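If more look-ahead columns are needed later, the flat-frame variant can be wrapped in a small helper. This is only a sketch: add_lead_columns, monthly and wide are hypothetical names (not part of pandas), and it assumes the frame is already sorted by Customer, Year and Month with one row per customer and month.

```python
import pandas as pd

def add_lead_columns(frame, group_col, value_col, leads):
    """Return a copy of frame with one look-ahead column per entry in leads.

    leads maps a new column name to the number of rows to look ahead.
    Hypothetical helper for illustration, not a pandas API.
    """
    out = frame.copy()
    for name, periods in leads.items():
        # A negative shift pulls later rows up; grouping keeps it per customer.
        out[name] = out.groupby(group_col)[value_col].shift(-periods).fillna(0)
    return out

# Small monthly table, already sorted by Customer, Year, Month (assumed data).
monthly = pd.DataFrame({
    'Customer': [1000061, 1000061, 1000061, 1000063, 1000063],
    'Year': [2013, 2014, 2014, 2013, 2014],
    'Month': [12, 1, 2, 12, 1],
    'COST': [122.77, 450.40, 249.61, 875.67, 376.95],
})

wide = add_lead_columns(monthly, 'Customer', 'COST',
                        {'COST_NextMonth': 1, 'COST_2Months': 2})
print(wide)
```

Note that shift works on row positions, not calendar months: if a customer has no row for some month, "next month" here means the next available row for that customer.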