How to group pandas DataFrame entries and shift column values forward?

Asked: 2016-01-18 14:57:32

Tags: python pandas

My DataFrame is grouped by customer, year and month:

my_list = ['Customer','Year','Month']
g = df.groupby(my_list)['COST'].sum()

Customer    Year  Month COST    
1000061     2013  12     122.77
            2014  1      450.40
                  2      249.61
                  3      533.58
                  4      337.32
                  5      482.49
1000063     2013  12     875.67
            2014  1      376.95
                  2      308.90
                  3      469.76
                  4      394.34

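Now I want to add 2 new columns that shift the COST column forward by one and by two rows within each customer:

1. COST_NextMonth: the expected cost for the next month
2. COST_2Months: the expected cost two months ahead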

Customer    Year  Month COST    COST_NextMonth  COST_2Months
1000061    2013  12     122.77  450.40      249.61
           2014  1      450.40  249.61      533.58
                 2      249.61  533.58      337.32
                 3      533.58  337.32      482.49
                 4      337.32  482.49      0
                 5      482.49  0           0
1000063    2013  12     875.67  376.95      308.9
           2014  1      376.95  308.9       469.76
                 2      308.90  469.76      394.34
                 3      469.76  394.34      0
                 4      394.34  0           0

How can I achieve this?

1 answer:

Answer 0 (score: 1)

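IIUC you can use concat together with shift and fillna. Grouping by index level 0 (Customer) before shifting keeps one customer's values from spilling into another's rows: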

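# shift(-1) / shift(-2) look one / two rows ahead within each customer;
# fillna(0) replaces the missing values at the end of each group
print(pd.concat([g,
                 g.groupby(level=0).shift(-1).fillna(0),
                 g.groupby(level=0).shift(-2).fillna(0)], axis=1,
                keys=['COST','COST_NextMonth','COST_2Months']))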

                       COST  COST_NextMonth  COST_2Months
Customer Year Month                                      
1000061  2013 12     122.77          450.40        249.61
         2014 1      450.40          249.61        533.58
              2      249.61          533.58        337.32
              3      533.58          337.32        482.49
              4      337.32          482.49          0.00
              5      482.49            0.00          0.00
1000063  2013 12     875.67          376.95        308.90
         2014 1      376.95          308.90        469.76
              2      308.90          469.76        394.34
              3      469.76          394.34          0.00
              4      394.34            0.00          0.00
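For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the raw frame already holds one row per customer and month (the question does not show the raw data), so the values below are simply copied from the grouped output above and `sum()` leaves them unchanged.

```python
import pandas as pd

# Sample rows copied from the grouped output in the question.
# Assumption: one raw row per customer/month, so sum() is a no-op here.
df = pd.DataFrame({
    'Customer': [1000061] * 6 + [1000063] * 5,
    'Year': [2013, 2014, 2014, 2014, 2014, 2014,
             2013, 2014, 2014, 2014, 2014],
    'Month': [12, 1, 2, 3, 4, 5, 12, 1, 2, 3, 4],
    'COST': [122.77, 450.40, 249.61, 533.58, 337.32, 482.49,
             875.67, 376.95, 308.90, 469.76, 394.34],
})

g = df.groupby(['Customer', 'Year', 'Month'])['COST'].sum()

# Group by index level 0 (Customer) so a shift never pulls a value
# from the next customer's rows.
by_customer = g.groupby(level=0)

result = pd.concat(
    [g,
     by_customer.shift(-1).fillna(0),   # value one row ahead
     by_customer.shift(-2).fillna(0)],  # value two rows ahead
    axis=1,
    keys=['COST', 'COST_NextMonth', 'COST_2Months'],
)
print(result)
```

Note that `shift` is positional: it looks at the next row, not the next calendar month. If a customer has a missing month, the "next month" column will hold the value of the next available row.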

Another solution, using reset_index to get a flat DataFrame:

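# flatten the grouped Series so the new columns align with its rows
# (note: this rebinds df to the aggregated frame)
df = g.reset_index()
df['COST_NextMonth'] = df.groupby('Customer')['COST'].shift(-1).fillna(0)
df['COST_2Months'] = df.groupby('Customer')['COST'].shift(-2).fillna(0)
print(df)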

    Customer  Year  Month    COST  COST_NextMonth  COST_2Months
0    1000061  2013     12  122.77          450.40        249.61
1    1000061  2014      1  450.40          249.61        533.58
2    1000061  2014      2  249.61          533.58        337.32
3    1000061  2014      3  533.58          337.32        482.49
4    1000061  2014      4  337.32          482.49          0.00
5    1000061  2014      5  482.49            0.00          0.00
6    1000063  2013     12  875.67          376.95        308.90
7    1000063  2014      1  376.95          308.90        469.76
8    1000063  2014      2  308.90          469.76        394.34
9    1000063  2014      3  469.76          394.34          0.00
10   1000063  2014      4  394.34            0.00          0.00
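If more look-ahead columns are needed, the two assignments above could be wrapped in a small helper. The sketch below is only an illustration: `add_lead_columns` and its parameters are hypothetical names, not part of pandas, and the sample frame is a truncated subset of the question's data (so the zeros appear earlier than in the full table). It assumes the rows are already sorted chronologically within each customer.

```python
import pandas as pd

def add_lead_columns(frame, group_col, value_col, leads):
    """Return a copy of `frame` with one look-ahead column per entry in `leads`.

    `leads` maps a new column name to the number of rows to look ahead.
    Hypothetical helper; assumes rows are sorted within each group.
    """
    out = frame.copy()
    # Group the untouched input frame; its index matches the copy's index.
    grouped = frame.groupby(group_col)[value_col]
    for name, periods in leads.items():
        out[name] = grouped.shift(-periods).fillna(0)
    return out

# Truncated subset of the question's data, for illustration only.
flat = pd.DataFrame({
    'Customer': [1000061, 1000061, 1000061, 1000063, 1000063],
    'Year': [2013, 2014, 2014, 2013, 2014],
    'Month': [12, 1, 2, 12, 1],
    'COST': [122.77, 450.40, 249.61, 875.67, 376.95],
})

with_leads = add_lead_columns(
    flat, 'Customer', 'COST',
    {'COST_NextMonth': 1, 'COST_2Months': 2},
)
print(with_leads)
```

The helper returns a copy rather than modifying its input, which keeps the original frame available for comparison.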