如何按照大熊猫中看到的每一天(第一天,第二天等)的顺序对值进行求和

时间:2017-10-16 13:56:18

标签: python pandas datetime

我有一个这样的数据框:

     id                 Date       Volume      Price     Values(Volume*Price)
    56033738624803469  20170111    1          943339    943339
    56033738624803469  20170111    10         919410    9194100
    56033738624803469  20170112    1          919410    919410
    56033738624803469  20170112    5          954999    4774955
    4659957480182399   20170207    1          1000000   1000000
    4659957480182399   20170208    5          1000000   5000000
    4659957480182399   20170208    40         1000000   40000000

我想为每个ID计算并绘制前100天的以下计算:

  1. 计算前100天每天的平均值
  2. 为所有Ids绘制此图
  3. 之后,情节应该是这样的:
  4. Desired plothttps://i.stack.imgur.com/2cozR.png

    这是我到目前为止所做的:

    df2 = df.groupby(['Id', 'Date']).sum()
    

    结果是:

                                   Index   Volume        Price        Values
    Id               Date                                               
    1745829084228393 20170207      1     1000    1000000.0  1.000000e+09
                     20170208   5151   999000  101000000.0  9.990000e+11 
                     20170403      1       12    1000100.0  1.200120e+07
                     20170408      1       12    1000000.0  1.200000e+07
                     20170417      1      500    1000000.0  5.000000e+08
                     20170423      3    14500    2000000.0  1.450000e+10
                     20170507     10    35000    4000000.0  3.500000e+10
                     20170510     21    49051    6000000.0  4.905100e+10
                     20170529      1        4    1000000.0  4.000000e+06
    2888358730233310 20170212    820  2000000   40000000.0  2.000000e+12
    2929948497881810 20170207   1830  1500000   60000000.0  1.500000e+12
                     20170208    903   700000   42000000.0  7.000000e+11
                     20170212   1176   800000   48000000.0  8.000000e+11
    3715246194918044 20150509     66     1008   11000000.0  1.008000e+09
    

    现在我想计算每个ID的第一个,第二个......的平均值,例如:

     Date_Order     avg_Sum_Values(= summation first date of each id /(number of ids))
     first_Date     875.5 e+9 
     second_Date    849.5 e+9
    

1 个答案:

答案 0 :(得分:0)

我确定有更简洁的方法可以做到这一点,但如果你能做好一些合并,你可以一步一步地做到这一点:

In [1]: df
Out[1]:
   Id       Date  Value
0   1 2017-04-08      1
1   1 2017-04-08      1
2   1 2017-04-09      2
3   2 2017-04-08      3
4   2 2017-04-09      6
5   2 2017-04-09      4
6   3 2017-04-09     10
7   3 2017-04-09     11
8   3 2017-04-11     12

In [2]: min_dates = df.groupby('Id', as_index=False).Date.min()
   ...: df = pd.merge(df, min_dates, on='Id', suffixes=('', '_min'))
   ...: df['Date'] = ((df['Date'] - df['Date_min']) / pd.Timedelta('1 day')).apply(int)
   ...: df.groupby('Date').Value.sum()
   ...:
Out[2]:
Date
0    26
1    12
2    12
Name: Value, dtype: int64