I have a dataframe like this:
id Date Volume Price Values(Volume*Price)
56033738624803469 20170111 1 943339 943339
56033738624803469 20170111 10 919410 9194100
56033738624803469 20170112 1 919410 919410
56033738624803469 20170112 5 954999 4774955
4659957480182399 20170207 1 1000000 1000000
4659957480182399 20170208 5 1000000 5000000
4659957480182399 20170208 40 1000000 40000000
I want to compute the following quantity for each id over its first 100 days, and plot it:
(formula image: https://i.stack.imgur.com/2cozR.png)
Here is what I have done so far:
df2 = df.groupby(['Id', 'Date']).sum()
The result is:
Index Volume Price Values
Id Date
1745829084228393 20170207 1 1000 1000000.0 1.000000e+09
20170208 5151 999000 101000000.0 9.990000e+11
20170403 1 12 1000100.0 1.200120e+07
20170408 1 12 1000000.0 1.200000e+07
20170417 1 500 1000000.0 5.000000e+08
20170423 3 14500 2000000.0 1.450000e+10
20170507 10 35000 4000000.0 3.500000e+10
20170510 21 49051 6000000.0 4.905100e+10
20170529 1 4 1000000.0 4.000000e+06
2888358730233310 20170212 820 2000000 40000000.0 2.000000e+12
2929948497881810 20170207 1830 1500000 60000000.0 1.500000e+12
20170208 903 700000 42000000.0 7.000000e+11
20170212 1176 800000 48000000.0 8.000000e+11
3715246194918044 20150509 66 1008 11000000.0 1.008000e+09
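The grouped sum above can be reproduced on the small sample from the top of the question (a sketch; column names follow the question's `Id`/`Date`/`Volume`/`Price`/`Values` usage):

```python
import pandas as pd

df = pd.DataFrame({
    'Id':     [56033738624803469] * 4 + [4659957480182399] * 3,
    'Date':   [20170111, 20170111, 20170112, 20170112,
               20170207, 20170208, 20170208],
    'Volume': [1, 10, 1, 5, 1, 5, 40],
    'Price':  [943339, 919410, 919410, 954999, 1000000, 1000000, 1000000],
})
df['Values'] = df['Volume'] * df['Price']

# One row per (Id, Date), with Volume, Price and Values summed within each group
df2 = df.groupby(['Id', 'Date']).sum()
print(df2)
```

The result is indexed by the (`Id`, `Date`) MultiIndex, which is why the printed output in the question shows `Id` and `Date` on separate header lines.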
Now I want to average the summed Values across ids at each date order (each id's first date, second date, ...), for example:

Date_Order    avg_Sum_Values (= sum over ids of that id's first-date Values / number of ids)
first_Date    875.5e+9
second_Date   849.5e+9
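The target table can be sketched like this (an illustration, not the asker's code; it assumes columns named `Id`, `Date`, `Values` and ranks each id's dates to get the date order):

```python
import pandas as pd

df = pd.DataFrame({
    'Id':     [1, 1, 1, 2, 2],
    'Date':   ['20170111', '20170111', '20170112', '20170207', '20170208'],
    'Values': [943339, 9194100, 919410, 1000000, 5000000],
})

# Sum Values per id per day
daily = df.groupby(['Id', 'Date'], as_index=False)['Values'].sum()

# Rank each id's dates chronologically: 0 = first date, 1 = second date, ...
# (YYYYMMDD strings sort correctly even without datetime conversion)
daily['Date_Order'] = daily.groupby('Id')['Date'].rank(method='dense').astype(int) - 1

# Average the daily sums across ids at each date order
avg_sum = daily.groupby('Date_Order')['Values'].mean()
print(avg_sum)
```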
Answer (score: 0)
I'm sure there are more concise ways to do this, but if you don't mind doing a merge, you can do it step by step:
In [1]: df
Out[1]:
Id Date Value
0 1 2017-04-08 1
1 1 2017-04-08 1
2 1 2017-04-09 2
3 2 2017-04-08 3
4 2 2017-04-09 6
5 2 2017-04-09 4
6 3 2017-04-09 10
7 3 2017-04-09 11
8 3 2017-04-11 12
In [2]: min_dates = df.groupby('Id', as_index=False).Date.min()
   ...: df = pd.merge(df, min_dates, on='Id', suffixes=('', '_min'))
   ...: # Date must be a datetime column here; the division turns the
   ...: # difference into the day offset from each id's first date
   ...: df['Date'] = ((df['Date'] - df['Date_min']) / pd.Timedelta('1 day')).astype(int)
   ...: df.groupby('Date').Value.sum()
   ...:
Out[2]:
Date
0 26
1 12
2 12
Name: Value, dtype: int64
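To get the question's average rather than the sum, divide the per-offset totals by the total number of ids (which is not the same as `.mean()`, since `mean` would skip ids that have no row at a given offset), and keep only the first 100 days. A sketch, using `transform` in place of the merge above:

```python
import pandas as pd

df = pd.DataFrame({
    'Id':    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Date':  pd.to_datetime(['2017-04-08', '2017-04-08', '2017-04-09',
                             '2017-04-08', '2017-04-09', '2017-04-09',
                             '2017-04-09', '2017-04-09', '2017-04-11']),
    'Value': [1, 1, 2, 3, 6, 4, 10, 11, 12],
})

# Day offset from each id's first date (same idea as the merge,
# written with a groupby-transform instead)
df['Day'] = (df['Date'] - df.groupby('Id')['Date'].transform('min')).dt.days

# Sum per id per offset, total across ids, divide by the number of ids,
# and keep only offsets 0..99 (the first 100 days)
n_ids = df['Id'].nunique()
avg = (df.groupby(['Id', 'Day'])['Value'].sum()
         .groupby('Day').sum() / n_ids).loc[:99]
print(avg)
```

From here, `avg.plot()` would draw the requested curve.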