使用自定义函数计算pandas上的每日聚合

时间:2017-02-11 23:55:52

标签: python pandas aggregate

我是Pandas的新手,作为练习,我正在移动一些旧的代码/解决方案来学习它。在这种情况下,我正在尝试计算产品价格的综合指数,该指数在用SQL计算之前。

这是我在数据框中的数据:

    id  weight      date        price
0   1   0.002796    2005-11-15  0.998298
1   1   0.002796    2005-11-16  1.014242
2   1   0.002796    2005-11-17  1.016452
3   1   0.002796    2005-11-18  1.026396
4   1   0.002796    2005-11-19  1.026047
5   1   0.002796    2005-11-20  1.024285
6   1   0.002796    2005-11-21  1.018764
7   1   0.002796    2005-11-22  1.033175
8   1   0.002796    2005-11-23  1.058509
9   1   0.002796    2005-11-24  1.061231
10  1   0.002796    2005-11-25  1.058137
11  1   0.002796    2005-11-26  0.999380
12  1   0.002796    2005-11-27  0.990504
13  1   0.002796    2005-11-28  0.993764
14  1   0.002796    2005-11-29  0.978754
15  1   0.002796    2005-11-30  0.992070
...     ...     ...     ...     ...
4085    1   0.002796    2017-01-21  0.857420
4086    1   0.002796    2017-01-22  0.848195
4087    1   0.002796    2017-01-23  0.791784
4088    1   0.002796    2017-01-24  0.846603
4089    1   0.002796    2017-01-25  0.878104
4090    1   0.002796    2017-01-26  0.806651
4091    1   0.002796    2017-01-27  0.849316
4092    1   0.002796    2017-01-28  0.826550
4093    1   0.002796    2017-01-29  0.848651
4094    1   0.002796    2017-01-30  0.829643
4095    1   0.002796    2017-01-31  0.837094
4096    1   0.002796    2017-02-01  0.846572
4097    1   0.002796    2017-02-02  0.800163
4098    1   0.002796    2017-02-03  0.820356
4099    1   0.002796    2017-02-04  0.818924
4100    1   0.002796    2017-02-05  0.822157
4101    1   0.002796    2017-02-06  0.787123
4102    1   0.002796    2017-02-07  0.796264
4103    1   0.002796    2017-02-08  0.797241
4104    1   0.002796    2017-02-09  0.818499
4105    1   0.002796    2017-02-10  0.810928

综合指数以每日回报计算,是当天的回报:

Rt =(Price_day / Price_day_before) - 1

我一直在读大熊猫,时间序列等,但我很难理解在这里执行的具体操作;这是一个滚动?如何获取给定日期和日期之前的数据?

3 个答案:

答案 0 :(得分:3)

代码是:

df['return'] = df['price']/df['price'].shift(1) - 1

python中的df.shift(1)函数在SQL中用作LEAD()LEG()函数。

答案 1 :(得分:2)

您希望使用pct_change()方法的IIUC:

In [196]: x
Out[196]:
    id    weight        date     price
0    1  0.002796  2005-11-15  0.998298
1    1  0.002796  2005-11-16  1.014242
2    1  0.002796  2005-11-17  1.016452
3    1  0.002796  2005-11-18  1.026396
4    1  0.002796  2005-11-19  1.026047
5    1  0.002796  2005-11-20  1.024285
6    1  0.002796  2005-11-21  1.018764
7    1  0.002796  2005-11-22  1.033175
8    1  0.002796  2005-11-23  1.058509
9    1  0.002796  2005-11-24  1.061231
10   1  0.002796  2005-11-25  1.058137
11   1  0.002796  2005-11-26  0.999380
12   1  0.002796  2005-11-27  0.990504
13   1  0.002796  2005-11-28  0.993764
14   1  0.002796  2005-11-29  0.978754
15   1  0.002796  2005-11-30  0.992070

In [197]: x['Rt'] = x['price'].pct_change()

In [198]: x
Out[198]:
    id    weight        date     price        Rt
0    1  0.002796  2005-11-15  0.998298       NaN
1    1  0.002796  2005-11-16  1.014242  0.015971
2    1  0.002796  2005-11-17  1.016452  0.002179
3    1  0.002796  2005-11-18  1.026396  0.009783
4    1  0.002796  2005-11-19  1.026047 -0.000340
5    1  0.002796  2005-11-20  1.024285 -0.001717
6    1  0.002796  2005-11-21  1.018764 -0.005390
7    1  0.002796  2005-11-22  1.033175  0.014146
8    1  0.002796  2005-11-23  1.058509  0.024521
9    1  0.002796  2005-11-24  1.061231  0.002572
10   1  0.002796  2005-11-25  1.058137 -0.002915
11   1  0.002796  2005-11-26  0.999380 -0.055529
12   1  0.002796  2005-11-27  0.990504 -0.008882
13   1  0.002796  2005-11-28  0.993764  0.003291
14   1  0.002796  2005-11-29  0.978754 -0.015104
15   1  0.002796  2005-11-30  0.992070  0.013605

另一种解决方案(使用shift()方法):

In [199]: x['Rt2'] = x['price'] / x['price'].shift() - 1

In [200]: x
Out[200]:
    id    weight        date     price        Rt       Rt2
0    1  0.002796  2005-11-15  0.998298       NaN       NaN
1    1  0.002796  2005-11-16  1.014242  0.015971  0.015971
2    1  0.002796  2005-11-17  1.016452  0.002179  0.002179
3    1  0.002796  2005-11-18  1.026396  0.009783  0.009783
4    1  0.002796  2005-11-19  1.026047 -0.000340 -0.000340
5    1  0.002796  2005-11-20  1.024285 -0.001717 -0.001717
6    1  0.002796  2005-11-21  1.018764 -0.005390 -0.005390
7    1  0.002796  2005-11-22  1.033175  0.014146  0.014146
8    1  0.002796  2005-11-23  1.058509  0.024521  0.024521
9    1  0.002796  2005-11-24  1.061231  0.002572  0.002572
10   1  0.002796  2005-11-25  1.058137 -0.002915 -0.002915
11   1  0.002796  2005-11-26  0.999380 -0.055529 -0.055529
12   1  0.002796  2005-11-27  0.990504 -0.008882 -0.008882
13   1  0.002796  2005-11-28  0.993764  0.003291  0.003291
14   1  0.002796  2005-11-29  0.978754 -0.015104 -0.015104
15   1  0.002796  2005-11-30  0.992070  0.013605  0.013605

答案 2 :(得分:2)

100

开始生成综合索引
df['PrcRtnIdx'] = df.price.pct_change().fillna(0).add(1).cumprod().mul(100)

print(df)

    id    weight        date     price   PrcRtnIdx
0    1  0.002796  2005-11-15  0.998298  100.000000
1    1  0.002796  2005-11-16  1.014242  101.597118
2    1  0.002796  2005-11-17  1.016452  101.818495
3    1  0.002796  2005-11-18  1.026396  102.814590
4    1  0.002796  2005-11-19  1.026047  102.779631
5    1  0.002796  2005-11-20  1.024285  102.603131
6    1  0.002796  2005-11-21  1.018764  102.050089
7    1  0.002796  2005-11-22  1.033175  103.493646
8    1  0.002796  2005-11-23  1.058509  106.031365
9    1  0.002796  2005-11-24  1.061231  106.304029
10   1  0.002796  2005-11-25  1.058137  105.994102
11   1  0.002796  2005-11-26  0.999380  100.108384
12   1  0.002796  2005-11-27  0.990504   99.219271
13   1  0.002796  2005-11-28  0.993764   99.545827
14   1  0.002796  2005-11-29  0.978754   98.042268
15   1  0.002796  2005-11-30  0.992070   99.376138

对@mydaemon的回应

df['PrcRtnIdx'] = \
    df.groupby('id').price.apply(
        lambda x: x.pct_change().fillna(0).add(1).cumprod().mul(100))