我是Pandas的新手,作为练习,我正在移动一些旧的代码/解决方案来学习它。在这种情况下,我正在尝试计算产品价格的综合指数,该指数在用SQL计算之前。
这是我在数据框中的数据:
id weight date price
0 1 0.002796 2005-11-15 0.998298
1 1 0.002796 2005-11-16 1.014242
2 1 0.002796 2005-11-17 1.016452
3 1 0.002796 2005-11-18 1.026396
4 1 0.002796 2005-11-19 1.026047
5 1 0.002796 2005-11-20 1.024285
6 1 0.002796 2005-11-21 1.018764
7 1 0.002796 2005-11-22 1.033175
8 1 0.002796 2005-11-23 1.058509
9 1 0.002796 2005-11-24 1.061231
10 1 0.002796 2005-11-25 1.058137
11 1 0.002796 2005-11-26 0.999380
12 1 0.002796 2005-11-27 0.990504
13 1 0.002796 2005-11-28 0.993764
14 1 0.002796 2005-11-29 0.978754
15 1 0.002796 2005-11-30 0.992070
... ... ... ... ...
4085 1 0.002796 2017-01-21 0.857420
4086 1 0.002796 2017-01-22 0.848195
4087 1 0.002796 2017-01-23 0.791784
4088 1 0.002796 2017-01-24 0.846603
4089 1 0.002796 2017-01-25 0.878104
4090 1 0.002796 2017-01-26 0.806651
4091 1 0.002796 2017-01-27 0.849316
4092 1 0.002796 2017-01-28 0.826550
4093 1 0.002796 2017-01-29 0.848651
4094 1 0.002796 2017-01-30 0.829643
4095 1 0.002796 2017-01-31 0.837094
4096 1 0.002796 2017-02-01 0.846572
4097 1 0.002796 2017-02-02 0.800163
4098 1 0.002796 2017-02-03 0.820356
4099 1 0.002796 2017-02-04 0.818924
4100 1 0.002796 2017-02-05 0.822157
4101 1 0.002796 2017-02-06 0.787123
4102 1 0.002796 2017-02-07 0.796264
4103 1 0.002796 2017-02-08 0.797241
4104 1 0.002796 2017-02-09 0.818499
4105 1 0.002796 2017-02-10 0.810928
综合指数以每日回报计算,是当天的回报:
Rt =(Price_day / Price_day_before) - 1
我一直在读大熊猫,时间序列等,但我很难理解在这里执行的具体操作;这是一个滚动?如何获取给定日期和日期之前的数据?
答案 0 :(得分:3)
代码是:
df['return'] = df['price']/df['price'].shift(1) - 1
python中的df.shift(1)
函数在SQL中用作LEAD()
或LEG()
函数。
答案 1 :(得分:2)
您希望使用pct_change()方法的IIUC:
In [196]: x
Out[196]:
id weight date price
0 1 0.002796 2005-11-15 0.998298
1 1 0.002796 2005-11-16 1.014242
2 1 0.002796 2005-11-17 1.016452
3 1 0.002796 2005-11-18 1.026396
4 1 0.002796 2005-11-19 1.026047
5 1 0.002796 2005-11-20 1.024285
6 1 0.002796 2005-11-21 1.018764
7 1 0.002796 2005-11-22 1.033175
8 1 0.002796 2005-11-23 1.058509
9 1 0.002796 2005-11-24 1.061231
10 1 0.002796 2005-11-25 1.058137
11 1 0.002796 2005-11-26 0.999380
12 1 0.002796 2005-11-27 0.990504
13 1 0.002796 2005-11-28 0.993764
14 1 0.002796 2005-11-29 0.978754
15 1 0.002796 2005-11-30 0.992070
In [197]: x['Rt'] = x['price'].pct_change()
In [198]: x
Out[198]:
id weight date price Rt
0 1 0.002796 2005-11-15 0.998298 NaN
1 1 0.002796 2005-11-16 1.014242 0.015971
2 1 0.002796 2005-11-17 1.016452 0.002179
3 1 0.002796 2005-11-18 1.026396 0.009783
4 1 0.002796 2005-11-19 1.026047 -0.000340
5 1 0.002796 2005-11-20 1.024285 -0.001717
6 1 0.002796 2005-11-21 1.018764 -0.005390
7 1 0.002796 2005-11-22 1.033175 0.014146
8 1 0.002796 2005-11-23 1.058509 0.024521
9 1 0.002796 2005-11-24 1.061231 0.002572
10 1 0.002796 2005-11-25 1.058137 -0.002915
11 1 0.002796 2005-11-26 0.999380 -0.055529
12 1 0.002796 2005-11-27 0.990504 -0.008882
13 1 0.002796 2005-11-28 0.993764 0.003291
14 1 0.002796 2005-11-29 0.978754 -0.015104
15 1 0.002796 2005-11-30 0.992070 0.013605
另一种解决方案(使用shift()方法):
In [199]: x['Rt2'] = x['price'] / x['price'].shift() - 1
In [200]: x
Out[200]:
id weight date price Rt Rt2
0 1 0.002796 2005-11-15 0.998298 NaN NaN
1 1 0.002796 2005-11-16 1.014242 0.015971 0.015971
2 1 0.002796 2005-11-17 1.016452 0.002179 0.002179
3 1 0.002796 2005-11-18 1.026396 0.009783 0.009783
4 1 0.002796 2005-11-19 1.026047 -0.000340 -0.000340
5 1 0.002796 2005-11-20 1.024285 -0.001717 -0.001717
6 1 0.002796 2005-11-21 1.018764 -0.005390 -0.005390
7 1 0.002796 2005-11-22 1.033175 0.014146 0.014146
8 1 0.002796 2005-11-23 1.058509 0.024521 0.024521
9 1 0.002796 2005-11-24 1.061231 0.002572 0.002572
10 1 0.002796 2005-11-25 1.058137 -0.002915 -0.002915
11 1 0.002796 2005-11-26 0.999380 -0.055529 -0.055529
12 1 0.002796 2005-11-27 0.990504 -0.008882 -0.008882
13 1 0.002796 2005-11-28 0.993764 0.003291 0.003291
14 1 0.002796 2005-11-29 0.978754 -0.015104 -0.015104
15 1 0.002796 2005-11-30 0.992070 0.013605 0.013605
答案 2 :(得分:2)
从100
df['PrcRtnIdx'] = df.price.pct_change().fillna(0).add(1).cumprod().mul(100)
print(df)
id weight date price PrcRtnIdx
0 1 0.002796 2005-11-15 0.998298 100.000000
1 1 0.002796 2005-11-16 1.014242 101.597118
2 1 0.002796 2005-11-17 1.016452 101.818495
3 1 0.002796 2005-11-18 1.026396 102.814590
4 1 0.002796 2005-11-19 1.026047 102.779631
5 1 0.002796 2005-11-20 1.024285 102.603131
6 1 0.002796 2005-11-21 1.018764 102.050089
7 1 0.002796 2005-11-22 1.033175 103.493646
8 1 0.002796 2005-11-23 1.058509 106.031365
9 1 0.002796 2005-11-24 1.061231 106.304029
10 1 0.002796 2005-11-25 1.058137 105.994102
11 1 0.002796 2005-11-26 0.999380 100.108384
12 1 0.002796 2005-11-27 0.990504 99.219271
13 1 0.002796 2005-11-28 0.993764 99.545827
14 1 0.002796 2005-11-29 0.978754 98.042268
15 1 0.002796 2005-11-30 0.992070 99.376138
对@mydaemon的回应
df['PrcRtnIdx'] = \
df.groupby('id').price.apply(
lambda x: x.pct_change().fillna(0).add(1).cumprod().mul(100))