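Groupby and reverse-calculate one column based on pct_change in Python

Time: 2019-08-30 14:30:05

Tags: python pandas numpy

I have a dataframe like df1 below, with four columns. Assuming the date range for all cities is 2019-01-01 to 2019-07-01, I want to group by city and back-calculate price from the price value on 2019-07-01 and the pct_change column. (In the sample below, gz and sh actually end on 2019-08-01, so the known price sits in each city's last row.)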

   city        date  price  pct_change
0    bj  2019-01-01    NaN         NaN
1    bj  2019-02-01    NaN       -0.03
2    bj  2019-03-01    NaN        0.16
3    bj  2019-04-01    NaN        0.07
4    bj  2019-05-01    NaN        0.19
5    bj  2019-06-01    NaN       -0.05
6    bj  2019-07-01    6.0       -0.02
7    gz  2019-01-01    NaN         NaN
8    gz  2019-02-01    NaN        0.03
9    gz  2019-03-01    NaN        0.00
10   gz  2019-04-01    NaN        0.03
11   gz  2019-05-01    NaN        0.00
12   gz  2019-06-01    NaN        0.06
13   gz  2019-07-01    NaN        0.07
14   gz  2019-08-01    8.9       -0.02
15   sh  2019-02-01    NaN        0.04
16   sh  2019-03-01    NaN       -0.04
17   sh  2019-04-01    NaN       -0.04
18   sh  2019-05-01    NaN       -0.04
19   sh  2019-06-01    NaN       -0.04
20   sh  2019-07-01    NaN       -0.01
21   sh  2019-08-01    7.5       -0.01
22   sz  2019-02-01    NaN       -0.03
23   sz  2019-03-01    NaN        0.10
24   sz  2019-04-01    NaN       -0.04
25   sz  2019-05-01    NaN       -0.16
26   sz  2019-06-01    NaN        0.12
27   sz  2019-07-01    7.0        0.00

For example, in Excel I can back-calculate the price in row 5 as 6.0/(1+(-0.02)) = 6.12, the price in row 4 as 6.12/(1+(-0.05)) = 6.44, and so on.
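The Excel steps amount to the recursion price[t-1] = price[t] / (1 + pct_change[t]), applied from the last row upward. A minimal plain-Python sketch for the bj rows of df1 follows; the function name `back_calculate` is made up for illustration and is not part of the question.

```python
def back_calculate(last_price, pct_changes):
    """Rebuild earlier prices from the final price and per-row pct_change.

    pct_changes[i] is the change from row i-1 to row i. The first entry
    (NaN in df1) is skipped because no row precedes it.
    """
    prices = [last_price]
    # Walk from the last row back to the second row.
    for change in reversed(pct_changes[1:]):
        prices.append(prices[-1] / (1 + change))
    prices.reverse()
    return prices


# pct_change column for city "bj" in df1 (the first value is NaN there).
bj_changes = [float("nan"), -0.03, 0.16, 0.07, 0.19, -0.05, -0.02]
bj_prices = back_calculate(6.0, bj_changes)
print([round(p, 2) for p in bj_prices])
```

The last three values reproduce the Excel figures quoted above (6.44, 6.12, 6.0); a loop like this is fine for one city, but for many groups a vectorized pandas formulation is usually preferable.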

Is it possible to get an expected result like df2 in Python (it does not have to be exactly the same)?

   city        date  price  pct_change
0    bj  2019-01-01   4.49       -0.03
1    bj  2019-02-01   4.34        0.16
2    bj  2019-03-01   5.04        0.07
3    bj  2019-04-01   5.39        0.19
4    bj  2019-05-01   6.43       -0.05
5    bj  2019-06-01   6.11       -0.02
6    bj  2019-07-01   6.00        0.05
7    gz  2019-01-01   7.58        0.03
8    gz  2019-02-01   7.79        0.00
9    gz  2019-03-01   7.80        0.03
10   gz  2019-04-01   8.04        0.00
11   gz  2019-05-01   8.04        0.06
12   gz  2019-06-01   8.52        0.07
13   gz  2019-07-01   9.10       -0.02
14   gz  2019-08-01   8.90        0.00
15   sh  2019-01-01   8.81        0.04
16   sh  2019-02-01   9.16        0.02
17   sh  2019-03-01   8.79       -0.04
18   sh  2019-04-01   8.43       -0.12
19   sh  2019-05-01   8.06       -0.04
20   sh  2019-06-01   7.70        0.07
21   sh  2019-07-01   7.60       -0.01
22   sh  2019-08-01   7.50        0.06
23   sz  2019-01-01   7.30       -0.03
24   sz  2019-02-01   7.10        0.10
25   sz  2019-03-01   7.80       -0.04
26   sz  2019-04-01   7.45       -0.16
27   sz  2019-05-01   6.28        0.12
28   sz  2019-06-01   7.02        0.00
29   sz  2019-07-01   7.00       -0.04

Note that if I have df3, shown below:

   city        date  price
0    bj  2019-01-01   4.49
1    bj  2019-02-01   4.34
2    bj  2019-03-01   5.04
3    bj  2019-04-01   5.39
4    bj  2019-05-01   6.43
5    bj  2019-06-01   6.11
6    bj  2019-07-01   6.00
7    gz  2019-01-01   7.58
8    gz  2019-02-01   7.79
9    gz  2019-03-01   7.80
10   gz  2019-04-01   8.04
11   gz  2019-05-01   8.04
12   gz  2019-06-01   8.52
13   gz  2019-07-01   9.10
14   gz  2019-08-01   8.90
15   sh  2019-01-01   8.81
16   sh  2019-02-01   9.16
17   sh  2019-03-01   8.79
18   sh  2019-04-01   8.43
19   sh  2019-05-01   8.06
20   sh  2019-06-01   7.70
21   sh  2019-07-01   7.60
22   sh  2019-08-01   7.50
23   sz  2019-01-01   7.30
24   sz  2019-02-01   7.10
25   sz  2019-03-01   7.80
26   sz  2019-04-01   7.45
27   sz  2019-05-01   6.28
28   sz  2019-06-01   7.02
29   sz  2019-07-01   7.00

I can get df2 from it with the following code:

[code not preserved in the source]

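Thanks for your help.

1 Answer:

Answer 0: (score: 1)

You can use cumprod inside a groupby, but you need to reverse the data twice with [::-1]: once so the product accumulates from each city's last row backward, and once more to restore the original order. For example: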

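# Fill missing prices with 1 so they act as neutral factors in the product.
# The expression is wrapped in parentheses, so no backslash continuations are needed.
df1['estimate_price'] = (
    df1.fillna({'price': 1})
       .groupby('city')
       .apply(lambda x: (x['price'] / (1 + x['pct_change'].shift(-1).fillna(0)))[::-1].cumprod()[::-1])
       .reset_index(level=0, drop=True)  # drop the city level so the index aligns with df1
)
print(df1)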
   city        date  price  pct_change  estimate_price
0    bj  2019-01-01    NaN         NaN        4.498224
1    bj  2019-02-01    NaN       -0.03        4.363278
2    bj  2019-03-01    NaN        0.16        5.061402
3    bj  2019-04-01    NaN        0.07        5.415700
4    bj  2019-05-01    NaN        0.19        6.444683
5    bj  2019-06-01    NaN       -0.05        6.122449
6    bj  2019-07-01    6.0       -0.02        6.000000
7    gz  2019-01-01    NaN         NaN        7.547443
8    gz  2019-02-01    NaN        0.03        7.773866
9    gz  2019-03-01    NaN        0.00        7.773866
10   gz  2019-04-01    NaN        0.03        8.007082
11   gz  2019-05-01    NaN        0.00        8.007082
12   gz  2019-06-01    NaN        0.06        8.487507
13   gz  2019-07-01    NaN        0.07        9.081633
14   gz  2019-08-01    8.9       -0.02        8.900000
15   sh  2019-02-01    NaN        0.04        9.009609
16   sh  2019-03-01    NaN       -0.04        8.649225
17   sh  2019-04-01    NaN       -0.04        8.303256
18   sh  2019-05-01    NaN       -0.04        7.971125
19   sh  2019-06-01    NaN       -0.04        7.652280
20   sh  2019-07-01    NaN       -0.01        7.575758
21   sh  2019-08-01    7.5       -0.01        7.500000
22   sz  2019-02-01    NaN       -0.03        7.045905
23   sz  2019-03-01    NaN        0.10        7.750496
24   sz  2019-04-01    NaN       -0.04        7.440476
25   sz  2019-05-01    NaN       -0.16        6.250000
26   sz  2019-06-01    NaN        0.12        7.000000
27   sz  2019-07-01    7.0        0.00        7.000000
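
One way to sanity-check a result like this is to recompute pct_change from the estimated prices and compare it with the input column. The sketch below does that on the bj and sz rows only. It also uses a slightly different formulation from the answer (a per-city cumprod of growth factors over the reversed rows, with the known last price divided by it), offered as an alternative rather than as the answer's method. It assumes rows are sorted by date within each city and that the only known price is in each city's last row; the names `factor_out`, `rev_cum`, and `last_price` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["bj"] * 7 + ["sz"] * 6,
    "date": ["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01",
             "2019-05-01", "2019-06-01", "2019-07-01",
             "2019-02-01", "2019-03-01", "2019-04-01",
             "2019-05-01", "2019-06-01", "2019-07-01"],
    "price": [np.nan] * 6 + [6.0] + [np.nan] * 5 + [7.0],
    "pct_change": [np.nan, -0.03, 0.16, 0.07, 0.19, -0.05, -0.02,
                   -0.03, 0.10, -0.04, -0.16, 0.12, 0.00],
})

# 1 + pct_change is the growth factor leading INTO a row. Shifting by -1
# within each city gives the factor leading OUT of a row; the last row of
# a city has none, so it gets the neutral factor 1.
factor_out = (1 + df["pct_change"]).groupby(df["city"]).shift(-1).fillna(1.0)

# Reverse the rows so the per-city cumprod accumulates from the last date back.
rev_cum = factor_out[::-1].groupby(df["city"][::-1]).cumprod()

# Known final price of each city, broadcast to all of its rows
# ("last" skips NaN, so it picks the one non-missing price).
last_price = df.groupby("city")["price"].transform("last")

# Column assignment aligns on the index, which undoes the reversed order.
df["estimate_price"] = last_price / rev_cum

# Round trip: pct_change of the estimates should reproduce the input column
# wherever both values are defined.
check = df.groupby("city")["estimate_price"].pct_change()
mask = df["pct_change"].notna() & check.notna()
print(np.allclose(check[mask], df["pct_change"][mask]))
print(df)
```

Because the estimates are built by chaining the same factors, the round trip only confirms internal consistency; the estimates can still differ from true historical prices when pct_change was rounded, as the gap between df2 and the estimate_price column suggests.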