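I have a dataframe like df1 with four columns. Suppose that for every city the dates range from 2019-01-01 to 2019-07-01. I want to group by city and back-calculate price from pct_change and the known value at 2019-07-01: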
city date price pct_change
0 bj 2019-01-01 NaN NaN
1 bj 2019-02-01 NaN -0.03
2 bj 2019-03-01 NaN 0.16
3 bj 2019-04-01 NaN 0.07
4 bj 2019-05-01 NaN 0.19
5 bj 2019-06-01 NaN -0.05
6 bj 2019-07-01 6.0 -0.02
7 gz 2019-01-01 NaN NaN
8 gz 2019-02-01 NaN 0.03
9 gz 2019-03-01 NaN 0.00
10 gz 2019-04-01 NaN 0.03
11 gz 2019-05-01 NaN 0.00
12 gz 2019-06-01 NaN 0.06
13 gz 2019-07-01 NaN 0.07
14 gz 2019-08-01 8.9 -0.02
15 sh 2019-02-01 NaN 0.04
16 sh 2019-03-01 NaN -0.04
17 sh 2019-04-01 NaN -0.04
18 sh 2019-05-01 NaN -0.04
19 sh 2019-06-01 NaN -0.04
20 sh 2019-07-01 NaN -0.01
21 sh 2019-08-01 7.5 -0.01
22 sz 2019-02-01 NaN -0.03
23 sz 2019-03-01 NaN 0.10
24 sz 2019-04-01 NaN -0.04
25 sz 2019-05-01 NaN -0.16
26 sz 2019-06-01 NaN 0.12
27 sz 2019-07-01 7.0 0.00
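For example, in Excel I can back-calculate the price in row 5 as 6.0/(1+(-0.02)) = 6.12, then the price in row 4 as 6.12/(1+(-0.05)) = 6.44, and so on.
Is it possible to get the expected result, like df2, in Python (it does not have to be exactly the same)?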
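Written as a formula, the Excel back-calculation is a simple recurrence. Here p_t is a city's known price (for bj, 6.0 at 2019-07-01) and r_t is the pct_change in the same row; the symbols are introduced only for this illustration:

$$
p_{t-1} = \frac{p_t}{1 + r_t}, \qquad
p_{t-k} = \frac{p_t}{\prod_{j=0}^{k-1} \left(1 + r_{t-j}\right)}
$$

For bj this gives p_5 = 6.0 / 0.98 ≈ 6.12 and p_4 = 6.12 / 0.95 ≈ 6.44, matching the Excel steps. Each earlier price is the known price divided by a running product of (1 + pct_change) factors, which is why a cumulative product taken from the last row backwards fits this problem.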
df2:
city date price pct_change
0 bj 2019-01-01 4.49 -0.03
1 bj 2019-02-01 4.34 0.16
2 bj 2019-03-01 5.04 0.07
3 bj 2019-04-01 5.39 0.19
4 bj 2019-05-01 6.43 -0.05
5 bj 2019-06-01 6.11 -0.02
6 bj 2019-07-01 6.00 0.05
7 gz 2019-01-01 7.58 0.03
8 gz 2019-02-01 7.79 0.00
9 gz 2019-03-01 7.80 0.03
10 gz 2019-04-01 8.04 0.00
11 gz 2019-05-01 8.04 0.06
12 gz 2019-06-01 8.52 0.07
13 gz 2019-07-01 9.10 -0.02
14 gz 2019-08-01 8.90 0.00
15 sh 2019-01-01 8.81 0.04
16 sh 2019-02-01 9.16 0.02
17 sh 2019-03-01 8.79 -0.04
18 sh 2019-04-01 8.43 -0.12
19 sh 2019-05-01 8.06 -0.04
20 sh 2019-06-01 7.70 0.07
21 sh 2019-07-01 7.60 -0.01
22 sh 2019-08-01 7.50 0.06
23 sz 2019-01-01 7.30 -0.03
24 sz 2019-02-01 7.10 0.10
25 sz 2019-03-01 7.80 -0.04
26 sz 2019-04-01 7.45 -0.16
27 sz 2019-05-01 6.28 0.12
28 sz 2019-06-01 7.02 0.00
29 sz 2019-07-01 7.00 -0.04
Note that if I have df3 as follows:
city date price
0 bj 2019-01-01 4.49
1 bj 2019-02-01 4.34
2 bj 2019-03-01 5.04
3 bj 2019-04-01 5.39
4 bj 2019-05-01 6.43
5 bj 2019-06-01 6.11
6 bj 2019-07-01 6.00
7 gz 2019-01-01 7.58
8 gz 2019-02-01 7.79
9 gz 2019-03-01 7.80
10 gz 2019-04-01 8.04
11 gz 2019-05-01 8.04
12 gz 2019-06-01 8.52
13 gz 2019-07-01 9.10
14 gz 2019-08-01 8.90
15 sh 2019-01-01 8.81
16 sh 2019-02-01 9.16
17 sh 2019-03-01 8.79
18 sh 2019-04-01 8.43
19 sh 2019-05-01 8.06
20 sh 2019-06-01 7.70
21 sh 2019-07-01 7.60
22 sh 2019-08-01 7.50
23 sz 2019-01-01 7.30
24 sz 2019-02-01 7.10
25 sz 2019-03-01 7.80
26 sz 2019-04-01 7.45
27 sz 2019-05-01 6.28
28 sz 2019-06-01 7.02
29 sz 2019-07-01 7.00
then I can get df2 from it with code.
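For comparison, the forward direction (from prices to pct_change) can be sketched with a per-city groupby plus pct_change. This is a minimal sketch, not necessarily the code the question refers to. It assumes pct_change means the change from the previous date within the same city, as in df1. The five-row frame is a hand-copied subset of df3 for illustration. The result does not line up exactly with df2's pct_change column.

```python
import pandas as pd

# A few rows of df3 from the question (two cities), used only for illustration
df3 = pd.DataFrame({
    'city': ['bj', 'bj', 'bj', 'gz', 'gz'],
    'date': ['2019-01-01', '2019-02-01', '2019-03-01', '2019-01-01', '2019-02-01'],
    'price': [4.49, 4.34, 5.04, 7.58, 7.79],
})

# Change relative to the previous date within each city;
# the first row of every city has no previous value, so it is NaN
df3['pct_change'] = df3.groupby('city')['price'].pct_change()
print(df3.round(2))
```

Rounded to two decimals, this gives -0.03 and 0.16 for bj and 0.03 for gz, the same values df1 shows for those dates. The back-calculation in the question is the inverse of this step.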
Thanks for your help.
Answer:
You can use cumprod inside groupby, but you need to reverse the data twice with [::-1], for example:
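df1['estimate_price'] = (df1.fillna({'price': 1})  # missing prices -> 1, neutral in the later product
                            .groupby('city')
                            # price / (1 + next row's pct_change), multiplied cumulatively from the last row upwards
                            .apply(lambda x: (x['price'] / (1 + x['pct_change'].shift(-1).fillna(0)))[::-1].cumprod()[::-1])
                            .reset_index(level=0, drop=True))  # drop the city level so the result aligns with df1's index
print(df1)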
city date price pct_change estimate_price
0 bj 2019-01-01 NaN NaN 4.498224
1 bj 2019-02-01 NaN -0.03 4.363278
2 bj 2019-03-01 NaN 0.16 5.061402
3 bj 2019-04-01 NaN 0.07 5.415700
4 bj 2019-05-01 NaN 0.19 6.444683
5 bj 2019-06-01 NaN -0.05 6.122449
6 bj 2019-07-01 6.0 -0.02 6.000000
7 gz 2019-01-01 NaN NaN 7.547443
8 gz 2019-02-01 NaN 0.03 7.773866
9 gz 2019-03-01 NaN 0.00 7.773866
10 gz 2019-04-01 NaN 0.03 8.007082
11 gz 2019-05-01 NaN 0.00 8.007082
12 gz 2019-06-01 NaN 0.06 8.487507
13 gz 2019-07-01 NaN 0.07 9.081633
14 gz 2019-08-01 8.9 -0.02 8.900000
15 sh 2019-02-01 NaN 0.04 9.009609
16 sh 2019-03-01 NaN -0.04 8.649225
17 sh 2019-04-01 NaN -0.04 8.303256
18 sh 2019-05-01 NaN -0.04 7.971125
19 sh 2019-06-01 NaN -0.04 7.652280
20 sh 2019-07-01 NaN -0.01 7.575758
21 sh 2019-08-01 7.5 -0.01 7.500000
22 sz 2019-02-01 NaN -0.03 7.045905
23 sz 2019-03-01 NaN 0.10 7.750496
24 sz 2019-04-01 NaN -0.04 7.440476
25 sz 2019-05-01 NaN -0.16 6.250000
26 sz 2019-06-01 NaN 0.12 7.000000
27 sz 2019-07-01 7.0 0.00 7.000000
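The same reverse-cumprod idea can also be written without groupby.apply, using only a per-city shift and a per-city cumprod. This is a minimal sketch. Like the code above, it assumes that each city's only known price is in its last row and that df1 has a unique index. The two-city frame is a hand-copied subset of df1 (bj and sz) for illustration.

```python
import pandas as pd

NA = float('nan')

# Subset of df1 from the question: bj and sz, with the known price only in each city's last row
df1 = pd.DataFrame({
    'city': ['bj'] * 7 + ['sz'] * 6,
    'price': [NA] * 6 + [6.0] + [NA] * 5 + [7.0],
    'pct_change': [NA, -0.03, 0.16, 0.07, 0.19, -0.05, -0.02,
                   -0.03, 0.10, -0.04, -0.16, 0.12, 0.00],
})

# Factor for each row: 1 / (1 + pct_change of the NEXT row in the same city).
# Each city's last row has no next row, so its factor becomes 1 times its known price.
next_pct = df1.groupby('city')['pct_change'].shift(-1)
factor = (1 / (1 + next_pct)).fillna(1) * df1['price'].fillna(1)

# Cumulative product from the bottom of each city upwards:
# reverse, take cumprod per city, then restore the original row order
rev = factor.iloc[::-1]
df1['estimate_price'] = rev.groupby(df1['city'].iloc[::-1]).cumprod().iloc[::-1]
print(df1)
```

On this subset it reproduces the bj and sz estimate_price values printed above, for example 4.498224 for bj on 2019-01-01 and 7.045905 for the first sz row. Keeping everything in built-in groupby operations avoids calling a Python lambda once per city. Recent pandas releases also warn when groupby.apply operates on the grouping columns, and this version sidesteps that warning. Like the answer, it silently treats a missing pct_change in the middle of a city as "no change".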