For the following dataframe, if the predicted column equals 1, I want to recalculate the value column based on pct of the current date and value of the previous date:
city district date value pct predicted
0 a c 2019-09 9.48 0.004237 0
1 a c 2019-10 9.35 -0.013713 0
2 a c 2019-11 9.05 -0.032086 0
3 a c 2019-12 9.04 -0.001105 1 --> need to recalculate values based on pct and previous values
4 a c 2020-01 8.80 -0.020000 1 --> need to recalculate values based on pct and previous values
5 a c 2020-02 8.91 0.012500 1 --> need to recalculate values based on pct and previous values
6 b d 2019-09 9.48 0.004237 0
7 b d 2019-10 9.35 -0.013713 0
8 b d 2019-11 9.05 -0.032086 0
9 b d 2019-12 9.04 -0.001105 1 --> need to recalculate values based on pct and previous values
10 b d 2020-01 8.80 -0.020000 1 --> need to recalculate values based on pct and previous values
11 b d 2020-02 8.91 0.012500 1 --> need to recalculate values based on pct and previous values
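I tried the code below, but the result seems different from what I calculated with an Excel formula:
import numpy as np

df.loc[df["predicted"]==1, "value"] = np.nan
df['value'] = df['value'].ffill().mul(df['pct']).add(df['value'].ffill(), fill_value=0)
print(df)
Output: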
   district date value pct predicted
0 c 2019-09 9.520169 0.004237 0
1 c 2019-10 9.221783 -0.013713 0
2 c 2019-11 8.759626 -0.032086 0
3 c 2019-12 9.040000 -0.001105 1
4 c 2020-01 8.869000 -0.020000 1
5 c 2020-02 9.163125 0.012500 1
6 d 2019-09 9.520169 0.004237 0
7 d 2019-10 9.221783 -0.013713 0
8 d 2019-11 8.759626 -0.032086 0
9 d 2019-12 9.040000 -0.001105 1
10 d 2020-01 8.869000 -0.020000 1
11 d 2020-02 9.163125 0.012500 1
The Excel formula I used to calculate value: value in 2019-12 = (1 + pct in 2019-12) * value in 2019-11, with the same logic for the other months.
How can I correct my code? Thanks.
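One vectorized way to follow the Excel recursion, value_t = (1 + pct_t) * value_{t-1}, is to take the last observed value before the predicted rows and multiply it by a running product of (1 + pct) over those rows. The sketch below makes several assumptions. Rows are already sorted by date within each district. Every run of predicted rows follows at least one observed row in the same district; a district that starts with predicted rows would get NaN. The sample data is the question's first table, with city and date omitted. The names block, factor and start are illustrative and do not come from the original post.

```python
import pandas as pd

# Sample data from the question (city and date omitted for brevity).
df = pd.DataFrame({
    "district": ["c"] * 6 + ["d"] * 6,
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91] * 2,
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.020000, 0.012500] * 2,
    "predicted": [0, 0, 0, 1, 1, 1] * 2,
})

m = df["predicted"].eq(1)
# Every observed row (predicted == 0) opens a new block; the predicted
# rows that follow it belong to the same block.
block = (~m).cumsum()
keys = [df["district"], block]
# Factor 1 on the observed row, (1 + pct) on each predicted row.
factor = (1 + df["pct"]).where(m, 1.0)
# Observed value at the start of each block, broadcast to the whole block.
start = df["value"].where(~m).groupby(keys).transform("first")
# value_t = start * product of (1 + pct) over the predicted rows up to t.
df["value"] = start * factor.groupby(keys).cumprod()
print(df)
```

Observed rows keep their original values because their factor is 1. For district c, 2019-12, 2020-01 and 2020-02 come out at about 9.04, 8.8592 and 8.96994, which matches the chained Excel calculation.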
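Update:
The expected result, following the Excel formula (each predicted row is calculated from the previous row's recalculated value), is: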
district date value pct predicted
0 c 2019-09 9.48000 0.004237 0
1 c 2019-10 9.35000 -0.013713 0
2 c 2019-11 9.05000 -0.032086 0
3 c 2019-12 9.04000 -0.001105 1
4 c 2020-01 8.85920 -0.020000 1
5 c 2020-02 8.96994 0.012500 1
6 d 2019-09 9.48000 0.004237 0
7 d 2019-10 9.35000 -0.013713 0
8 d 2019-11 9.05000 -0.032086 0
9 d 2019-12 9.04000 -0.001105 1
10 d 2020-01 8.85920 -0.020000 1
11 d 2020-02 8.96994 0.012500 1
df:
   city district date value pct predicted
0 a c 2018-12 10.1700 NaN 0
1 a c 2019-01 9.9900 -0.017699 0
2 a c 2019-02 10.6600 0.067067 0
3 a c 2019-03 10.5600 -0.009381 0
4 a c 2019-04 10.0600 -0.047348 0
5 a c 2019-05 10.6900 0.062624 0
6 a c 2019-06 10.7700 0.007484 0
7 a c 2019-07 10.6700 -0.009285 0
8 a c 2019-08 10.5100 -0.014995 0
9 a c 2019-09 10.2800 -0.021884 0
10 a c 2019-10 10.0500 -0.022374 0
11 a c 2019-11 9.7200 -0.032836 0
12 a c 2019-12 9.8400 0.012346 1
13 a c 2020-01 10.0368 0.020000 1
14 a c 2020-02 10.3500 -0.004808 1
15 a c 2020-03 10.1430 -0.020000 1
16 a c 2020-04 9.8882 -0.020000 1
17 a c 2020-05 9.5256 -0.020000 1
18 a c 2020-06 8.9572 -0.020000 1
19 a c 2020-07 9.0882 0.020000 1
20 a c 2020-08 9.3024 0.020000 1
21 a c 2020-09 9.9042 0.020000 1
22 a c 2020-10 10.1000 -0.001976 1
23 a c 2020-11 9.8980 -0.020000 1
24 b d 2018-12 6.3200 NaN 0
25 b d 2019-01 6.3200 0.000000 0
26 b d 2019-02 6.3200 0.000000 0
27 b d 2019-03 6.3200 0.000000 0
28 b d 2019-04 6.3200 0.000000 0
29 b d 2019-05 6.3200 0.000000 0
30 b d 2019-06 6.0000 -0.050633 0
31 b d 2019-07 6.0000 0.000000 0
32 b d 2019-08 6.0000 0.000000 0
33 b d 2019-09 6.0000 0.000000 0
34 b d 2019-10 6.0000 0.000000 0
35 b d 2019-11 6.0000 0.000000 0
36 b d 2019-12 5.7800 -0.020000 1
37 b d 2020-01 5.8956 0.020000 1
38 b d 2020-02 5.7820 -0.020000 1
39 b d 2020-03 5.7936 0.020000 1
40 b d 2020-04 5.7428 -0.020000 1
41 b d 2020-05 5.7222 0.020000 1
42 b d 2020-06 5.7428 -0.020000 1
43 b d 2020-07 5.5386 0.020000 1
44 b d 2020-08 5.7820 -0.020000 1
45 b d 2020-09 5.3142 0.020000 1
46 b d 2020-10 5.8898 -0.020000 1
47 b d 2020-11 5.0490 0.020000 1
After running the following code:
m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])
df['new_pct'] = df.groupby('city')['value'].pct_change()
print(df)
The pct and new_pct columns should have identical values, but for some rows they differ.
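To find the rows that disagree, one option is to recompute pct_change within each group and compare it with the stored pct using a tolerance. This is a hedged sketch. pct_mismatches is a hypothetical helper. The 1e-6 tolerance is an assumption based on pct being shown to six decimals. The sample values are the ones the conditional shift approach prints for the first table (9.48, 9.35, 9.05, 9.04, 8.8592, 8.91).

```python
import numpy as np
import pandas as pd

def pct_mismatches(df, group_col="district", atol=1e-6):
    """Hypothetical helper: rows whose stored pct disagrees with pct_change of value."""
    new_pct = df.groupby(group_col)["value"].pct_change()
    # The first row of each group has no previous value, so it is skipped.
    comparable = new_pct.notna()
    close = np.isclose(df["pct"], new_pct, rtol=0.0, atol=atol)
    return df.assign(new_pct=new_pct)[comparable & ~close]

# Values produced by the conditional shift approach on the first table.
df = pd.DataFrame({
    "district": ["c"] * 6 + ["d"] * 6,
    "value": [9.48, 9.35, 9.05, 9.04, 8.8592, 8.91] * 2,
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.020000, 0.012500] * 2,
})
bad = pct_mismatches(df)
print(bad)
```

With these values only the 2020-02 rows (index 5 and 11) are flagged. Their value 8.91 was computed from the original 8.80 rather than the recalculated 8.8592, so pct_change gives about 0.0057 instead of 0.0125.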
Reference: Caculate current values based on pct_change and previous values in Pandas
Answer 0 (score: 1)
I think you can use:
df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
print(df)
city district date value pct predicted
0 a c 2019-09 9.480000 0.004237 0
1 a c 2019-10 9.350001 -0.013713 0
2 a c 2019-11 9.049996 -0.032086 0
3 a c 2019-12 9.040000 -0.001105 1
4 a c 2020-01 8.859200 -0.020000 1
5 a c 2020-02 8.910000 0.012500 1
6 b d 2019-09 9.480000 0.004237 0
7 b d 2019-10 9.350001 -0.013713 0
8 b d 2019-11 9.049996 -0.032086 0
9 b d 2019-12 9.040000 -0.001105 1
10 b d 2020-01 8.859200 -0.020000 1
11 b d 2020-02 8.910000 0.012500 1
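How it works:
Shift each group's values to the previous date with DataFrameGroupBy.shift and multiply them by 1 + pct. Finally, use fillna to replace the first value of each group (NaN after the shift) with the original value: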
df = df.assign(add = (1 + df['pct']),
shifted=df.groupby('district')['value'].shift(),
mult = (1 + df['pct']).mul(df.groupby('district')['value'].shift()),
fin = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value']))
print(df)
city district date value pct predicted add shifted \
0 a c 2019-09 9.48 0.004237 0 1.004237 NaN
1 a c 2019-10 9.35 -0.013713 0 0.986287 9.48
2 a c 2019-11 9.05 -0.032086 0 0.967914 9.35
3 a c 2019-12 9.04 -0.001105 1 0.998895 9.05
4 a c 2020-01 8.80 -0.020000 1 0.980000 9.04
5 a c 2020-02 8.91 0.012500 1 1.012500 8.80
6 b d 2019-09 9.48 0.004237 0 1.004237 NaN
7 b d 2019-10 9.35 -0.013713 0 0.986287 9.48
8 b d 2019-11 9.05 -0.032086 0 0.967914 9.35
9 b d 2019-12 9.04 -0.001105 1 0.998895 9.05
10 b d 2020-01 8.80 -0.020000 1 0.980000 9.04
11 b d 2020-02 8.91 0.012500 1 1.012500 8.80
mult fin
0 NaN 9.480000
1 9.350001 9.350001
2 9.049996 9.049996
3 9.040000 9.040000
4 8.859200 8.859200
5 8.910000 8.910000
6 NaN 9.480000
7 9.350001 9.350001
8 9.049996 9.049996
9 9.040000 9.040000
10 8.859200 8.859200
11 8.910000 8.910000
If you want to process only the rows that match the condition:
m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])
print(df)
city district date value pct predicted
0 a c 2019-09 9.4800 0.004237 0
1 a c 2019-10 9.3500 -0.013713 0
2 a c 2019-11 9.0500 -0.032086 0
3 a c 2019-12 9.0400 -0.001105 1
4 a c 2020-01 8.8592 -0.020000 1
5 a c 2020-02 8.9100 0.012500 1
6 b d 2019-09 9.4800 0.004237 0
7 b d 2019-10 9.3500 -0.013713 0
8 b d 2019-11 9.0500 -0.032086 0
9 b d 2019-12 9.0400 -0.001105 1
10 b d 2020-01 8.8592 -0.020000 1
11 b d 2020-02 8.9100 0.012500 1
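The conditional shift above reads the original previous value (8.80 for 2020-02), not the recalculated one, which is why 2020-02 stays at 8.91 instead of matching the Excel result. A plain per-row loop makes the chaining explicit. This is a minimal sketch that assumes rows are sorted by date within each district, and recalc_sequential is a hypothetical name. It is slower than vectorized code but easy to check against Excel row by row.

```python
import pandas as pd

def recalc_sequential(df):
    """Hypothetical helper: apply value_t = (1 + pct_t) * value_{t-1} row by row."""
    out = df.copy()
    # .groups maps each district to its row labels, in their original order.
    for _, labels in out.groupby("district").groups.items():
        prev = None
        for i in labels:
            if out.at[i, "predicted"] == 1 and prev is not None:
                # Use the previous row's value, which may itself be recalculated.
                out.at[i, "value"] = (1 + out.at[i, "pct"]) * prev
            prev = out.at[i, "value"]
    return out

df = pd.DataFrame({
    "district": ["c"] * 6 + ["d"] * 6,
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91] * 2,
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.020000, 0.012500] * 2,
    "predicted": [0, 0, 0, 1, 1, 1] * 2,
})
out = recalc_sequential(df)
print(out)
```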
Answer 1 (score: 0)
This seems to solve the problem:
import numpy as np

# Blank out the rows that must be recalculated.
df.loc[df["predicted"]==1, "value"] = np.nan
# Each pass recomputes every row from the previous row's value, so one more missing row per district gets filled.
while df['value'].isna().any():
    df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
df['new_pct'] = df.groupby('district')['value'].pct_change()
print(df)
Output:
city district date value pct predicted new_pct
0 a c 2018-12 10.170000 NaN 0 NaN
1 a c 2019-01 9.990001 -0.017699 0 -0.017699
2 a c 2019-02 10.660001 0.067067 0 0.067067
3 a c 2019-03 10.559999 -0.009381 0 -0.009381
4 a c 2019-04 10.060004 -0.047348 0 -0.047348
5 a c 2019-05 10.690002 0.062624 0 0.062624
6 a c 2019-06 10.770006 0.007484 0 0.007484
7 a c 2019-07 10.670006 -0.009285 0 -0.009285
8 a c 2019-08 10.510010 -0.014995 0 -0.014995
9 a c 2019-09 10.280009 -0.021884 0 -0.021884
10 a c 2019-10 10.050004 -0.022374 0 -0.022374
11 a c 2019-11 9.720002 -0.032836 0 -0.032836
12 a c 2019-12 9.840005 0.012346 1 0.012346
13 a c 2020-01 10.036804 0.020000 1 0.020000
14 a c 2020-02 9.988548 -0.004808 1 -0.004808
15 a c 2020-03 9.788778 -0.020000 1 -0.020000
16 a c 2020-04 9.592998 -0.020000 1 -0.020000
17 a c 2020-05 9.401140 -0.020000 1 -0.020000
18 a c 2020-06 9.213114 -0.020000 1 -0.020000
19 a c 2020-07 9.397375 0.020000 1 0.020000
20 a c 2020-08 9.585320 0.020000 1 0.020000
21 a c 2020-09 9.777027 0.020000 1 0.020000
22 a c 2020-10 9.757712 -0.001976 1 -0.001976
23 a c 2020-11 9.562560 -0.020000 1 -0.020000
24 b d 2018-12 6.320000 NaN 0 NaN
25 b d 2019-01 6.320000 0.000000 0 0.000000
26 b d 2019-02 6.320000 0.000000 0 0.000000
27 b d 2019-03 6.320000 0.000000 0 0.000000
28 b d 2019-04 6.320000 0.000000 0 0.000000
29 b d 2019-05 6.320000 0.000000 0 0.000000
30 b d 2019-06 5.999999 -0.050633 0 -0.050633
31 b d 2019-07 5.999999 0.000000 0 0.000000
32 b d 2019-08 5.999999 0.000000 0 0.000000
33 b d 2019-09 5.999999 0.000000 0 0.000000
34 b d 2019-10 5.999999 0.000000 0 0.000000
35 b d 2019-11 5.999999 0.000000 0 0.000000
36 b d 2019-12 5.879999 -0.020000 1 -0.020000
37 b d 2020-01 5.997599 0.020000 1 0.020000
38 b d 2020-02 5.877647 -0.020000 1 -0.020000
39 b d 2020-03 5.995200 0.020000 1 0.020000
40 b d 2020-04 5.875296 -0.020000 1 -0.020000
41 b d 2020-05 5.992802 0.020000 1 0.020000
42 b d 2020-06 5.872947 -0.020000 1 -0.020000
43 b d 2020-07 5.990406 0.020000 1 0.020000
44 b d 2020-08 5.870598 -0.020000 1 -0.020000
45 b d 2020-09 5.988010 0.020000 1 0.020000
46 b d 2020-10 5.868249 -0.020000 1 -0.020000
47 b d 2020-11 5.985614 0.020000 1 0.020000
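The loop above fills one more row per district on each pass, so it needs as many passes as the longest run of predicted rows. It never ends if a district starts with a predicted row, because there is nothing to shift from. A guarded variant is sketched below; fill_by_passes and max_passes are hypothetical names. Unlike the loop above, it only fills the missing rows. Observed values are therefore not re-derived from pct, whereas the loop above turns 9.99 into 9.990001, for example.

```python
import numpy as np
import pandas as pd

def fill_by_passes(df, max_passes=None):
    """Hypothetical helper: pass-based fill with a guard against endless looping."""
    out = df.copy()
    out.loc[out["predicted"] == 1, "value"] = np.nan
    limit = len(out) if max_passes is None else max_passes
    passes = 0
    while out["value"].isna().any():
        if passes >= limit:
            raise RuntimeError("value cannot be filled; does a group start with a predicted row?")
        prev = out.groupby("district")["value"].shift()
        # Only missing rows are filled; observed values are left as given.
        out["value"] = out["value"].fillna((1 + out["pct"]) * prev)
        passes += 1
    return out, passes

df = pd.DataFrame({
    "district": ["c"] * 6 + ["d"] * 6,
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91] * 2,
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.020000, 0.012500] * 2,
    "predicted": [0, 0, 0, 1, 1, 1] * 2,
})
out, passes = fill_by_passes(df)
print(passes)
print(out)
```

On the first table this takes three passes, one per predicted month.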