Calculate multiple current values for groups based on pct_change and previous values in Pandas

Asked: 2019-12-24 07:29:45

Tags: python-3.x pandas dataframe

For the following DataFrame, I want to recalculate value wherever predicted equals 1, based on the current date's pct and the previous date's value.

   city district     date  value       pct  predicted
0     a        c  2019-09   9.48  0.004237          0
1     a        c  2019-10   9.35 -0.013713          0
2     a        c  2019-11   9.05 -0.032086          0
3     a        c  2019-12   9.04 -0.001105          1    --> need to recalculate value based on pct and the previous value
4     a        c  2020-01   8.80 -0.020000          1    --> need to recalculate value based on pct and the previous value
5     a        c  2020-02   8.91  0.012500          1    --> need to recalculate value based on pct and the previous value
6     b        d  2019-09   9.48  0.004237          0
7     b        d  2019-10   9.35 -0.013713          0
8     b        d  2019-11   9.05 -0.032086          0
9     b        d  2019-12   9.04 -0.001105          1    --> need to recalculate value based on pct and the previous value
10    b        d  2020-01   8.80 -0.020000          1    --> need to recalculate value based on pct and the previous value
11    b        d  2020-02   8.91  0.012500          1    --> need to recalculate value based on pct and the previous value

I tried the code below, but the result differs from what I calculated with an Excel formula:

df.loc[df["predicted"]==1, "value"] = np.nan
df['value'] = df['value'].ffill().mul(df['pct']).add(df['value'].ffill(), fill_value=0)
print(df)

Output:

   district     date     value       pct  predicted
0         c  2019-09  9.520169  0.004237          0
1         c  2019-10  9.221783 -0.013713          0
2         c  2019-11  8.759626 -0.032086          0
3         c  2019-12  9.040000 -0.001105          1
4         c  2020-01  8.869000 -0.020000          1
5         c  2020-02  9.163125  0.012500          1
6         d  2019-09  9.520169  0.004237          0
7         d  2019-10  9.221783 -0.013713          0
8         d  2019-11  8.759626 -0.032086          0
9         d  2019-12  9.040000 -0.001105          1
10        d  2020-01  8.869000 -0.020000          1
11        d  2020-02  9.163125  0.012500          1

The formula I used in Excel to calculate value: value in 2019-12 = (1 + pct in 2019-12) * value in 2019-11, and the same logic applies to the other months.

Expected result:

   district     date    value       pct  predicted
0         c  2019-09  9.48000  0.004237          0
1         c  2019-10  9.35000 -0.013713          0
2         c  2019-11  9.05000 -0.032086          0
3         c  2019-12  9.04000 -0.001105          1
4         c  2020-01  8.85920 -0.020000          1
5         c  2020-02  8.96994  0.012500          1
6         d  2019-09  9.48000  0.004237          0
7         d  2019-10  9.35000 -0.013713          0
8         d  2019-11  9.05000 -0.032086          0
9         d  2019-12  9.04000 -0.001105          1
10        d  2020-01  8.85920 -0.020000          1
11        d  2020-02  8.96994  0.012500          1

How can I correct my code? Thanks.

Update:

df:

   city district     date    value       pct  predicted
0     a        c  2018-12  10.1700       NaN          0
1     a        c  2019-01   9.9900 -0.017699          0
2     a        c  2019-02  10.6600  0.067067          0
3     a        c  2019-03  10.5600 -0.009381          0
4     a        c  2019-04  10.0600 -0.047348          0
5     a        c  2019-05  10.6900  0.062624          0
6     a        c  2019-06  10.7700  0.007484          0
7     a        c  2019-07  10.6700 -0.009285          0
8     a        c  2019-08  10.5100 -0.014995          0
9     a        c  2019-09  10.2800 -0.021884          0
10    a        c  2019-10  10.0500 -0.022374          0
11    a        c  2019-11   9.7200 -0.032836          0
12    a        c  2019-12   9.8400  0.012346          1
13    a        c  2020-01  10.0368  0.020000          1
14    a        c  2020-02  10.3500 -0.004808          1
15    a        c  2020-03  10.1430 -0.020000          1
16    a        c  2020-04   9.8882 -0.020000          1
17    a        c  2020-05   9.5256 -0.020000          1
18    a        c  2020-06   8.9572 -0.020000          1
19    a        c  2020-07   9.0882  0.020000          1
20    a        c  2020-08   9.3024  0.020000          1
21    a        c  2020-09   9.9042  0.020000          1
22    a        c  2020-10  10.1000 -0.001976          1
23    a        c  2020-11   9.8980 -0.020000          1
24    b        d  2018-12   6.3200       NaN          0
25    b        d  2019-01   6.3200  0.000000          0
26    b        d  2019-02   6.3200  0.000000          0
27    b        d  2019-03   6.3200  0.000000          0
28    b        d  2019-04   6.3200  0.000000          0
29    b        d  2019-05   6.3200  0.000000          0
30    b        d  2019-06   6.0000 -0.050633          0
31    b        d  2019-07   6.0000  0.000000          0
32    b        d  2019-08   6.0000  0.000000          0
33    b        d  2019-09   6.0000  0.000000          0
34    b        d  2019-10   6.0000  0.000000          0
35    b        d  2019-11   6.0000  0.000000          0
36    b        d  2019-12   5.7800 -0.020000          1
37    b        d  2020-01   5.8956  0.020000          1
38    b        d  2020-02   5.7820 -0.020000          1
39    b        d  2020-03   5.7936  0.020000          1
40    b        d  2020-04   5.7428 -0.020000          1
41    b        d  2020-05   5.7222  0.020000          1
42    b        d  2020-06   5.7428 -0.020000          1
43    b        d  2020-07   5.5386  0.020000          1
44    b        d  2020-08   5.7820 -0.020000          1
45    b        d  2020-09   5.3142  0.020000          1
46    b        d  2020-10   5.8898 -0.020000          1
47    b        d  2020-11   5.0490  0.020000          1

After running the code below:

m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])
df['new_pct'] = df.groupby('city')['value'].apply(lambda x: x.pct_change())
print(df)

Normally, the pct and new_pct columns should hold the same values, but for some rows they differ.

Reference link: Caculate current values based on pct_change and previous values in Pandas

2 Answers:

Answer 0 (score: 1)

I think you can use:

df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
print(df)
   city district     date     value       pct  predicted
0     a        c  2019-09  9.480000  0.004237          0
1     a        c  2019-10  9.350001 -0.013713          0
2     a        c  2019-11  9.049996 -0.032086          0
3     a        c  2019-12  9.040000 -0.001105          1
4     a        c  2020-01  8.859200 -0.020000          1
5     a        c  2020-02  8.910000  0.012500          1
6     b        d  2019-09  9.480000  0.004237          0
7     b        d  2019-10  9.350001 -0.013713          0
8     b        d  2019-11  9.049996 -0.032086          0
9     b        d  2019-12  9.040000 -0.001105          1
10    b        d  2020-01  8.859200 -0.020000          1
11    b        d  2020-02  8.910000  0.012500          1

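How it works:

Shift the values within each group to the previous date with DataFrameGroupBy.shift, add 1 to pct, multiply the two, and finally use fillna to restore the original value for the first row of each group, where the shifted value is NaN: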

df = df.assign(add = (1 + df['pct']),
               shifted=df.groupby('district')['value'].shift(),
               mult = (1 + df['pct']).mul(df.groupby('district')['value'].shift()),
               fin = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value']))
print(df)
   city district     date  value       pct  predicted       add  shifted  \
0     a        c  2019-09   9.48  0.004237          0  1.004237      NaN   
1     a        c  2019-10   9.35 -0.013713          0  0.986287     9.48   
2     a        c  2019-11   9.05 -0.032086          0  0.967914     9.35   
3     a        c  2019-12   9.04 -0.001105          1  0.998895     9.05   
4     a        c  2020-01   8.80 -0.020000          1  0.980000     9.04   
5     a        c  2020-02   8.91  0.012500          1  1.012500     8.80   
6     b        d  2019-09   9.48  0.004237          0  1.004237      NaN   
7     b        d  2019-10   9.35 -0.013713          0  0.986287     9.48   
8     b        d  2019-11   9.05 -0.032086          0  0.967914     9.35   
9     b        d  2019-12   9.04 -0.001105          1  0.998895     9.05   
10    b        d  2020-01   8.80 -0.020000          1  0.980000     9.04   
11    b        d  2020-02   8.91  0.012500          1  1.012500     8.80   

        mult       fin  
0        NaN  9.480000  
1   9.350001  9.350001  
2   9.049996  9.049996  
3   9.040000  9.040000  
4   8.859200  8.859200  
5   8.910000  8.910000  
6        NaN  9.480000  
7   9.350001  9.350001  
8   9.049996  9.049996  
9   9.040000  9.040000  
10  8.859200  8.859200  
11  8.910000  8.910000  

If you want to process only the rows that match the condition:

m = df["predicted"]==1
s = df[m].groupby('district')['value'].shift()
df['value'] = (1 + df['pct']).mul(s).fillna(df['value'])
print(df)
   city district     date   value       pct  predicted
0     a        c  2019-09  9.4800  0.004237          0
1     a        c  2019-10  9.3500 -0.013713          0
2     a        c  2019-11  9.0500 -0.032086          0
3     a        c  2019-12  9.0400 -0.001105          1
4     a        c  2020-01  8.8592 -0.020000          1
5     a        c  2020-02  8.9100  0.012500          1
6     b        d  2019-09  9.4800  0.004237          0
7     b        d  2019-10  9.3500 -0.013713          0
8     b        d  2019-11  9.0500 -0.032086          0
9     b        d  2019-12  9.0400 -0.001105          1
10    b        d  2020-01  8.8592 -0.020000          1
11    b        d  2020-02  8.9100  0.012500          1
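
Note that in the output above the 2020-02 value (8.91) is 1.0125 times the original 2020-01 value 8.80, not the recalculated 8.8592, so consecutive predicted rows are not chained. Below is a minimal sketch of one way to chain them with a cumulative product. It is not part of the original answer, and it assumes the rows are sorted by date within each district and that the predicted rows form one contiguous block at the end of each group. The names factor, growth and base are illustrative only.

```python
import pandas as pd

# Sample data from the question (city column omitted for brevity).
df = pd.DataFrame({
    "district": ["c"] * 6 + ["d"] * 6,
    "date": ["2019-09", "2019-10", "2019-11", "2019-12", "2020-01", "2020-02"] * 2,
    "value": [9.48, 9.35, 9.05, 9.04, 8.80, 8.91] * 2,
    "pct": [0.004237, -0.013713, -0.032086, -0.001105, -0.02, 0.0125] * 2,
    "predicted": [0, 0, 0, 1, 1, 1] * 2,
})

m = df["predicted"].eq(1)

# Growth factor: 1 for actual rows, (1 + pct) for predicted rows, so the
# per-group cumulative product only accumulates over the predicted tail.
factor = (1 + df["pct"]).where(m, 1.0)
growth = factor.groupby(df["district"]).cumprod()

# Last actual value of each group, carried forward onto the predicted rows.
base = df["value"].where(~m).groupby(df["district"]).ffill()

# Keep actual rows as they are; overwrite predicted rows with base * growth.
df["value"] = df["value"].where(~m, base * growth)
print(df)
```

With the sample data this should give roughly 9.04, 8.8592 and 8.96994 for the three predicted rows of each district, which is what the Excel formula in the question produces. If predicted and actual rows are interleaved, this sketch would need a separate cumulative product per predicted block.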

Answer 1 (score: 0)

This seems to solve the problem:

df.loc[df["predicted"]==1, "value"] = np.nan
while len(df.loc[df['value'].isin(['', np.nan])]) > 0:
    df['value'] = (1 + df['pct']).mul(df.groupby('district')['value'].shift()).fillna(df['value'])
df['new_pct'] = df.groupby('district')['value'].apply(lambda x: x.pct_change())

print(df)

Output:

   city district     date      value       pct  predicted   new_pct
0     a        c  2018-12  10.170000       NaN          0       NaN
1     a        c  2019-01   9.990001 -0.017699          0 -0.017699
2     a        c  2019-02  10.660001  0.067067          0  0.067067
3     a        c  2019-03  10.559999 -0.009381          0 -0.009381
4     a        c  2019-04  10.060004 -0.047348          0 -0.047348
5     a        c  2019-05  10.690002  0.062624          0  0.062624
6     a        c  2019-06  10.770006  0.007484          0  0.007484
7     a        c  2019-07  10.670006 -0.009285          0 -0.009285
8     a        c  2019-08  10.510010 -0.014995          0 -0.014995
9     a        c  2019-09  10.280009 -0.021884          0 -0.021884
10    a        c  2019-10  10.050004 -0.022374          0 -0.022374
11    a        c  2019-11   9.720002 -0.032836          0 -0.032836
12    a        c  2019-12   9.840005  0.012346          1  0.012346
13    a        c  2020-01  10.036804  0.020000          1  0.020000
14    a        c  2020-02   9.988548 -0.004808          1 -0.004808
15    a        c  2020-03   9.788778 -0.020000          1 -0.020000
16    a        c  2020-04   9.592998 -0.020000          1 -0.020000
17    a        c  2020-05   9.401140 -0.020000          1 -0.020000
18    a        c  2020-06   9.213114 -0.020000          1 -0.020000
19    a        c  2020-07   9.397375  0.020000          1  0.020000
20    a        c  2020-08   9.585320  0.020000          1  0.020000
21    a        c  2020-09   9.777027  0.020000          1  0.020000
22    a        c  2020-10   9.757712 -0.001976          1 -0.001976
23    a        c  2020-11   9.562560 -0.020000          1 -0.020000
24    b        d  2018-12   6.320000       NaN          0       NaN
25    b        d  2019-01   6.320000  0.000000          0  0.000000
26    b        d  2019-02   6.320000  0.000000          0  0.000000
27    b        d  2019-03   6.320000  0.000000          0  0.000000
28    b        d  2019-04   6.320000  0.000000          0  0.000000
29    b        d  2019-05   6.320000  0.000000          0  0.000000
30    b        d  2019-06   5.999999 -0.050633          0 -0.050633
31    b        d  2019-07   5.999999  0.000000          0  0.000000
32    b        d  2019-08   5.999999  0.000000          0  0.000000
33    b        d  2019-09   5.999999  0.000000          0  0.000000
34    b        d  2019-10   5.999999  0.000000          0  0.000000
35    b        d  2019-11   5.999999  0.000000          0  0.000000
36    b        d  2019-12   5.879999 -0.020000          1 -0.020000
37    b        d  2020-01   5.997599  0.020000          1  0.020000
38    b        d  2020-02   5.877647 -0.020000          1 -0.020000
39    b        d  2020-03   5.995200  0.020000          1  0.020000
40    b        d  2020-04   5.875296 -0.020000          1 -0.020000
41    b        d  2020-05   5.992802  0.020000          1  0.020000
42    b        d  2020-06   5.872947 -0.020000          1 -0.020000
43    b        d  2020-07   5.990406  0.020000          1  0.020000
44    b        d  2020-08   5.870598 -0.020000          1 -0.020000
45    b        d  2020-09   5.988010  0.020000          1  0.020000
46    b        d  2020-10   5.868249 -0.020000          1 -0.020000
47    b        d  2020-11   5.985614  0.020000          1  0.020000
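
To confirm that a recalculated frame is self-consistent, one option is to compare the stored pct with the percentage change implied by value within each group. The following is only a sketch: the helper name pct_mismatches, the tol parameter and the implied_pct column are hypothetical, and the tolerance is an arbitrary choice to absorb rounding in the six-decimal pct values.

```python
import pandas as pd

def pct_mismatches(df, group_col="district", tol=1e-5):
    """Return the rows whose stored pct disagrees with the pct implied by value."""
    implied = df.groupby(group_col)["value"].pct_change()
    # NaN differences (first row of a group, or a missing pct) compare as
    # False here, so those rows are never reported.
    bad = (implied - df["pct"]).abs() > tol
    return df.loc[bad].assign(implied_pct=implied[bad])

# Small demo: the last row was computed from the un-recalculated previous
# value (1.0125 * 8.80 = 8.91), so it does not match its stored pct.
demo = pd.DataFrame({
    "district": ["c"] * 4,
    "date": ["2019-11", "2019-12", "2020-01", "2020-02"],
    "value": [9.05, 9.04, 8.8592, 8.91],
    "pct": [-0.032086, -0.001105, -0.02, 0.0125],
})

result = pct_mismatches(demo)
print(result)
```

An empty result means every stored pct matches the values to within the tolerance; in the demo only the 2020-02 row is reported.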