Calculating trailing variance in pandas

Time: 2018-10-30 06:38:22

Tags: python pandas dataframe variance

I have a dataframe like the following:

     | symbol |    date    |  close   
 ----|--------|------------|---------- 
   0 | APX    | 5/31/2017  |     4.04 
   1 | APX    | 6/30/2017  |      5.4 
   2 | APX    | 7/31/2017  |     4.15 
   3 | APX    | 8/31/2017  |     9.95 
   4 | APX    | 9/30/2017  |     10.3 
   5 | APX    | 10/31/2017 |     5.58 
   6 | APX    | 11/30/2017 |     8.47 
   7 | APX    | 12/31/2017 |    15.66 
   8 | APX    | 1/31/2018  |    10.55 
   9 | APX    | 2/28/2018  |      9.8 
  10 | APX    | 3/31/2018  |     7.43 
  11 | APX    | 4/30/2018  |     8.93 
  12 | APX    | 5/31/2018  |     7.61 
  13 | APX    | 6/30/2018  |     7.79 
  14 | AURA   | 1/31/2018  | 0.221382 
  15 | AURA   | 2/28/2018  | 0.222236 
  16 | AURA   | 3/31/2018  | 0.075488 
  17 | AURA   | 4/30/2018  | 0.180699 
  18 | AURA   | 5/31/2018  | 0.220009 
  19 | AURA   | 6/30/2018  | 0.199029 
  20 | BASH   | 11/30/2016 | 0.000447 
  21 | BASH   | 12/31/2016 | 0.000376 
  22 | BASH   | 1/31/2017  | 0.000452 
  23 | BASH   | 2/28/2017  | 0.000414 
  24 | BASH   | 3/31/2017  |  0.00045 
  25 | BASH   | 4/30/2017  | 0.000754 
  26 | BASH   | 5/31/2017  | 0.009115 
  27 | BASH   | 6/30/2017  |  0.03419 
  28 | BASH   | 7/31/2017  | 0.014037 
  29 | BASH   | 8/31/2017  | 0.009117 
  30 | BASH   | 9/30/2017  | 0.002333 
  31 | BASH   | 10/31/2017 |  0.00258 
  32 | BASH   | 11/30/2017 | 0.003415 
  33 | BASH   | 12/31/2017 | 0.003756 
  34 | BASH   | 1/31/2018  | 0.005454 
  35 | BASH   | 2/28/2018  | 0.006186 
  36 | BASH   | 3/31/2018  | 0.004155 
  37 | BASH   | 4/30/2018  | 0.005078 
  38 | BASH   | 5/31/2018  | 0.003696 
  39 | BASH   | 6/30/2018  | 0.003442 

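I want to calculate a 6-month trailing variance for each symbol and add it to the dataframe as a new column. The variance should be computed from the values in the close column.

For example, APX has 14 observations, so the first variance should be based on the values 4.04, 5.4, 4.15, 9.95, 10.3 and 5.58.

The next variance should be based on 5.4, 4.15, 9.95, 10.3, 5.58 and 8.47, and so on.

I assume I need to use the df.var function to compute the variance, but how do I tell it to do the calculation per symbol over the trailing 6 months?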

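1 answer:

Answer 0 (score: 1)

You can combine groupby with rolling(6) and var() to get the rolling variance over the previous 6 observations, computed separately within each group. Setting min_periods to 6 requires at least 6 values for each calculation, so the first 5 results in each group are NaN. For an integer window such as rolling(6), pandas already defaults min_periods to the window size, so the argument here only makes that requirement explicit.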

df['trailing_var'] = df.groupby('symbol')['close'].rolling(6, min_periods=6).var().reset_index(level=0, drop=True)

Result:

  symbol          date       close    trailing_var
0    APX     5/31/2017    4.040000             NaN
1    APX     6/30/2017    5.400000             NaN
2    APX     7/31/2017    4.150000             NaN
3    APX     8/31/2017    9.950000             NaN
4    APX     9/30/2017    10.30000             NaN
5    APX    10/31/2017    5.580000    7.988720e+00
6    APX    11/30/2017    8.470000    6.776377e+00
7    APX    12/31/2017    15.66000    1.648918e+01
8    APX     1/31/2018    10.55000    1.085291e+01
9    APX     2/28/2018    9.800000    1.086476e+01
10   APX     3/31/2018    7.430000    1.196206e+01
11   APX     4/30/2018    8.930000    8.470240e+00
12   APX     5/31/2018    7.610000    9.167987e+00
13   APX     6/30/2018    7.790000    1.662630e+00
14   AURA    1/31/2018    0.221382             NaN
15   AURA    2/28/2018    0.222236             NaN
16   AURA    3/31/2018    0.075488             NaN
17   AURA    4/30/2018    0.180699             NaN
18   AURA    5/31/2018    0.220009             NaN
19   AURA    6/30/2018    0.199029    3.226191e-03
20   BASH   11/30/2016    0.000447             NaN
21   BASH   12/31/2016    0.000376             NaN
22   BASH    1/31/2017    0.000452             NaN
23   BASH    2/28/2017    0.000414             NaN
24   BASH    3/31/2017    0.000450             NaN
25   BASH    4/30/2017    0.000754    1.859857e-08
26   BASH    5/31/2017    0.009115    1.241904e-05
27   BASH    6/30/2017    0.034190    1.820075e-04
28   BASH    7/31/2017    0.014037    1.741278e-04
29   BASH    8/31/2017    0.009117    1.539841e-04
30   BASH    9/30/2017    0.002333    1.464200e-04
31   BASH   10/31/2017    0.002580    1.390604e-04
32   BASH   11/30/2017    0.003415    1.508145e-04
33   BASH   12/31/2017    0.003756    2.221467e-05
34   BASH    1/31/2018    0.005454    6.464003e-06
35   BASH    2/28/2018    0.006186    2.415413e-06
36   BASH    3/31/2018    0.004155    1.787309e-06
37   BASH    4/30/2018    0.005078    1.150985e-06
38   BASH    5/31/2018    0.003696    1.022634e-06
39   BASH    6/30/2018    0.003442    1.160249e-06
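
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It uses only the APX and AURA close values from the question (the date column and the BASH rows are left out for brevity) and swaps in groupby(...).transform(...) as an alternative way to attach the result: transform returns values aligned to the dataframe's own index, so the assignment should not depend on the rows being sorted by symbol. The manual check with numpy is my own addition and assumes the sample variance (ddof=1), which is what var() uses by default.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: APX (14 rows) and AURA (6 rows), close prices only.
df = pd.DataFrame({
    "symbol": ["APX"] * 14 + ["AURA"] * 6,
    "close": [4.04, 5.4, 4.15, 9.95, 10.3, 5.58, 8.47, 15.66, 10.55, 9.8,
              7.43, 8.93, 7.61, 7.79,
              0.221382, 0.222236, 0.075488, 0.180699, 0.220009, 0.199029],
})

# Rolling variance over the last 6 observations, computed within each symbol.
# transform keeps the original row index, so no reset_index is needed.
df["trailing_var"] = df.groupby("symbol")["close"].transform(
    lambda s: s.rolling(6, min_periods=6).var()
)

# Manual check of the first complete APX window (sample variance, ddof=1).
first_window = np.var([4.04, 5.4, 4.15, 9.95, 10.3, 5.58], ddof=1)

print(df)
print(first_window)
```

The value at index 5 should agree with the manually computed variance of the first six APX closes, and the first five rows of each symbol stay NaN because fewer than six observations are available there.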