Why does my code return "nan" when calculating a percentage with pandas?

Date: 2019-01-29 06:42:39

Tags: python-3.x pandas

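Homework question: Consider this investment strategy: buy whenever the price goes above the 50-day moving average, then sell 3 trading days later. On average, how much profit (in percent) can we make? We say that the price "goes above" the 50-day moving average on trading day x if (1) the price was below the moving average on trading day x-1, and (2) the price is above the moving average on trading day x.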
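The same rule can be written in symbols. The notation is introduced here for illustration and is not part of the original question: P_x is the price on trading day x, and MA_x is the 50-day moving average on that day.

```latex
\begin{aligned}
\mathrm{MA}_x &= \frac{1}{50}\sum_{k=0}^{49} P_{x-k},\\
x \in S &\iff P_{x-1} < \mathrm{MA}_{x-1} \ \text{and}\ P_x > \mathrm{MA}_x,\\
r_x &= \frac{P_{x+3} - P_x}{P_x}, \qquad
\bar{r} = \frac{1}{|S|}\sum_{x \in S} r_x .
\end{aligned}
```

Here S is the set of buy days, and the average profit in percent is 100 times r̄. MA_x is only defined once 50 prices are available.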

rol=stock.rolling(50).mean()

profitMade=((stock.shift(-3)-stock)/stock)

stock>rol

profitMade[(stock<stock.shift(-1))&(stock>rol)]

profitMade.pct_change()

profitMade[profitMade.pct_change()].mean()

The last line returns "nan" instead of the expected value.

Sample data:

Date
2002-05-23      1.196429
2002-05-24      1.210000
2002-05-28      1.157143
2002-05-29      1.103571
2002-05-30      1.071429
2002-05-31      1.076429
2002-06-03      1.128571
2002-06-04      1.117857
2002-06-05      1.147143
2002-06-06      1.182143
2002-06-07      1.118571
2002-06-10      1.156429
2002-06-11      1.153571
2002-06-12      1.092857
2002-06-13      1.082857
2002-06-14      0.986429
2002-06-17      0.922143
2002-06-18      0.910714
2002-06-19      0.951429
2002-06-20      0.957143
2002-06-21      0.979286
2002-06-24      0.978571
2002-06-25      0.964286
2002-06-26      0.988571
2002-06-27      0.943571
2002-06-28      0.999286
2002-07-01      1.027857
2002-07-02      1.172857
2002-07-03      1.214286
2002-07-05      1.276429
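To reproduce the problem, the sample above can be loaded into pandas. This sketch assumes that `stock` is a DataFrame indexed by Date with a single column named `value`, which is what the printed output in the answer suggests. The question itself does not say how the data was loaded.

```python
import io

import pandas as pd

# Sample prices copied from the question. Assumption: `stock` is a DataFrame
# indexed by Date with one column named "value" (as the answer's output shows).
SAMPLE_CSV = """Date,value
2002-05-23,1.196429
2002-05-24,1.210000
2002-05-28,1.157143
2002-05-29,1.103571
2002-05-30,1.071429
2002-05-31,1.076429
2002-06-03,1.128571
2002-06-04,1.117857
2002-06-05,1.147143
2002-06-06,1.182143
2002-06-07,1.118571
2002-06-10,1.156429
2002-06-11,1.153571
2002-06-12,1.092857
2002-06-13,1.082857
2002-06-14,0.986429
2002-06-17,0.922143
2002-06-18,0.910714
2002-06-19,0.951429
2002-06-20,0.957143
2002-06-21,0.979286
2002-06-24,0.978571
2002-06-25,0.964286
2002-06-26,0.988571
2002-06-27,0.943571
2002-06-28,0.999286
2002-07-01,1.027857
2002-07-02,1.172857
2002-07-03,1.214286
2002-07-05,1.276429
"""

stock = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="Date", parse_dates=True)
# Only 30 trading days: fewer rows than a 50-day window needs.
print(stock.shape)  # (30, 1)
```

Note that the sample covers just 30 trading days. This matters for any 50-day moving average.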

1 answer:

Answer 0 (score: 0)

Look at the values of rol: they are all NaN.

rol = stock.rolling(50).mean()
rol
Out:                                        
                               value                   
Date                                
2002-05-23                       NaN
2002-05-24                       NaN
2002-05-28                       NaN
2002-05-29                       NaN
2002-05-30                       NaN
2002-05-31                       NaN
2002-06-03                       NaN
2002-06-04                       NaN
2002-06-05                       NaN
2002-06-06                       NaN
2002-06-07                       NaN
2002-06-10                       NaN
2002-06-11                       NaN
2002-06-12                       NaN
2002-06-13                       NaN
2002-06-14                       NaN
2002-06-17                       NaN
2002-06-18                       NaN
2002-06-19                       NaN
2002-06-20                       NaN
2002-06-21                       NaN
2002-06-24                       NaN
2002-06-25                       NaN
2002-06-26                       NaN
2002-06-27                       NaN
2002-06-28                       NaN
2002-07-01                       NaN
2002-07-02                       NaN
2002-07-03                       NaN
2002-07-05                       NaN

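When rolling, pandas computes each value from a window of 50 rows. By default, min_periods equals the window size, so any window that contains fewer observations than that is filled with NaN. In your case, the window (50 rows) is larger than the whole DataFrame (30 rows), so every value is set to NaN.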
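Here is a minimal sketch of the default windowing rule. It uses a hypothetical five-value Series rather than the question's data:

```python
import pandas as pd

# Hypothetical prices, just to show how rolling windows fill.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default min_periods == window: a row gets a value only once
# `window` observations are available.
ma3 = prices.rolling(3).mean()
print(ma3.tolist())  # [nan, nan, 2.0, 3.0, 4.0]

# Window longer than the data: no row ever has enough observations.
ma50 = prices.rolling(50).mean()
print(ma50.isna().all())  # True
```

The first window - 1 rows are always NaN. If the window is longer than the data, every row is NaN.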

To demonstrate this, look at a smaller window size:

rol = stock.rolling(20).mean()
print(rol)
Out:
            value                   
Date                                
2002-05-23                       NaN
2002-05-24                       NaN
2002-05-28                       NaN
2002-05-29                       NaN
2002-05-30                       NaN
2002-05-31                       NaN
2002-06-03                       NaN
2002-06-04                       NaN
2002-06-05                       NaN
2002-06-06                       NaN
2002-06-07                       NaN
2002-06-10                       NaN
2002-06-11                       NaN
2002-06-12                       NaN
2002-06-13                       NaN
2002-06-14                       NaN
2002-06-17                       NaN
2002-06-18                       NaN
2002-06-19                       NaN
2002-06-20                  1.086143
2002-06-21                  1.075286
2002-06-24                  1.063714
2002-06-25                  1.054071
2002-06-26                  1.048321
2002-06-27                  1.041929
2002-06-28                  1.038071
2002-07-01                  1.033036
2002-07-02                  1.035786
2002-07-03                  1.039143
2002-07-05                  1.043857

- the first non-NaN value is in the 20th row, the first row whose window contains 20 observations.

To avoid this behavior, you can pass a value for the min_periods parameter of rolling:

rol = stock.rolling(50, min_periods=1).mean()
print(rol)
Out:
               value
Date
2002-05-23  1.196429
2002-05-24  1.203215
2002-05-28  1.187857
2002-05-29  1.166786
2002-05-30  1.147714
2002-05-31  1.135834
2002-06-03  1.134796
2002-06-04  1.132679
2002-06-05  1.134286
2002-06-06  1.139072
2002-06-07  1.137208
2002-06-10  1.138810
2002-06-11  1.139945
2002-06-12  1.136582
2002-06-13  1.133000
2002-06-14  1.123839
2002-06-17  1.111975
2002-06-18  1.100794
2002-06-19  1.092932
2002-06-20  1.086143
2002-06-21  1.081054
2002-06-24  1.076396
2002-06-25  1.071522
2002-06-26  1.068065
2002-06-27  1.063086
2002-06-28  1.060632
2002-07-01  1.059418
2002-07-02  1.063469
2002-07-03  1.068670
2002-07-05  1.075595

- so when there are fewer elements than the window size, the mean is computed over as many observations as are available (at least min_periods).

Documentation for min_periods:

min_periods : int, default None
      Minimum number of observations in window required to have a value
      (otherwise result is NA). For a window that is specified by an offset,
      min_periods will default to 1.
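Here is a small sketch of what min_periods=1 does, again on a hypothetical short Series rather than the question's data. While the data is shorter than the window, each window is simply "every row so far", so the result coincides with an expanding mean:

```python
import pandas as pd

# Hypothetical four-value series, shorter than the 50-row window.
prices = pd.Series([1.0, 2.0, 3.0, 4.0])

# min_periods=1: emit a mean as soon as one observation is in the window.
partial = prices.rolling(50, min_periods=1).mean()
print(partial.tolist())  # [1.0, 1.5, 2.0, 2.5]

# Shorter than the window, so every window is "all rows so far".
expanding = prices.expanding().mean()
print(partial.equals(expanding))  # True
```

There is a trade-off. With min_periods=1, the first 49 values are averages over fewer than 50 days, so they are not true 50-day moving averages. For the strategy in the question, it may be more faithful to keep the NaN and use a longer price history.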

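In the following line you "lose" the last three values. shift(-3) has no price three trading days ahead for them, so they are set to NaN:

profitMade = ((stock.shift(-3) - stock)/stock)
profitMade
Out:
...
2002-07-01  0.241835
2002-07-02       NaN
2002-07-03       NaN
2002-07-05       NaN

- so I guess you should drop them (I may be wrong, since I'm not very familiar with this particular task). Then reindex stock and rol the same way, because the following operations need all three tables to be the same size:

profitMade = profitMade.dropna()
stock = stock.loc[profitMade.index]
rol = rol.loc[profitMade.index]

OK, now the three tables have the same size. Next, I changed this line, which returns a table full of NaN: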

profitMade[(stock<stock.shift(-1))&(stock>rol)]
Out:
               value
Date                
2002-05-23       NaN
2002-05-24       NaN
2002-05-28       NaN
2002-05-29       NaN
2002-05-30       NaN
2002-05-31       NaN
2002-06-03       NaN
2002-06-04       NaN
2002-06-05  0.008095
2002-06-06       NaN
2002-06-07       NaN
2002-06-10       NaN
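The NaN-filled result comes from how DataFrame indexing treats a boolean DataFrame key. It masks cell by cell, like DataFrame.where, instead of selecting rows. A boolean Series key selects rows. This sketch uses a hypothetical three-row frame, df, shaped like profitMade:

```python
import pandas as pd

# Hypothetical frame with one "value" column, like profitMade.
df = pd.DataFrame(
    {"value": [0.1, 0.2, 0.3]},
    index=pd.to_datetime(["2002-06-04", "2002-06-05", "2002-06-06"]),
)

# Boolean DataFrame as key: same shape back, False cells become NaN.
masked = df[df > 0.15]
print(masked)

# Boolean Series as key: only rows where the Series is True are kept.
filtered = df[df["value"] > 0.15]
print(filtered)
```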

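Working with the specific column instead drops the NaN rows:

profitMade[(stock['value'] < stock['value'].shift(-1)) & (stock['value'] > rol['value'])]
Out:
               value
Date
2002-06-05  0.008095

Also, I don't understand what you are doing here:

profitMade[profitMade.pct_change()].mean()

- profitMade.pct_change() returns a table of float values (dummy percentages), but profitMade[...] expects boolean objects. You should clarify this and edit the question.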
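For comparison, here is one possible way to express the strategy exactly as the question defines it: below the moving average on day x-1, above it on day x. This is a sketch, not the answerer's code. The function crossover_returns and its parameters window and hold are hypothetical names introduced here, and the function assumes a price Series such as stock['value']. Unlike stock < stock.shift(-1), which compares each price with the next day's price, it compares prices with the moving average.

```python
import pandas as pd


def crossover_returns(price: pd.Series, window: int = 50, hold: int = 3) -> pd.Series:
    """Forward `hold`-day return on each day the price crosses above its MA.

    Hypothetical helper: day x qualifies when the price was below the moving
    average on day x-1 and is above it on day x.
    """
    ma = price.rolling(window).mean()  # NaN until `window` prices exist
    above_today = price > ma  # comparisons with NaN evaluate to False
    below_yesterday = (price < ma).shift(1, fill_value=False)
    signal = above_today & below_yesterday

    # Buy on day x, sell `hold` trading days later (NaN near the end).
    forward = price.shift(-hold) / price - 1
    # Boolean Series key: keep signal days that actually have a sell price.
    return forward[signal].dropna()


# Hypothetical prices that dip below, then cross above, a 3-day average.
prices = pd.Series([5.0, 4.0, 3.0, 2.0, 6.0, 7.0, 8.0, 9.0])
returns = crossover_returns(prices, window=3, hold=3)
print(returns)         # one crossover at index 4: 9.0 / 6.0 - 1 = 0.5
print(returns.mean())  # 0.5; multiply by 100 for percent
```

On the question's 30-row sample with window=50, the moving average is entirely NaN. No day qualifies, so the result is empty and its .mean() is nan, which is the same symptom the question reports. A meaningful answer needs more than 50 days of prices.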

Full code: