Is there a bug in Python's implementation of np.std for large arrays?

Asked: 2016-02-23 09:56:57

Tags: python numpy std variance

I am trying to calculate the variance via np.std(array, ddof=0). The problem occurs when I happen to have a long delta-like array, i.e. all values in the array are identical. Instead of returning std = 0, it gives some small nonzero value, which in turn causes further estimation errors downstream. The mean is returned correctly... For example:

np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1], ddof=0)

gives 1.80411241502e-16

whereas

np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1], ddof=0)

gives std = 0

Is there a way to overcome this, other than checking the data for uniqueness at every iteration and not computing the std at all?

Thanks.

P.S. After this was marked as a duplicate of Is floating point math broken?, I am copy-pasting @kxr's reply explaining why this is a different question:

"The current duplicate marking is wrong. It is not just about simple floating-point comparison, but about the internal aggregation of small errors into a tiny near-zero result when np.std is used on a long array, as the questioner pointed out. Compare, e.g., >>> np.std([0.1, 0.1, 0.1, 0.1, 0.1, 0.1]*200000) -> 2.0808632594793153e-12, so he could work around it accordingly."

The problem certainly starts with the floating-point representation, but it does not stop there. I appreciate the comment and the example, @kxr.
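One practical mitigation, in the spirit of @kxr's comment, is to shift the data by a constant (e.g. its first element) before calling np.std. The standard deviation is shift-invariant, so the true result is unchanged, but the intermediate sums stay near zero and accumulate far less rounding error. A minimal sketch, assuming the 90-element array from the question (the variable names are mine):

```python
import numpy as np

a = np.array([0.1] * 90)

# Direct computation: the intermediate sums can accumulate rounding error,
# which may show up as a tiny nonzero std (e.g. ~1.8e-16 in the question).
direct = np.std(a, ddof=0)

# Shifting by a constant leaves the true std unchanged (shift invariance),
# and here 0.1 - 0.1 is exactly 0.0, so the std comes out exactly 0.0.
shifted = np.std(a - a[0], ddof=0)

print(direct, shifted)
```

This is not a general cure (the shifted values are only exactly equal when the inputs are bit-identical), but it illustrates why the error is an accumulation effect rather than a bug.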

1 Answer:

Answer 0 (score: 4)

Welcome to the world of practical numerical algorithms! In real life, if you have two floating-point numbers x and y, checking x == y is meaningless. Accordingly, the question is not whether the standard deviation is 0, but whether it is close to 0. We check that with np.isclose:

import numpy as np

>>> np.isclose(1.80411241502e-16, 0)
True

Effectively, this is the best you can do. In real life, you cannot even check that all the items are identical, as you suggest: are they floats? Were they produced by some other process? Then they, too, will carry small errors.
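Putting the answer's advice into practice, one could wrap np.std in a small helper that snaps numerically negligible results to zero via np.isclose. This is a sketch of my own, not code from the answer; the function name and the tolerance atol=1e-12 are arbitrary choices that a caller should tune to their data's scale:

```python
import numpy as np

def effective_std(values, ddof=0, atol=1e-12):
    """Standard deviation that treats numerically negligible results as 0.

    `atol` is an absolute tolerance chosen by the caller; 1e-12 is only
    an illustrative default, not a universally correct threshold.
    """
    s = np.std(values, ddof=ddof)
    return 0.0 if np.isclose(s, 0.0, atol=atol) else s

print(effective_std([0.1] * 90))    # the ~1.8e-16 artifact snaps to 0.0
print(effective_std([1.0, 2.0]))    # a genuine spread is passed through
```

Note that np.isclose compares against rtol * |reference| + atol; with a reference of 0 only the absolute term matters, which is why the tolerance must reflect the magnitudes in your data.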