Question

I have been using scipy.special.erfcinv to compute Z-scores from p-values. However, when the p-value gets very small, erfcinv unexpectedly becomes huge. Any ideas?

Example:

In [1]: import numpy as np
In [2]: from scipy.special import erfcinv
In [3]: erfcinv(2e-16) * np.sqrt(2)
Out[3]: 8.2095361516013874
In [4]: erfcinv(1e-16) * np.sqrt(2)
Out[4]: 1.7976931348623155e+308

I am running Python 2.6.6 with scipy 0.10.1.

Answer 1

Short answer: strange things happen as you approach the precision limit of floating-point arithmetic (http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html, http://www.seas.ucla.edu/~vandenbe/103/lectures/flpt.pdf).

A slightly longer answer. First, let's look at the erfcinv function:

def erfcinv(y):
    return ndtri((2-y)/2.0)/sqrt(2)

If we take y = 2e-16:

In [96]: (2 - 2e-16) / 2
Out[96]: 0.9999999999999999

And when we take y = 1e-16:

In [97]: (2 - 1e-16) / 2
Out[97]: 1.0
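The jump between those two inputs lines up with double-precision rounding near 2.0. A minimal sketch, using only the standard library; the helper name erfcinv_argument is made up for illustration and simply reproduces the expression inside the erfcinv quoted above:

```python
import sys

# Machine epsilon for doubles: 2**-52, about 2.22e-16.
eps = sys.float_info.epsilon

def erfcinv_argument(y):
    """Hypothetical helper: the value the quoted erfcinv passes to ndtri."""
    return (2 - y) / 2.0

# The largest double below 2.0 is 2 - eps, so 2 - y only moves away from 2.0
# when y is large enough to round down to that neighbour.
print(eps)
print(erfcinv_argument(2e-16) < 1.0)   # True: still strictly below 1
print(erfcinv_argument(1e-16) == 1.0)  # True: rounded to exactly 1.0
```

Once the argument is exactly 1.0, the information about how small y was is gone before ndtri is ever called.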

Now let's look at ndtri:

x=ndtri(y) returns the argument x for which the area under the
Gaussian probability density function (integrated from minus infinity
to x) is equal to y.

Everything should be clear now, right? As you might suspect:

In [99]: ndtri(1)
Out[99]: inf

Your result may differ slightly. In my case:

In [101]: erfcinv(1e-16) * np.sqrt(2)
Out[101]: inf
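The relationship behind the quoted erfcinv can be written out explicitly. This is a sketch from the standard identities, where \Phi denotes the standard normal CDF (so ndtri is \Phi^{-1}); the notation is introduced here and is not part of the scipy source:

```latex
\begin{aligned}
\operatorname{erfc}(x) &= 2\bigl(1 - \Phi(x\sqrt{2})\bigr) = 2\,\Phi(-x\sqrt{2}), \\
\operatorname{erfcinv}(y) &= \frac{\Phi^{-1}(1 - y/2)}{\sqrt{2}} = -\frac{\Phi^{-1}(y/2)}{\sqrt{2}}.
\end{aligned}
```

The two right-hand forms are mathematically equal. The first evaluates 1 - y/2, which rounds to exactly 1 once y falls below roughly 1.1e-16. The second passes y/2 through unchanged, so a tiny y loses nothing.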

Answer 2

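If you want to compute quantiles or p-values, you can use stats.distributions, which in most cases picks the most suitable special function. In this case I always use stats.norm.isf for an upper-tail p-value (also because I don't want to remember what erf or erfcinv is).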

>>> import numpy as np
>>> from scipy import special, stats
>>> for y in 10.**(-np.arange(5, 30, 2)):
...     print y, special.erfcinv(y) * np.sqrt(2), stats.norm.isf(y/2.), -special.ndtri(y/2)
...
1e-05 4.41717341347 4.41717341347 4.41717341347
1e-07 5.32672388628 5.32672388638 5.32672388638
1e-09 6.10941019166 6.10941020487 6.10941020487
1e-11 6.80650247883 6.80650249074 6.80650249074
1e-13 7.4410077655 7.44090215064 7.44090215064
1e-15 8.01401594878 8.02685888253 8.02685888253
1e-17 inf 8.57394407672 8.57394407672
1e-19 inf 9.08895010083 9.08895010083
1e-21 inf 9.57690145543 9.57690145543
1e-23 inf 10.0416376122 10.0416376122
1e-25 inf 10.4861701796 10.4861701796
1e-27 inf 10.9129127092 10.9129127092
1e-29 inf 11.3238345582 11.3238345582

The floating-point problem that zero323 pointed out will still show up in many other cases.
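For a cross-check that does not depend on scipy, the same lower-tail trick can be sketched with the standard library's statistics.NormalDist (Python 3.8+). This swaps in a different implementation from the scipy calls above, and the function name z_from_two_sided_p is made up for illustration:

```python
from statistics import NormalDist, StatisticsError

def z_from_two_sided_p(p):
    """Hypothetical helper: Z-score whose two-sided normal tail is p (0 < p < 2)."""
    # Work in the lower tail: p / 2 stays representable even when tiny,
    # whereas 1 - p / 2 would round to exactly 1.0.
    return -NormalDist().inv_cdf(p / 2.0)

print(z_from_two_sided_p(1e-5))   # about 4.417, matching the table above
print(z_from_two_sided_p(1e-17))  # about 8.574, finite instead of inf

# The upper-tail formulation fails for the same input because
# 1.0 - 5e-18 is exactly 1.0, which inv_cdf rejects.
try:
    NormalDist().inv_cdf(1.0 - 1e-17 / 2.0)
except StatisticsError as exc:
    print("upper-tail form rejected:", exc)
```

The design choice is the same one stats.norm.isf makes for you: keep the small probability as the quantity being inverted rather than subtracting it from 1 first.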

Scipy erfcinv unexpectedly blows up around 1e-16
