Question

Apologies if this is not the right place to post this question; it concerns the numerical stability of statistical functions in R.

I am trying to compute F-values for very large values of df2, but the computation appears to be numerically unstable:

nrange <- 350000:450000
f <- qf(1e-8, 8, nrange, lower.tail=FALSE)
plot(f ~ nrange)

The result looks like this: (graph of F-values)

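Basically, at around df2=400000 the result is no longer accurate. The question is: does anyone know how to work around this? For example, the F-distribution can be expressed in terms of two chi-squared distributions (see e.g. http://en.wikipedia.org/wiki/F-distribution#Related_distributions_and_properties), and the problem with using qf for large d2 has been described elsewhere. Indeed, qchisq does look accurate for these values, but it is not obvious to me how to use it here. For example,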

qchisq

and

qf(0.05, 8, 100, lower.tail=FALSE)

do not give the same result.

So the question is: how can I obtain accurate F-values for large df2? Any help would be greatly appreciated. Thanks!

Answer 1

One useful thing about open source projects is that they are open source:

fortune(250)

As Obi-Wan Kenobi may have said in Star Wars: "Use the source, Luke!"
   -- Barry Rowlingson (answering a question on the documentation of some implementation details)
      R-devel (January 2010)

If you look at the source code for qf

https://svn.r-project.org/R/trunk/src/nmath/qf.c

and in particular at this bit

/* fudge the extreme DF cases -- qbeta doesn't do this well.
   But we still need to fudge the infinite ones.
 */

if (df1 <= df2 && df2 > 4e5) {
    if (!R_FINITE(df1)) /* df1 == df2 == Inf : */
        return 1.;
    /* else */
    return qchisq(p, df1, lower_tail, log_p) / df1;
}

you will see that values of df2 above 4e5 are fudged: df2 is ignored entirely, and the result is assumed to be the same as for df2 == Inf.
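The effect of that branch can be checked from R itself. The following is only a sketch, assuming the installed R still contains the 4e5 cutoff quoted above. It compares qf just past the cutoff with the chi-square limit, and then tries one possible workaround: inverting pf numerically with uniroot. The helper name qf_by_inversion is hypothetical (it is not part of R), and the approach assumes that pf itself is accurate for the df2 of interest, which is not verified here.

```r
p   <- 1e-8   # upper-tail probability used in the question
df1 <- 8

# Limit used by the quoted branch: the chi-square quantile divided by df1.
f_limit <- qchisq(p, df1, lower.tail = FALSE) / df1

# df2 just past the 4e5 cutoff: qf() is expected to return the limit value.
f_above <- qf(p, df1, 400001, lower.tail = FALSE)

# df2 exactly at the cutoff: the regular qbeta-based code path is still used.
f_below <- qf(p, df1, 400000, lower.tail = FALSE)

# Hypothetical helper (not part of R): solve pf(q) = p for q on the log scale.
# Assumption: pf() is accurate here; the chi-square limit is only a start value.
qf_by_inversion <- function(p, df1, df2) {
  start <- qchisq(p, df1, lower.tail = FALSE) / df1
  g <- function(q) pf(q, df1, df2, lower.tail = FALSE, log.p = TRUE) - log(p)
  uniroot(g, lower = start / 2, upper = start * 2,
          extendInt = "yes", tol = 1e-10)$root
}

print(c(limit = f_limit, above = f_above, below = f_below,
        inverted = qf_by_inversion(p, df1, 400001)))
```

This also suggests why the two calls in the question disagree: a chi-square quantile only approximates an F quantile after division by df1, and only when df2 is very large, whereas df2 = 100 is far from that regime.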

Calculating F-values for large df2
