Question

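I have a 2x2 table like the one below, and I want to run Fisher's exact test on it to check whether the overlap between the two groups is significant.

As you can see, one of the counts is very large, about 2.2e9, which exceeds the largest value that a 32-bit integer in R can hold.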

         yes         no
    yes  127437282   364949163
    no   188213539   2200433302

I used the bit64 package and as.integer64() to work around this, and then ran Fisher's exact test:

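    fisher <- function(n1, n2, n3, n4, fname) {
      library(bit64)
      # The first three counts fit into ordinary 32-bit integers
      n1n <- as.integer(n1)
      n2n <- as.integer(n2)
      n3n <- as.integer(n3)
      # The fourth count exceeds .Machine$integer.max, so it is stored as integer64
      n4n <- as.integer64(n4)
      # Note: this mixes plain integers with an integer64 value in c() and rbind()
      testor <- rbind(c(n1n, n2n), c(n3n, n4n))
      x <- fisher.test(testor)
      print("sample name")
      print(fname)
      print("data is")
      print(testor)
      print("fisher's exact test result is")
      x
    }
    fisher(f1, f2, f3, f4, f5)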

The result is as follows:

          [,1]          [,2]
[1,] 127437282  3.649492e+08
[2,] 188213539 1.086944e-314
[1] "fisher's exact test result is"

        Fisher's Exact Test for Count Data

data:  
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0000000000 0.0001190186
sample estimates:
odds ratio
         0

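Somehow the large number 2200433302 turned into 1.086944e-314. Why does this happen?

Also, the computation took more than 5 minutes to finish, which is too slow. I am not sure whether a calculation like this even makes sense with such large numbers. Is there a way to transform the input data and still keep the test valid?

Thanks!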

Answer 1

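I don't think that using pkg:bit64 provides a reliable basis for computation. Even basic order comparisons fail.

This is the check that is in fisher.test: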

>  any( c(1,1,1, n4n) > .Machine$integer.max)
[1] FALSE

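I think the integer64 values are being implicitly coerced to numeric values and then inappropriately interpreted as very small values. In this context a 'numeric' or 'double' has 53 bits of significand precision, and the remaining bits hold the sign and an 11-bit exponent (a power of 2), so the bit pattern of a small 64-bit integer reads as a tiny double. Note what happens to the number 1 when it is coerced from integer64 to numeric by c():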

> b <- as.integer64(1)
> c(1,b)
[1]  1.000000e+00 4.940656e-324
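
One way to see where such a tiny value can come from: an integer64 occupies the same 8 bytes as a double, so once the class is lost those bytes are read as a double. The sketch below reproduces that reading in base R, without bit64. The helper name `bits_as_double` is made up for illustration, and it assumes a nonnegative whole number below 2^53.

```r
# Hypothetical helper (not part of bit64): lay a nonnegative whole number out
# as a little-endian 64-bit integer, then read those same 8 bytes back as an
# IEEE 754 double.
bits_as_double <- function(n) {
  stopifnot(n >= 0, n < 2^53, n == floor(n))
  bytes <- as.raw((n %/% 256^(0:7)) %% 256)  # least significant byte first
  readBin(bytes, what = "double", n = 1, size = 8, endian = "little")
}

one_as_double <- bits_as_double(1)
big_as_double <- bits_as_double(2200433302)

print(one_as_double)           # the smallest positive double, 2^-1074
print(big_as_double < 1e-300)  # TRUE: the count turns into a subnormal value
```

Under this reading an integer k becomes k * 2^-1074, which is why a count in the billions ends up somewhere around 1e-314 instead of 2.2e9.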

This is intended to show that putting an 'integer64' object into a simple structure creates problems:

> c(1,1,1, n4n)[4] > .Machine$integer.max
[1] FALSE
>  n4n > .Machine$integer.max
[1] TRUE

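The problem is that for `c.integer64` and `>.integer64` to handle 'integer64' values properly, the result of c() has to keep its integer64 class, and that requires an integer64-classed object as the first item. (c() is an S3 generic, so it dispatches only on the class of its first argument.)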

Note that swapping the 1,1 position with the 2,2 position should give the same result.

> c(n4n, 1,1,1, n4n)[5] > .Machine$integer.max
[1] TRUE
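
The first-argument dispatch seen in these comparisons can be reproduced with base R alone. This is a minimal sketch using a made-up class named `toy` that stands in for 'integer64'; the names `toy` and `c.toy` are illustrative and not part of bit64.

```r
# Toy S3 class, purely illustrative; it stands in for 'integer64'.
toy <- function(x) structure(x, class = "toy")

# A c() method for the toy class that keeps the class on the result.
c.toy <- function(...) {
  toy(unlist(lapply(list(...), unclass)))
}

x <- toy(5)
first  <- c(x, 1)  # 'toy' object first: c.toy is dispatched
second <- c(1, x)  # plain number first: default c() runs, class is dropped

print(class(first))   # "toy"
print(class(second))  # "numeric"
```

With a real 'integer64' the dropped class matters more than it does here, because the values left behind are no longer interpreted as integers.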

Changing the values so that they are all coerced to integer64 does not solve the problem:

> fisher( 2200433302,   364949163, 188213539, 127437282, "no_op")
Error in fisher.test(testor) : 
  all entries of 'x' must be nonnegative and finite
In addition: Warning message:
In fisher(2200433302, 364949163, 188213539, 127437282, "no_op") :
  NAs introduced by coercion
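
The "NAs introduced by coercion" warning is what one would expect when as.integer() is handed a value above .Machine$integer.max, as happens to the first argument in this call. A short base R sketch of that limit follows; the variable names are illustrative.

```r
big <- 2200433302                  # stored as a double by default

print(.Machine$integer.max)        # 2147483647, the largest 32-bit integer
print(big > .Machine$integer.max)  # TRUE

# as.integer() cannot represent the value, so it returns NA with a warning
# (suppressed here so that the script runs quietly).
as_int <- suppressWarnings(as.integer(big))
print(is.na(as_int))               # TRUE

# The double itself still holds the count exactly.
print(big == floor(big))                # TRUE
print(format(big, scientific = FALSE))  # "2200433302"
```

The NA then reaches fisher.test(), which rejects the table because not all entries are finite.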
