Question

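I have a 2x2 contingency table and I want to test whether the two groups in it differ significantly. I built a matrix named raw_matrix that looks like this: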

          CNS random
Not_H3K4  343  28825
H3K4      11   2014

The matrix was created as follows:

raw_matrix = structure(c(343, 11, 28825, 2014), 
    .Dim = c(2L, 2L), .Dimnames = list(
    c("NotH3K", "H3K"), c("CNS", "Random")))

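From what I found when searching, unconditional exact tests such as Barnard's and Boschloo's exact tests are the most powerful. I installed the 'Exact' package and tried to run the test with this command: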

exact.test(raw_matrix)

On a computer with 64 GB of RAM and a 3.5 GHz CPU, it ran for more than half an hour and finally failed with the following error:

    Error: cannot allocate vector of size 42.0 Gb
In addition: Warning messages:
1: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)
2: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)
3: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)
4: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)

Then I installed the 'exact2x2' package and ran the test with this command:

exact2x2(raw_matrix)

which gave me the following result:

    Two-sided Fisher's Exact Test (usual method using minimum likelihood)

data:  raw_matrix
p-value = 0.006433
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.2028 4.2424
sample estimates:
odds ratio 
  2.178631

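But as I read in the tutorial for the 'Exact' package, Fisher's exact test, being a conditional exact test, is not that powerful. Finally, I ran an ordinary chi-squared test with the command chisq.test(raw_matrix), and its result differs from that of Fisher's test: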

    Pearson's Chi-squared test with Yates' continuity correction

data:  test_1
X-squared = 6.2045, df = 1, p-value = 0.01274

I am a geneticist, not a statistician, and I would appreciate it if someone could tell me what the best testing strategy is here.

Answer 1

It has been a while, but I ended up here while researching this topic, so I thought I would share what I found.

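The table you show appears to be unconditional (meaning that neither the row totals nor the column totals were known in advance), which is fine, but it calls for an unconditional test. This is the one question to ask before analyzing a contingency table: are the row or column totals fixed by the experimental design?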

Fisher's test is fully conditional, so it can be objected to in this setting (as in almost every setting other than the "lady tasting tea" experiment).

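Pearson's test seems fine in your case (its main problem is small counts in the cells, say <5, so it should be OK here). Even though it is almost never the best choice, it is still widely used.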
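As a quick check of the small-count concern, the expected counts under independence can be read from the object returned by base R's chisq.test. This is only a minimal sketch that rebuilds raw_matrix from the question; the threshold of 5 is the usual rule of thumb rather than a hard limit, and the names expected and all_large_enough are chosen here for illustration.

```r
# Rebuild the 2x2 table from the question.
raw_matrix <- structure(c(343, 11, 28825, 2014),
    .Dim = c(2L, 2L), .Dimnames = list(
    c("NotH3K", "H3K"), c("CNS", "Random")))

# Expected counts under independence:
# (row total * column total) / grand total, one value per cell.
expected <- chisq.test(raw_matrix)$expected
print(expected)

# Rule-of-thumb check: every expected count should be at least 5.
all_large_enough <- all(expected >= 5)
print(all_large_enough)
```

For this table the smallest expected count (the H3K / CNS cell) is roughly 23, so the chi-squared approximation is not in doubt on that ground.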

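An exact unconditional test would be better (I am curious how much of an improvement it would bring), but the counts here look large enough to cause computational problems, so Pearson's it is.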
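For reference, the tests from this thread that finish quickly on this table can be run side by side in base R. This is a sketch under the assumption that raw_matrix is defined as in the question; correct = FALSE drops the Yates continuity correction that chisq.test applies to 2x2 tables by default, and the variable names are illustrative only.

```r
# Rebuild the 2x2 table from the question.
raw_matrix <- structure(c(343, 11, 28825, 2014),
    .Dim = c(2L, 2L), .Dimnames = list(
    c("NotH3K", "H3K"), c("CNS", "Random")))

# Pearson's chi-squared test, with and without Yates' continuity correction.
pearson_yates <- chisq.test(raw_matrix)
pearson_plain <- chisq.test(raw_matrix, correct = FALSE)

# Fisher's exact test (conditional on both margins), shown for comparison.
fisher_cond <- fisher.test(raw_matrix)

# Collect the p-values in one named vector for easy comparison.
p_values <- c(
    pearson_yates = pearson_yates$p.value,
    pearson_plain = pearson_plain$p.value,
    fisher        = fisher_cond$p.value)
print(p_values)
```

The uncorrected Pearson statistic is always at least as large as the Yates-corrected one, so its p-value is the smaller of the two chi-squared results.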
