Question

I have a large table with several thousand values, and I want to compute p-values for it using binom.test. For example:

test <- data.frame("a" = c(4,8,8,4), "b" = c(2,3,8,0))

To add a third column named "pval", I use:

test$pval <- apply(test, 1, function(x) binom.test(x[2], x[1], p = 0.05)$p.value)

This works for the small test sample above, but when I try it on my actual dataset it is far too slow. Any suggestions?

Answer 1

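If you only need the p-value and always use a two-sided test, you can simply extract that part of the code from the existing binom.test function. Note that the version below hard-codes p = 0.5, the default of binom.test, rather than the p = 0.05 used in the question.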

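# Two-sided exact binomial p-value for x successes in n trials, with the
# null probability fixed at 0.5 (the default of binom.test).
simple.binom.test <- function(x, n)
{
  p <- 0.5
  relErr <- 1 + 1e-07   # tolerance used when comparing probabilities
  d <- dbinom(x, n, p)  # probability of the observed count
  m <- n * p            # expected count under the null
  if (x == m) 1 else if (x < m) {
    # y = number of upper-side outcomes that are no more likely than x
    i <- seq.int(from = ceiling(m), to = n)
    y <- sum(dbinom(i, n, p) <= d * relErr)
    pbinom(x, n, p) + pbinom(n - y, n, p, lower.tail = FALSE)
  } else {
    # y = number of lower-side outcomes that are no more likely than x
    i <- seq.int(from = 0, to = floor(m))
    y <- sum(dbinom(i, n, p) <= d * relErr)
    pbinom(y - 1, n, p) + pbinom(x - 1, n, p, lower.tail = FALSE)
  }
}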
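
To make clear what the extracted code computes: the two-sided p-value is the total probability of all outcomes that are no more likely than the observed one. The following brute-force check is only a sketch; the values x = 3, n = 8, p = 0.05 are illustrative choices, not part of the answer.

```r
# Illustrative values (an assumption for this sketch)
x <- 3; n <- 8; p <- 0.05

relErr <- 1 + 1e-07            # same tolerance as in the function above
probs  <- dbinom(0:n, n, p)    # probability of every possible count 0..n
d      <- dbinom(x, n, p)      # probability of the observed count

# Sum the probabilities of all outcomes at most as likely as the observed one
p_brute <- sum(probs[probs <= d * relErr])

# Agrees with binom.test up to floating-point tolerance
stopifnot(isTRUE(all.equal(p_brute, binom.test(x, n, p = p)$p.value)))
```

The extracted function reaches the same result with two pbinom calls instead of summing individual probabilities.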

Now test that it gives the same values as before:

library(testthat)
test_that(
  "simple.binom.test works",
  {
    #some test data
    xn_pairs <- subset(
      expand.grid(x = 1:50, n = 1:50),
      n >= x
    )

    #test that simple.binom.test and binom.test give the same answer for each row.
    with(
      xn_pairs,
      invisible(
        mapply(
          function(x, n)
          {
            expect_equal(
              simple.binom.test(x, n),
              binom.test(x, n)$p.value
            )
          },
          x,
          n
        )
      )
    )
  }
)

Now see how much faster it is:

xn_pairs <- subset(
  expand.grid(x = 1:50, n = 1:50),
  n >= x
)
system.time(
  with(
    xn_pairs,
    mapply(
      function(x, n)
      {
        binom.test(x, n)$p.value
      },
      x,
      n
    )
  )
)
##    user  system elapsed 
##    0.52    0.00    0.52
system.time(
  with(
    xn_pairs,
    mapply(
      function(x, n)
      {
        simple.binom.test(x, n)
      },
      x,
      n
    )
  )
)
##    user  system elapsed
##    0.09    0.00    0.09

That is a speed-up of more than five times (0.52 s versus 0.09 s elapsed).
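
Since the question uses p = 0.05 while the function above fixes p = 0.5, one possible adaptation is to turn p into an argument. The sketch below does that and applies it to the data frame from the question. The name simple_binom_p is hypothetical, and the code assumes 0 < p < 1 (binom.test handles p = 0 and p = 1 as special cases, which are left out here).

```r
# Hypothetical generalisation of simple.binom.test: p becomes an argument.
# Assumes 0 < p < 1.
simple_binom_p <- function(x, n, p = 0.5) {
  relErr <- 1 + 1e-07
  d <- dbinom(x, n, p)
  m <- n * p
  if (x == m) {
    1
  } else if (x < m) {
    i <- seq.int(from = ceiling(m), to = n)
    y <- sum(dbinom(i, n, p) <= d * relErr)
    pbinom(x, n, p) + pbinom(n - y, n, p, lower.tail = FALSE)
  } else {
    i <- seq.int(from = 0, to = floor(m))
    y <- sum(dbinom(i, n, p) <= d * relErr)
    pbinom(y - 1, n, p) + pbinom(x - 1, n, p, lower.tail = FALSE)
  }
}

# Data from the question: column a holds the trials, column b the successes
test <- data.frame(a = c(4, 8, 8, 4), b = c(2, 3, 8, 0))

# Row-wise p-values without constructing a full binom.test result per row
test$pval <- mapply(simple_binom_p, x = test$b, n = test$a,
                    MoreArgs = list(p = 0.05))

# Cross-check against the approach used in the question
reference <- apply(test[, c("a", "b")], 1,
                   function(r) binom.test(r[2], r[1], p = 0.05)$p.value)
stopifnot(isTRUE(all.equal(test$pval, unname(reference))))
```

Using mapply over the two columns avoids the conversion of each row to a vector that apply performs, and the cross-check at the end confirms that the values match those from binom.test on this small sample.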
