Question

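I'm trying to compute several binomial proportion confidence intervals. My data is in a data frame, and although I can successfully extract estimate from the object returned by prop.test, the conf.int variable appears to be NULL when I run it on the data frame.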

library(dplyr)

cases <- c(50000, 1000, 10, 2343242)
population <- c(100000000, 500000000, 100000, 200000000)

df <- as.data.frame(cbind(cases, population))
df %>% mutate(rate = prop.test(cases, population, conf.level = 0.95)$estimate)

This returns the expected result:

    cases population       rate
1   50000      1e+08 0.00050000
2    1000      5e+08 0.00000200
3      10      1e+05 0.00010000
4 2343242      2e+08 0.01171621

However, when I run

df %>% mutate(confint.lower = prop.test(cases, population, conf.level = 0.95)$conf.int[1])

I get an error instead:

Error in mutate_impl(.data, dots) : 
  Column `confint.lower` is of unsupported type NULL

Any ideas? I know there are other ways to compute binomial proportion confidence intervals, but I'd really like to learn how to use dplyr well.

Thanks!

Answer 1

You can use dplyr::rowwise() to group by row:

df %>%
    rowwise() %>%
    mutate(lower_ci = prop.test(cases, population, conf.level = 0.95)$conf.int[1])

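By default, dplyr takes the column names and treats them as whole vectors, so prop.test() receives all four rows at once and runs a single four-group test. For more than two groups it returns no confidence interval, which is why conf.int is NULL. It is also why vectorized functions (such as the one @Jake Fisher mentioned above) work fine without adding rowwise().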
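To see the difference outside of dplyr, it may help to call prop.test() once on the whole vectors and once on a single pair of values. This is only a minimal sketch in base R that reuses the question's data; the names res_all and res_one are made up for illustration.

```r
cases <- c(50000, 1000, 10, 2343242)
population <- c(100000000, 500000000, 100000, 200000000)

# Whole vectors: a single 4-group test of equal proportions
res_all <- prop.test(cases, population, conf.level = 0.95)
length(res_all$estimate)    # one estimate per group
is.null(res_all$conf.int)   # no interval is reported for more than 2 groups

# One pair of values: a one-sample test, which does carry an interval
res_one <- prop.test(cases[1], population[1], conf.level = 0.95)
res_one$conf.int
```

The first call is what mutate() effectively runs without rowwise(); the second is what it runs once per row with it.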

Here is what I do to capture all of the confidence interval components at once:

df %>%
    rowwise() %>%
    mutate(tst = list(broom::tidy(prop.test(cases, population, conf.level = 0.95)))) %>%
    tidyr::unnest(tst)

Answer 2

As of dplyr 0.8.3, the lifecycle status of rowwise() is "questioning". I would therefore suggest using purrr::map2() instead:

library(purrr)
library(tidyr)

df %>%
  mutate(rate = map2(cases, population, ~ prop.test(.x, .y, conf.level = 0.95) %>%
                                            broom::tidy())) %>%
  unnest(rate)
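If only the two bounds are needed as plain numeric columns, a variant with purrr::map2_dbl() might be simpler than nesting and unnesting. The following is a rough sketch rather than a definitive solution; ci_bound is a hypothetical helper written for this example and is not part of any package.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  cases = c(50000, 1000, 10, 2343242),
  population = c(100000000, 500000000, 100000, 200000000)
)

# Hypothetical helper: returns bound i (1 = lower, 2 = upper) of the
# 95% interval that prop.test() reports for a single proportion
ci_bound <- function(x, n, i) {
  prop.test(x, n, conf.level = 0.95)$conf.int[i]
}

res <- df %>%
  mutate(
    lower_ci = map2_dbl(cases, population, ~ ci_bound(.x, .y, 1)),
    upper_ci = map2_dbl(cases, population, ~ ci_bound(.x, .y, 2))
  )
res
```

This runs prop.test() twice per row, so for large data frames the broom::tidy() approach above is probably the better choice.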
