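I am using a nested data frame to nest certain groups and then run a t-test on the factor and values in the $data column. However, in some cases the $data column does not contain both factor levels. The t-test then cannot run, and the code fails with an error for the entire data frame. In the example below, groups a-d have both treatments available for comparison, but group e does not. How can I specify that the t-test should run only on rows where both treatments are available?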
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
df <- data.frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))
df_analysis <- df %>%
  nest(-group) %>%
  # How can I run the t-test only on rows that have both treatments? As written, this gives an error.
  mutate(p = map_dbl(data, ~t.test(value ~ treatment, data = .)$p.value))
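Answer 0 (score: 2)
Since you are already using some tidyverse packages, you can use one of the purrr functions for capturing side effects. In this case you can use possibly, which returns a default value whenever an error occurs.
Using your code: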
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
# tibble() replaces the deprecated data_frame()
df <- tibble(id = paste0('ID-', 1:100),
             group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
             treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
             value = runif(100))
df_analysis <- df %>%
  nest(-group) %>%
  # possibly() returns NA_real_ instead of raising an error when t.test() fails
  mutate(p = map_dbl(data, possibly(~t.test(value ~ treatment, data = .)$p.value, NA_real_)))
# A tibble: 5 x 3
  group data                  p
  <chr> <list>            <dbl>
1 a     <tibble [20 x 3]> 0.610
2 b     <tibble [20 x 3]> 0.156
3 c     <tibble [20 x 3]> 0.840
4 d     <tibble [20 x 3]> 0.383
5 e     <tibble [20 x 3]> NA
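To see what possibly() does in isolation, here is a minimal sketch. The helper names p_value and safe_p_value and the two small data frames are made up for illustration; they are not part of the original code.

```r
library(purrr)

# Hypothetical helper: p-value of a two-sample t-test for one data frame.
p_value <- function(d) t.test(value ~ treatment, data = d)$p.value

# possibly() wraps a function so that it returns `otherwise` instead of erroring.
safe_p_value <- possibly(p_value, otherwise = NA_real_)

set.seed(1)
both <- data.frame(treatment = rep(c("x", "y"), 10), value = runif(20))
one  <- data.frame(treatment = rep("x", 20), value = runif(20))

safe_p_value(both)  # a numeric p-value
safe_p_value(one)   # NA, because t.test() errors when only one treatment is present
```

Note that possibly() swallows every error, not only the missing-treatment one, so an unrelated problem in the data would also show up as NA.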
Answer 1 (score: 1)
Wrap t.test(...) in ifelse(), which checks whether the number of unique values in treatment is == 2:
df %>%
  nest(-group) %>%
  mutate(p = map_dbl(data, ~ifelse(length(unique(.x$treatment)) == 2,
                                   t.test(value ~ treatment, data = .)$p.value,
                                   NA)))
# A tibble: 5 x 3
# group data p
# <fct> <list> <dbl>
# 1 a <data.frame [20 x 3]> 0.790
# 2 b <data.frame [20 x 3]> 0.0300
# 3 c <data.frame [20 x 3]> 0.712
# 4 d <data.frame [20 x 3]> 0.662
# 5 e <data.frame [20 x 3]> NA
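The same check can also be written with a plain if/else, which some may find easier to read than ifelse() for a single condition. This is only a sketch: the helper name p_if_two_levels is made up, and nest(data = -group) assumes tidyr 1.0 or later.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
df <- tibble(id = paste0("ID-", 1:100),
             group = rep(c("a", "b", "c", "d", "e"), each = 20),
             treatment = c(rep(c("x", "y"), 40), rep("x", 20)),
             value = runif(100))

# Hypothetical helper: run the t-test only when both treatments are present.
p_if_two_levels <- function(d) {
  if (n_distinct(d$treatment) == 2) {
    t.test(value ~ treatment, data = d)$p.value
  } else {
    NA_real_
  }
}

df_analysis <- df %>%
  nest(data = -group) %>%   # assumes tidyr >= 1.0
  mutate(p = map_dbl(data, p_if_two_levels))
```

Unlike possibly(), this only guards against a missing treatment; any other error from t.test() would still stop the pipeline, which may be what you want when debugging.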