Filter a nested data frame for rows that contain certain strings

Time: 2018-05-23 19:50:31

Tags: r dataframe filter dplyr

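I am using a nested data frame to nest certain groups, and then running a t-test on the factor and values in the $data column. However, in some cases I end up without both factor levels available in the $data column. The t-test then cannot run, and the code throws an error for the entire data frame. In the example below, groups a-d have both treatments available for comparison, but group e does not. How can I specify that the t-test should run only on rows where both treatments are available?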

library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
df <- data.frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))

df_analysis <- df %>% 
  nest(-group) %>% 
  #How to ask to only run t test on rows that have both treatments in them? As written, it will give an error.
  mutate(p = map_dbl(data, ~t.test(value ~ treatment, data=.)$p.value))

2 Answers:

Answer 0 (score: 2)

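Since you are already using some of the tidyverse packages, you can use some of the purrr functions to capture side effects. In this case you can use possibly(), which returns a default value whenever an error occurs.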

Using your code:

library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
df <- data_frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))

df_analysis  <- df %>% 
  nest(-group) %>% 
  mutate(p = map_dbl(data, possibly(~t.test(value ~ treatment, data=.)$p.value, NA_real_)))

# A tibble: 5 x 3
  group data                   p
  <chr> <list>             <dbl>
1 a     <tibble [20 x 3]>  0.610
2 b     <tibble [20 x 3]>  0.156
3 c     <tibble [20 x 3]>  0.840
4 d     <tibble [20 x 3]>  0.383
5 e     <tibble [20 x 3]> NA    
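
To see what possibly() does on its own, here is a minimal sketch outside the nested pipeline. The helper names p_value and safe_p_value and the two small data frames are made up for illustration; they are not part of the original answer.

```r
library(purrr)

# Hypothetical helper: p-value of a two-sample t-test on one group's data.
p_value <- function(d) t.test(value ~ treatment, data = d)$p.value

# possibly() returns a wrapped function that yields `otherwise`
# instead of raising an error.
safe_p_value <- possibly(p_value, otherwise = NA_real_)

# Illustrative data: one group with both treatments, one with only 'x'.
both   <- data.frame(treatment = rep(c("x", "y"), 10),
                     value = seq(0.05, 1, by = 0.05))
only_x <- data.frame(treatment = rep("x", 20),
                     value = seq(0.05, 1, by = 0.05))

safe_p_value(both)    # a numeric p-value
safe_p_value(only_x)  # NA: t.test() errors because the grouping factor has one level
```

Note that possibly() replaces every error with the default, including ones unrelated to a missing treatment (for example a misspelled column name), so an unexpected NA may be worth investigating.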

Answer 1 (score: 1)

Wrap t.test(...) in ifelse(), checking whether the number of unique values in treatment is == 2:

df %>% 
  nest(-group) %>% 
  mutate(p = map_dbl(data, ~ifelse(length(unique(.x$treatment)) == 2, t.test(value ~ treatment, data=.)$p.value, NA)))

# A tibble: 5 x 3
  # group data                        p
  # <fct> <list>                  <dbl>
# 1 a     <data.frame [20 x 3]>  0.790 
# 2 b     <data.frame [20 x 3]>  0.0300
# 3 c     <data.frame [20 x 3]>  0.712 
# 4 d     <data.frame [20 x 3]>  0.662 
# 5 e     <data.frame [20 x 3]> NA