How to perform a statistical test group-wise with dplyr and then tidy the result with broom

Date: 2018-08-14 09:16:45

Tags: r tidyverse broom

I have the following data frame:

library(tidyverse)
dat <- structure(list(
  charge.Group3 = c(0.167, 0.167, 0.1, 0.067, 0.033, 0.033, 0.067, 0.133,
                    0.2, 0.067, 0.133, 0.114, 0.167, 0.033, 0.1, 0.033,
                    0.133, 0.267, 0.133, 0.233, 0.1, 0.167, 0.067, 0.133,
                    0.1, 0.133, 0.1, 0.133, 0.1, 0.067, 0.167, 0),
  hydrophobicity.Group3 = c(0.267, 0.467, 0.067, 0.167, 0.267, 0.1, 0.367, 0.233,
                            0.367, 0.233, 0.133, 0.205, 0.333, 0.267, 0.267, 0.067,
                            0.133, 0.3, 0.233, 0.267, 0.5, 0.333, 0.2, 0.5,
                            0.5, 0.4, 0.033, 0.3, 0.233, 0.5, 0.233, 0.033),
  class = c("Negative", "Negative", "Positive", "Positive", "Positive", "Positive",
            "Positive", "Negative", "Positive", "Positive", "Positive", "Positive",
            "Positive", "Positive", "Negative", "Positive", "Negative", "Negative",
            "Negative", "Negative", "Negative", "Negative", "Negative", "Negative",
            "Negative", "Negative", "Positive", "Positive", "Positive", "Negative",
            "Positive", "Negative")),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -32L))

dat
#> # A tibble: 32 x 3
#>    charge.Group3 hydrophobicity.Group3 class
#>            <dbl>                 <dbl> <chr>
#>  1         0.167                 0.267 Negative
#>  2         0.167                 0.467 Negative
#>  3         0.1                   0.067 Positive
#>  4         0.067                 0.167 Positive
#>  5         0.033                 0.267 Positive
#>  6         0.033                 0.1   Positive
#>  7         0.067                 0.367 Positive
#>  8         0.133                 0.233 Negative
#>  9         0.2                   0.367 Positive
#> 10         0.067                 0.233 Positive
#> # ... with 22 more rows

What I want to do is, for each feature (charge.Group3, hydrophobicity.Group3), run wilcox.test between the Negative and Positive classes, and finally get the p-values as a data frame or tibble.

Note that in practice there are more than 2 features. How can I achieve this?

2 answers:

Answer 0 (score: 2)

Here is one way, using dplyr::summarize_at and tidyr::gather:

library(tidyverse)
dat %>%
  summarize_at(c("charge.Group3","hydrophobicity.Group3"),
               ~wilcox.test(.x ~ .y)$p.value, .$class) %>%
  gather(features, pvalue)

# # A tibble: 2 x 2
#                features pvalue
#                   <chr>  <dbl>
# 1         charge.Group3  0.109
# 2 hydrophobicity.Group3  0.039

To summarize all variables except class:

dat %>%
  summarize_at(vars(-class),
               ~wilcox.test(.x ~ .y)$p.value,
               .$class) %>%
  gather(features,pvalue)
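
As a side note, summarize_at() and gather() are superseded as of dplyr 1.0 and tidyr 1.0. A rough equivalent with across() and pivot_longer() might look like the sketch below. It assumes dplyr >= 1.0 and tidyr >= 1.0; the `toy` data frame and its `feat_*` columns are made up for illustration and stand in for `dat`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the question's `dat`):
# one grouping column plus three numeric feature columns.
set.seed(1)
toy <- tibble(
  class  = rep(c("Negative", "Positive"), each = 10),
  feat_a = runif(20),
  feat_b = runif(20),
  feat_c = runif(20)
)

pvals_across <- toy %>%
  # One Wilcoxon rank sum test per feature column;
  # the grouping vector is referenced explicitly as toy$class.
  summarise(across(-class, ~ wilcox.test(.x ~ toy$class)$p.value)) %>%
  # Reshape the one-row result into one row per feature.
  pivot_longer(everything(), names_to = "features", values_to = "pvalue")

pvals_across
```

Referring to `toy$class` by name keeps the lambda independent of how across() evaluates it, at the cost of not working on grouped or filtered pipelines.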

Answer 1 (score: 2)

If you only need the p-values of the tests, you don't actually need broom:

library(tidyverse)


dat %>% 
  gather(group, value, -class) %>%    # reshape data            
  nest(-group) %>%                    # for each group nest data
  mutate(pval = map_dbl(data, ~wilcox.test(value ~ class, data = .)$p.value)) %>%  # get p value for wilcoxon test
  select(-data)                       # remove data column


# # A tibble: 2 x 2
#   group                   pval
#   <chr>                  <dbl>
# 1 charge.Group3         0.109 
# 2 hydrophobicity.Group3 0.0390        

Reshaping first lets you apply this process no matter how many feature columns you have, assuming class is the only extra variable.

Or you can even avoid map altogether, as @Moody_Mudskipper suggests, using

dat %>% 
  gather(group, value, -class) %>% 
  group_by(group) %>% 
  summarize(results = wilcox.test(value ~ class)$p.value)
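
The same reshape-then-summarise idea can also be written with pivot_longer(), which replaced gather() in tidyr 1.0. This is only a sketch under that assumption; the `toy` data frame and its `feat_*` columns are hypothetical and stand in for `dat`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the question's `dat`) with three features.
set.seed(1)
toy <- tibble(
  class  = rep(c("Negative", "Positive"), each = 10),
  feat_a = runif(20),
  feat_b = runif(20),
  feat_c = runif(20)
)

pvals_long <- toy %>%
  # Long format: one row per (observation, feature) pair.
  pivot_longer(-class, names_to = "group", values_to = "value") %>%
  group_by(group) %>%
  # One test per feature; value and class are taken from the current group.
  summarise(pval = wilcox.test(value ~ class)$p.value)

pvals_long
```

Because every non-class column is reshaped, adding more features requires no change to the pipeline.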

If you really want to involve broom, then you can do

library(broom)

dat %>% 
   gather(group, value, -class) %>%  
   nest(-group) %>%                  
   mutate(results = map(data, ~tidy(wilcox.test(value ~ class, data = .)))) %>%
   select(-data) %>%
   unnest(results)

# # A tibble: 2 x 5
# group                 statistic p.value method                                            alternative
#   <chr>                     <dbl>   <dbl> <chr>                                             <chr>      
# 1 charge.Group3              170.  0.109  Wilcoxon rank sum test with continuity correction two.sided  
# 2 hydrophobicity.Group3      183   0.0390 Wilcoxon rank sum test with continuity correction two.sided 

This returns more columns, but you can keep only the p-value if you wish.
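
Keeping only the p-value from the broom output could look like the sketch below. It uses the named form nest(data = -group), since recent tidyr versions may warn about the unnamed nest(-group), and it assumes tidyr >= 1.0. The `toy` data frame and its `feat_*` columns are hypothetical and stand in for `dat`.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical toy data (not the question's `dat`) with three features.
set.seed(1)
toy <- tibble(
  class  = rep(c("Negative", "Positive"), each = 10),
  feat_a = runif(20),
  feat_b = runif(20),
  feat_c = runif(20)
)

p_only <- toy %>%
  pivot_longer(-class, names_to = "group", values_to = "value") %>%
  nest(data = -group) %>%                  # one nested data frame per feature
  mutate(results = map(data, ~ tidy(wilcox.test(value ~ class, data = .x)))) %>%
  select(-data) %>%
  unnest(results) %>%
  select(group, p.value)                   # drop statistic, method, alternative

p_only
```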