Statistical tests by group

Posted: 2019-09-26 18:29:28

Tags: r dplyr statistics

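I'm trying to run Wilcoxon tests on long-format data. I want to use dplyr::group_by() to specify the subsets I want to test.

The end result should be a new column, with the p-value of the Wilcoxon test appended to the original data frame. Every technique I have seen requires summarizing the data frame, and I don't want to summarize it.

Below I reshape the iris dataset to mimic my data, followed by my attempts at the task.

I'm getting close, but I want to keep all of the original data from before the Wilcoxon test.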

# Reformatting Iris to mimic my data.
library(tidyverse)  # dplyr, tidyr (gather) and stringr (str_extract)
long_format <- iris %>% 
  gather(key = "attribute", value = "measurement", -Species) %>%
  mutate(descriptor = 
           case_when(
    str_extract(attribute, pattern = "\\.(.*)") == ".Width" ~ "Width",
    str_extract(attribute, pattern = "\\.(.*)") == ".Length" ~ "Length")) %>%
  mutate(Feature = 
           case_when(
    str_extract(attribute, pattern = "^(.*?)\\.") == "Sepal." ~ "Sepal",
    str_extract(attribute, pattern = "^(.*?)\\.") == "Petal." ~ "Petal"))

# Removing no longer necessary column.
cleaned_up <- long_format %>% select(-attribute)

# Attempt using do(), but I lose important info like "measurement"
cleaned_up %>%
  group_by(Species, Feature) %>%
  do(w = wilcox.test(measurement~descriptor, data=., paired=FALSE)) %>% 
  mutate(Wilcox = w$p.value)

# This is an attempt with the dplyr experimental group_map function. If only I could just make this a new column appended to the original df in one step.

cleaned_up %>%
  group_by(Species, Feature) %>%
  group_map(~ wilcox.test(measurement~descriptor, data=., paired=FALSE)$p.value)
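With tidyr 1.0 or later, the gather()/case_when() reshaping above can probably be collapsed into a single pivot_longer() call that splits names such as Sepal.Length at the dot. This is a sketch, not the asker's code: the column order differs from cleaned_up (Species, Feature, descriptor, measurement), which does not matter for grouping, and cleaned_up_alt is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Split each measurement column name at the literal dot:
# "Sepal.Length" -> Feature = "Sepal", descriptor = "Length"
cleaned_up_alt <- iris %>%
  pivot_longer(-Species,
               names_to = c("Feature", "descriptor"),
               names_sep = "\\.",
               values_to = "measurement")

# 3 species x 2 features x 2 descriptors, 50 observations in each cell
count(cleaned_up_alt, Species, Feature, descriptor)
```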

Thanks for your help.

2 answers:

Answer 0 (score: 3)

The model object can be wrapped in a list (stored as a list column):

library(tidyverse)
cleaned_up %>%
   group_by(Species, Feature) %>%
   nest %>% 
   mutate(model = map(data, ~ 
          .x %>%
           transmute(w = list(wilcox.test(measurement~descriptor, 
               data=., paired=FALSE)))))
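The list column above holds the full htest objects. To get back to the question's goal of one p-value per original row, the p-value can be pulled out of each object and the nested data unnested again. A sketch assuming tidyr 1.0 or later and purrr; the data are rebuilt with pivot_longer() so the block runs on its own, and with_pval is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the long data (Species, Feature, descriptor, measurement)
cleaned_up <- iris %>%
  pivot_longer(-Species, names_to = c("Feature", "descriptor"),
               names_sep = "\\.", values_to = "measurement")

with_pval <- cleaned_up %>%
  group_by(Species, Feature) %>%
  nest() %>%                      # one row per group, rows stored in `data`
  ungroup() %>%
  mutate(model = map(data, ~ wilcox.test(measurement ~ descriptor, data = .x)),
         pval  = map_dbl(model, "p.value")) %>%   # extract p-value by name
  select(-model) %>%
  unnest(data)                    # back to one row per original observation
```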

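Another option is to use group_split to split the data into a list of data frames, then map over the list and create the "pval" column in each element after applying the test: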

cleaned_up %>% 
    group_split(Species, Feature) %>%
    map_dfr(~ .x %>%
                 mutate(pval = wilcox.test(measurement~descriptor, 
               data=., paired=FALSE)$p.value))
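dplyr 0.8.1 and later also provide group_modify(), the data-frame-returning counterpart of the group_map() function tried in the question, which does the split, apply, and recombine in one call. A sketch: the function passed to group_modify() receives each group's rows without the grouping columns and must return a data frame. As with group_split(), the rows come back sorted by group rather than in the original order. The data are rebuilt with pivot_longer() (tidyr 1.0 or later) and with_pval is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Rebuild the long data (Species, Feature, descriptor, measurement)
cleaned_up <- iris %>%
  pivot_longer(-Species, names_to = c("Feature", "descriptor"),
               names_sep = "\\.", values_to = "measurement")

with_pval <- cleaned_up %>%
  group_by(Species, Feature) %>%
  # .x is one group's rows (grouping columns dropped); add the p-value column
  group_modify(~ mutate(.x, pval = wilcox.test(measurement ~ descriptor,
                                               data = .x)$p.value)) %>%
  ungroup()
```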

Answer 1 (score: 2)

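Another option is to avoid the data argument entirely. wilcox.test only needs a data argument when the variables being tested are not in the calling scope, but a function called inside mutate has all the columns of the data frame in scope (restricted to the current group when the data are grouped).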

cleaned_up %>%
  group_by(Species, Feature) %>%
  mutate(pval = wilcox.test(measurement~descriptor, paired=FALSE)$p.value)
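One way to check the scoping claim is to compare the grouped mutate() result for a single group against the same test run by hand on that subset with an explicit data argument. A sketch; the data are rebuilt with pivot_longer() (tidyr 1.0 or later) instead of the question's gather() code, and with_p, sub, by_hand and grouped_value are illustrative names.

```r
library(dplyr)
library(tidyr)

# Rebuild the long data (Species, Feature, descriptor, measurement)
cleaned_up <- iris %>%
  pivot_longer(-Species, names_to = c("Feature", "descriptor"),
               names_sep = "\\.", values_to = "measurement")

# Grouped mutate(): the formula's variables are found among the group's columns
with_p <- cleaned_up %>%
  group_by(Species, Feature) %>%
  mutate(pval = wilcox.test(measurement ~ descriptor)$p.value) %>%
  ungroup()

# The same test run by hand on one subset, with an explicit data argument
sub <- filter(cleaned_up, Species == "setosa", Feature == "Petal")
by_hand <- wilcox.test(measurement ~ descriptor, data = sub)$p.value

# Every row of that group should carry the single by-hand p-value
grouped_value <- unique(with_p$pval[with_p$Species == "setosa" &
                                    with_p$Feature == "Petal"])
all.equal(grouped_value, by_hand)
```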

This gives the same output as akrun's answer (after his correction in the comments above):

akrun <- 
  cleaned_up %>% 
    group_split(Species, Feature) %>%
    map_dfr(~ .x %>%
                 mutate(pval = wilcox.test(measurement~descriptor, 
               data=., paired=FALSE)$p.value))

me <- 
  cleaned_up %>%
    group_by(Species, Feature) %>%
    mutate(pval = wilcox.test(measurement~descriptor, paired=FALSE)$p.value)

all.equal(akrun, me)
# [1] TRUE
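The TRUE above was reported in 2019. From dplyr 1.0 on, the tibble method of all.equal() that ignored row order was removed, and the two objects differ in row order (group_split() returns the groups in sorted order) and in grouping (me is still grouped), so the same call may report differences on a current installation. A sketch that compares only what matters here, the p-value attached to each group; per_group() is a hypothetical helper, and the data are rebuilt with pivot_longer() (tidyr 1.0 or later).

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the long data (Species, Feature, descriptor, measurement)
cleaned_up <- iris %>%
  pivot_longer(-Species, names_to = c("Feature", "descriptor"),
               names_sep = "\\.", values_to = "measurement")

via_split <- cleaned_up %>%
  group_split(Species, Feature) %>%
  map_dfr(~ mutate(.x, pval = wilcox.test(measurement ~ descriptor,
                                          data = .x)$p.value))

via_mutate <- cleaned_up %>%
  group_by(Species, Feature) %>%
  mutate(pval = wilcox.test(measurement ~ descriptor)$p.value)

# Hypothetical helper: one sorted, ungrouped row per group with its p-value,
# so row order and grouping no longer affect the comparison
per_group <- function(df) {
  df %>%
    ungroup() %>%
    distinct(Species, Feature, pval) %>%
    arrange(Species, Feature) %>%
    as.data.frame()
}

all.equal(per_group(via_split), per_group(via_mutate))
```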