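Is there a way in dplyr to compare groups pairwise? Here is a concrete example: I want to apply a t-test to each of the following combinations: a vs b, a vs c, and b vs c.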
library(dplyr)
library(tidyr)

set.seed(1)
df <- tibble(value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
             group = c(rep("a", 1000), rep("b", 1000), rep("c", 1000))) %>%
  nest(data = value)  # one row per group; the values live in the list-column `data`
df
# # A tibble: 3 x 2
#   group data
#   <chr> <list>
# 1 a     <tibble [1,000 × 1]>
# 2 b     <tibble [1,000 × 1]>
# 3 c     <tibble [1,000 × 1]>
If dplyr doesn't offer a solution, I'd also be happy with other approaches... maybe data.table?
Answer (score: 4)
Here is a base R / tidyverse approach (a bit manual, but it works for this task):
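library(dplyr)
library(tidyr)
library(magrittr)  # provides the %$% (exposition) pipe used below

# Apply t.test() to every pair of groups: a vs b, a vs c, b vs c.
# Note: tidyr >= 1.0 warns when unnest() is called without `cols`, but still unnests `data`.
combn(df$group, 2, FUN = function(g)
  t.test(filter(df, group == g[1]) %>% unnest %$% value,
         filter(df, group == g[2]) %>% unnest %$% value),
  simplify = FALSE)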
# [[1]]
#
# Welch Two Sample t-test
#
# data: filter(df, group == g[1]) %>% unnest %$% value and filter(df, group == g[2]) %>% unnest %$% value
# t = -86.114, df = 1998, p-value < 2.2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -4.086376 -3.904396
# sample estimates:
# mean of x mean of y
# 0.9883519 4.9837381
#
#
# [[2]]
#
# Welch Two Sample t-test
#
# data: filter(df, group == g[1]) %>% unnest %$% value and filter(df, group == g[2]) %>% unnest %$% value
# t = -195.4, df = 1998, p-value < 2.2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -9.117558 -8.936356
# sample estimates:
# mean of x mean of y
# 0.9883519 10.0153090
#
#
# [[3]]
#
# Welch Two Sample t-test
#
# data: filter(df, group == g[1]) %>% unnest %$% value and filter(df, group == g[2]) %>% unnest %$% value
# t = -108.65, df = 1997.9, p-value < 2.2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -5.122395 -4.940747
# sample estimates:
# mean of x mean of y
# 4.983738 10.015309
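If you would rather have the three comparisons as one table instead of a list of `htest` objects, the sketch below collects them into a single tibble. It assumes an un-nested ("long") version of the question's data; the names `df_long`, `pairs` and `results` are illustrative, not part of the original answer. It also shows base R's `pairwise.t.test()` as a cross-check. That function adjusts p-values (Holm) by default, so `p.adjust.method = "none"` and `pool.sd = FALSE` are set here to get the same unadjusted Welch tests as calling `t.test()` on each pair.

```r
library(dplyr)

# Same simulated data as in the question, but kept in long (un-nested) form
set.seed(1)
df_long <- tibble(
  value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
  group = rep(c("a", "b", "c"), each = 1000)
)

# All unordered pairs of group labels: a-b, a-c, b-c
pairs <- combn(sort(unique(df_long$group)), 2, simplify = FALSE)

# One row per pair: difference in means, Welch t statistic and p-value
results <- bind_rows(lapply(pairs, function(g) {
  x <- df_long %>% filter(group == g[1]) %>% pull(value)
  y <- df_long %>% filter(group == g[2]) %>% pull(value)
  tt <- t.test(x, y)  # Welch two-sample t-test (var.equal = FALSE by default)
  tibble(
    group1    = g[1],
    group2    = g[2],
    estimate  = unname(tt$estimate[1] - tt$estimate[2]),
    statistic = unname(tt$statistic),
    p.value   = tt$p.value
  )
}))
results

# Base R cross-check: matrix of unadjusted, non-pooled (Welch) pairwise p-values
pw <- pairwise.t.test(df_long$value, df_long$group,
                      pool.sd = FALSE, p.adjust.method = "none")
pw
```

Because `results` is an ordinary tibble, it can be filtered, arranged or passed to `p.adjust()` like any other data frame.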