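I want to run t-tests and extract the p-values for every pairwise combination of a grouping factor. The data frame has only two columns. Dummy data example: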
set.seed(123)
df <- data.frame(
  Group = c(rep("A", 5), rep("B", 4), rep("C", 6)),
  Val = c(sample(101:200, 5, replace = TRUE),
          sample(1:100, 4, replace = TRUE),
          sample(1:100, 6, replace = TRUE))
)
Expected output:
data.frame(
A = c(1, 0.00191, 0.00017),
B = c(0.00191,1,0.88500),
C = c(0.00017,0.88500,1)
)
        A       B       C
1 1.00000 0.00191 0.00017
2 0.00191 1.00000 0.88500
3 0.00017 0.88500 1.00000
For convenience, here is a wrapper around t.test that extracts the p-value:
tWrap <- function(x, y) t.test(x, y)$p.value
Thanks. I searched online for solutions using group_by and purrr::map, but couldn't get any of them to work.
Answer 0 (score: 1)
Data:
set.seed(123)
df <- data.frame(
  Group = c(rep("A", 5), rep("B", 4), rep("C", 6)),
  Val = c(sample(101:200, 5, replace = TRUE),
          sample(1:100, 4, replace = TRUE),
          sample(1:100, 6, replace = TRUE))
)
Code:
library(purrr)
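# get group combinations
# (factor() is needed because Group is a character column under the
# R >= 4.0 default stringsAsFactors = FALSE, where levels(df$Group) is NULL)
params_list <- combn(levels(factor(df$Group)), 2, FUN = list)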
# perform t test for each combination
model_t <- map(.x = params_list,
               .f = ~ t.test(formula = Val ~ Group,
                             data = subset(df, Group %in% .x)))
# extract p values
t_pvals <- map_dbl(.x = model_t, .f = "p.value")
names(t_pvals) <- map_chr(.x = params_list, .f = ~ paste0(.x, collapse = ""))
t_pvals
#           AB           AC           BC
# 0.0019183244 0.0001655259 0.8850039246
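The named vector above holds each pair once, while the question asks for a symmetric table. Below is a minimal base R sketch of that reshaping step. It assumes the diagonal should be 1, as in the expected output, and the names groups, pairs, pair_p, p_mat and p_df are illustrative rather than part of the original answer. Exact p-values depend on the R version, because the default sample() algorithm changed in R 3.6.0.

```r
# Rebuild the example data so the sketch runs on its own
set.seed(123)
df <- data.frame(
  Group = c(rep("A", 5), rep("B", 4), rep("C", 6)),
  Val = c(sample(101:200, 5, replace = TRUE),
          sample(1:100, 4, replace = TRUE),
          sample(1:100, 6, replace = TRUE))
)

# All unordered pairs of group labels
groups <- sort(unique(as.character(df$Group)))
pairs <- combn(groups, 2, simplify = FALSE)

# Welch two-sample t test (the t.test default) for each pair
pair_p <- vapply(pairs, function(g) {
  t.test(df$Val[df$Group == g[1]], df$Val[df$Group == g[2]])$p.value
}, numeric(1))

# Start from an identity matrix (assumption: diagonal = 1),
# then mirror each pairwise p-value into both triangles
p_mat <- diag(1, length(groups))
dimnames(p_mat) <- list(groups, groups)
for (k in seq_along(pairs)) {
  g <- pairs[[k]]
  p_mat[g[1], g[2]] <- pair_p[k]
  p_mat[g[2], g[1]] <- pair_p[k]
}

p_df <- as.data.frame(p_mat)
p_df
```

Filling both triangles from the three unique tests avoids running every comparison twice, which matters more as the number of groups grows.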
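Answer 1 (score: 1)
Base R solution
I modified tWrap so that it takes a single row of the expand.grid result (one pair of samples):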
tWrap <- function(x) t.test(x$Var1, x$Var2)$p.value
L <- split(df$Val, df$Group)
pvals <- apply(expand.grid(L, L), 1, tWrap)
pvals_mat <- matrix(pvals, ncol=3)
#              [,1]        [,2]         [,3]
# [1,] 1.0000000000 0.001918324 0.0001655259
# [2,] 0.0019183244 1.000000000 0.8850039246
# [3,] 0.0001655259 0.885003925 1.0000000000
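As a cross-check, stats::pairwise.t.test should give the same unadjusted Welch p-values when pooling and multiplicity adjustment are switched off. This is a sketch of an alternative, not the method used in the answers above. Its defaults (p.adjust.method = "holm" and a pooled standard deviation) would produce different numbers, so both are overridden here. The name res is illustrative.

```r
# Rebuild the example data so the sketch runs on its own
set.seed(123)
df <- data.frame(
  Group = c(rep("A", 5), rep("B", 4), rep("C", 6)),
  Val = c(sample(101:200, 5, replace = TRUE),
          sample(1:100, 4, replace = TRUE),
          sample(1:100, 6, replace = TRUE))
)

# pool.sd = FALSE runs a separate Welch t test per pair;
# p.adjust.method = "none" keeps the raw p-values
res <- pairwise.t.test(df$Val, df$Group,
                       p.adjust.method = "none",
                       pool.sd = FALSE)

# Lower-triangular matrix: rows B, C and columns A, B; unused cells are NA
res$p.value
```

The result holds only the lower triangle, so it still needs mirroring if a full symmetric table is wanted. In practice an adjustment such as "holm" is usually advisable when many pairs are tested.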