I have a data frame with several numeric columns and a grouping column with two groups, see below.
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))
I use lapply to run a t-test between the two groups for each column, like this:
lapply(d[,-4], function(i) t.test(i ~ d$group))
For each column, lapply returns a result containing several statistics (only the q1 result is shown here):
$q1
Welch Two Sample t-test
data: i by d$group
t = -0.76262, df = 17.323, p-value = 0.4559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2294678 0.5759458
sample estimates:
mean in group A mean in group B
-0.05443279 0.27232820
I would like to summarize the main statistics (t, df, p-value) for every column (q1, q2, q3, ...) in a single table.
Answer 0 (score: 1)
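You can use lapply() again to extract each statistic from the list of test results (called l here; it is defined in the example at the end of this answer), and then combine the pieces with bind_rows():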
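library(dplyr)
lapply(l, function(x) {
  data.frame(t = x$statistic,
             df = x$parameter,
             pv = x$p.value) # returns a one-row data frame for each element of l
}) %>% bind_rows()
# t df pv
# 1 -1.031983 13.533116 0.32017136
# 2 -2.458574 9.771018 0.03427922
# 3 1.421821 11.416813 0.18181697
# (these numbers appear to come from a different random draw than the
# seeded data; they do not match the q1 result shown in the question)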
You can also do this in a single step:
lapply(d[,-4], function(i) {
  res <- t.test(i ~ d$group)
  data.frame(t = res$statistic,
             df = res$parameter,
             pv = res$p.value)
}) %>% bind_rows()
If you want to keep track of which column each row came from, pass .id to bind_rows():
lapply(d[,-4], function(i) {
  res <- t.test(i ~ d$group)
  data.frame(t = res$statistic,
             df = res$parameter,
             pv = res$p.value)
}) %>% bind_rows(.id='id')
# id t df pv
# 1 q1 -0.7626249 17.32329 0.4559469
# 2 q2 -1.6467070 17.73117 0.1172263
# 3 q3 0.5288851 13.01589 0.6057874
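If you would rather not depend on dplyr, the same kind of table can probably be built in base R. The following is only a sketch and not part of the original answer; the names cols, stats and summary_tab are made up for illustration, and the column names end up as row names instead of an id column.

```r
# Same toy data as in the question.
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))

# Columns to test: everything except the grouping column (hypothetical helper name).
cols <- setdiff(names(d), "group")

# Run one t-test per column and keep three numbers from each.
# vapply checks that every call returns exactly three numeric values.
stats <- vapply(cols, function(col) {
  res <- t.test(d[[col]] ~ d$group)
  c(t  = unname(res$statistic),
    df = unname(res$parameter),
    pv = res$p.value)
}, c(t = 0, df = 0, pv = 0))

# stats is a 3 x 3 matrix with the statistics in rows;
# transpose it so that each tested column becomes one row.
summary_tab <- as.data.frame(t(stats))
summary_tab
```

Selecting the columns by name with setdiff() rather than by position (d[,-4]) is a small design choice: it keeps working if the grouping column is moved.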
Example data (this also defines the list l used in the first snippet):
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))
l <- lapply(d[,-4], function(i) {
  t.test(i ~ d$group)
})