Summarizing statistics after lapply

Time: 2019-05-22 11:29:54

Tags: r lapply

I have a data frame with several columns and two different groups (see below).

set.seed(123) 
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))

I use lapply to run a t-test between the two groups for each column, as follows:

lapply(d[,-4], function(i) t.test(i ~ d$group))

For each column, lapply returns a result listing several statistics (I show only the output for column q1):

$q1

    Welch Two Sample t-test

data:  i by d$group
t = -0.76262, df = 17.323, p-value = 0.4559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.2294678  0.5759458
sample estimates:
mean in group A mean in group B 
    -0.05443279      0.27232820 

I would like to summarize the main statistics (t, df, p-value) for each column (q1, q2, q3, ...) in a single table.

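1 answer:

Answer 0: (score: 1)

You can use lapply() again to extract each statistic and then combine the results with bind_rows(). Here l is the list of t-test results built in the example at the end of this answer.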

library(dplyr)
lapply(l, function(x) {
  data.frame(t = x$statistic,
             df = x$parameter,
             pv = x$p.value) # returns a dataframe for each element in l
}) %>% bind_rows()

#           t        df         pv
# 1 -1.031983 13.533116 0.32017136
# 2 -2.458574  9.771018 0.03427922
# 3  1.421821 11.416813 0.18181697
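
If you would rather not load dplyr, the same table can be assembled in base R. The following is a minimal sketch, assuming the same toy data as in the question; the name res is only illustrative, and unname() is used so that the names attached to the statistic ("t") and the parameter ("df") do not leak into the result.

```r
# Same toy data as in the question
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))

# One htest object per column
l <- lapply(d[, -4], function(i) t.test(i ~ d$group))

# Build a one-row data frame per test, then stack them with rbind()
res <- do.call(rbind, lapply(l, function(x) {
  data.frame(t  = unname(x$statistic),  # t statistic
             df = unname(x$parameter),  # Welch degrees of freedom
             pv = x$p.value)            # two-sided p-value
}))
res
```

do.call(rbind, ...) plays the role of bind_rows() here: it stacks the per-column data frames into a single one.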

You can also do this in a single step:

lapply(d[,-4], function(i) {
  res <- t.test(i ~ d$group)
  data.frame(t = res$statistic,
             df = res$parameter,
             pv = res$p.value)
  }) %>% bind_rows()

To keep the column names in the result, pass .id to bind_rows():

lapply(d[,-4], function(i) {
  res <- t.test(i ~ d$group)
  data.frame(t = res$statistic,
             df = res$parameter,
             pv = res$p.value)
}) %>% bind_rows(.id='id')
#   id          t       df        pv
# 1 q1 -0.7626249 17.32329 0.4559469
# 2 q2 -1.6467070 17.73117 0.1172263
# 3 q3  0.5288851 13.01589 0.6057874
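
A possible variation, sketched here in base R, is to loop over the column names instead of the columns themselves. This avoids hard-coding the position of the group column (the -4 above) and lets t.test() take the data through its data argument. The names cols and res are illustrative, and reformulate() is just one way to build the formula q1 ~ group from strings.

```r
# Same toy data as in the question
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))

# Every column except the grouping variable
cols <- setdiff(names(d), "group")

res <- do.call(rbind, lapply(cols, function(v) {
  # Build e.g. q1 ~ group and evaluate it inside d
  tt <- t.test(reformulate("group", response = v), data = d)
  data.frame(id = v,                      # column the test refers to
             t  = unname(tt$statistic),
             df = unname(tt$parameter),
             pv = tt$p.value)
}))
rownames(res) <- NULL  # plain 1, 2, 3 row names
res
```

The id column carries the same information that bind_rows(.id = 'id') adds in the dplyr version.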

Example data (this defines the list l used above):

set.seed(123) 
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))
l <- lapply(d[,-4], function(i) {
  t.test(i ~ d$group)
})
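
To see which pieces can be pulled out of each result, it can help to inspect one element of the list. This is a small sketch under the same assumptions as the example data; the name comps is illustrative. Each element is an object of class "htest", which is an ordinary list underneath.

```r
# Same toy data and list of tests as in the example
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE))
l <- lapply(d[, -4], function(i) t.test(i ~ d$group))

class(l$q1)           # "htest"
comps <- names(l$q1)  # component names, e.g. statistic, parameter, p.value
comps

l$q1$statistic        # named numeric: the t value
l$q1$parameter        # named numeric: the degrees of freedom
l$q1$p.value          # plain numeric: the p-value
```

Other components, such as conf.int and estimate, can be added to the summary table in the same way if they are needed.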