How to apply a grouped t.test to multiple columns in R?

Time: 2016-11-01 03:17:28

Tags: r

I want to run a t.test on q1 in df, comparing groups A and B:

q1 q2 q3 group
1  0  1  A
0  1  0  B
1  1  1  A
0  1  0  B

The script is:

t.test(subset(df,group==A,select = c("q1")),subset(df,group==B,select = c("q1")),alternative = "two.sided")

I wrapped the t.test call in a function:

x<-function(qnum){t.test(subset(df,group==A,select = c("qnum")),subset(df,group==B,select = c("qnum")),alternative = "two.sided")}

Then I thought apply could give me the t.test results for q1, q2, q3, ...:

y<-select(df,grep("q\\d",colnames(df),perl=TRUE))
apply(y,2,x)

But I get an error:

Error in `[.data.frame`(x, r, vars, drop = drop) :

How can I get the t.test results for multiple columns automatically?

1 Answer:

Answer 0: (score: 4)

You are better off handling this with the formula interface of t.test(). For example, t.test(q1 ~ group, data = df)
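To make the formula form concrete, here is a minimal sketch showing that it gives the same result as passing the two groups as separate vectors. The data frame `toy` and its values are made up for illustration and are not from the question.

```r
# Hypothetical toy data (invented for this sketch, not the asker's df)
toy <- data.frame(
  q1    = c(2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 5.0, 4.1),
  group = c("A", "B", "A", "B", "A", "B", "A", "B")
)

# Formula interface: response ~ grouping variable
by_formula <- t.test(q1 ~ group, data = toy)

# Equivalent two-vector call; the group labels must be quoted strings
by_vectors <- t.test(toy$q1[toy$group == "A"], toy$q1[toy$group == "B"])

# Both calls run the same Welch two-sample test
same_statistic <- isTRUE(all.equal(unname(by_formula$statistic),
                                   unname(by_vectors$statistic)))
same_p_value   <- isTRUE(all.equal(by_formula$p.value, by_vectors$p.value))
```

The formula version avoids the manual subsetting entirely, which is what makes it convenient to repeat across many columns.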

Below I demonstrate the formula on simulated data, then use lapply() to run t.test() for every column (except group):

# Create data
set.seed(123)  # This makes sampling replicable
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE)
)

head(d)
#>            q1         q2         q3 group
#> 1 -0.56047565 -1.0678237 -0.6947070     B
#> 2 -0.23017749 -0.2179749 -0.2079173     A
#> 3  1.55870831 -1.0260044 -1.2653964     A
#> 4  0.07050839 -0.7288912  2.1689560     A
#> 5  0.12928774 -0.6250393  1.2079620     A
#> 6  1.71506499 -1.6866933 -1.1231086     B

# Example of using a formula
t.test(d$q1 ~ d$group)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  d$q1 by d$group
#> t = -0.76262, df = 17.323, p-value = 0.4559
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.2294678  0.5759458
#> sample estimates:
#> mean in group A mean in group B 
#>     -0.05443279      0.27232820


# How to apply t.test to every column with lapply()
# - d[,-4] is all data excluding `group` variable
lapply(d[,-4], function(i) t.test(i ~ d$group))
#> $q1
#> 
#>  Welch Two Sample t-test
#> 
#> data:  i by d$group
#> t = -0.76262, df = 17.323, p-value = 0.4559
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.2294678  0.5759458
#> sample estimates:
#> mean in group A mean in group B 
#>     -0.05443279      0.27232820 
#> 
#> 
#> $q2
#> 
#>  Welch Two Sample t-test
#> 
#> data:  i by d$group
#> t = -1.6467, df = 17.731, p-value = 0.1172
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.2881952  0.1568201
#> sample estimates:
#> mean in group A mean in group B 
#>      -0.3906697       0.1750179 
#> 
#> 
#> $q3
#> 
#>  Welch Two Sample t-test
#> 
#> data:  i by d$group
#> t = 0.52889, df = 13.016, p-value = 0.6058
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -0.7569843  1.2478547
#> sample estimates:
#> mean in group A mean in group B 
#>     0.253746354     0.008311147
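If the full printout is more than you need, the list returned by lapply() can be condensed into a small table. The following is a sketch under a few assumptions: the names `q_cols`, `results`, and `summary_tbl` are illustrative, the columns are selected by a name pattern instead of by position, and reformulate() is used to build each formula. The data is simulated the same way as above, but the exact numbers depend on your R version's random number generator, so no output is shown.

```r
# Simulated data, built the same way as in the answer above
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE)
)

# Pick the question columns by name rather than by position
q_cols <- grep("^q[0-9]+$", names(d), value = TRUE)

# One t.test per column; reformulate() builds e.g. q1 ~ group
results <- lapply(setNames(q_cols, q_cols), function(col) {
  t.test(reformulate("group", response = col), data = d)
})

# Condense each htest object into one row of a summary table
summary_tbl <- data.frame(
  column  = q_cols,
  t       = sapply(results, function(r) unname(r$statistic)),
  df      = sapply(results, function(r) unname(r$parameter)),
  p_value = sapply(results, function(r) r$p.value),
  row.names = NULL
)
print(summary_tbl)
```

Selecting columns by name keeps the code working if `group` is not the last column, and passing `data = d` to t.test() makes each result's `data:` line show the actual column name instead of `i`.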