t.test with multiple grouping variables and multiple test variables

Time: 2018-02-23 11:16:31

Tags: r dataframe

I have a dataset mydat (excerpt):

structure(list(city = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L), .Label = c("New-York", "Washington"), class = "factor"), 
    x1 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x2 = c(0L, 
    0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x3 = c(0L, 0L, 1L, 1L, 
    0L, 0L, 0L, 1L, 1L, 0L), x4 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 
    1L, 1L, 0L), x5 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L
    ), x6 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x7 = c(0L, 
    0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), var1 = c(10L, 71L, 49L, 
    70L, 79L, 46L, 87L, 57L, 81L, 68L), var2 = c(34L, 17L, 28L, 
    63L, 95L, 99L, 40L, 63L, 24L, 90L), var3 = c(21L, 89L, 81L, 
    26L, 59L, 87L, 84L, 24L, 27L, 83L), var4 = c(86L, 70L, 45L, 
    40L, 95L, 94L, 39L, 97L, 89L, 30L)), .Names = c("city", "x1", 
"x2", "x3", "x4", "x5", "x6", "x7", "var1", "var2", "var3", "var4"
), class = "data.frame", row.names = c(NA, -10L))

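This dataset has 7 binary grouping variables (the real data has more grouping variables and more ratio-scale variables). I need to compare the groups on each of the 4 ratio-scale variables. I don't want to run the comparisons one variable at a time, like this:
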
t.test(var1~x1,data=mydat)
t.test(var2~x1,data=mydat)
t.test(var3~x1,data=mydat)
t.test(var4~x1,data=mydat)

t.test(var1~x2,data=mydat)
t.test(var2~x2,data=mydat)
t.test(var3~x2,data=mydat)
t.test(var4~x2,data=mydat)

How can I write a loop that compares var1 across every grouping variable in turn, then var2 across every grouping variable, and so on?

2 Answers:

Answer 0: (score: 3)

Another possible solution:

xvars <- grep('x[0-9]{1}', names(mydat), value = TRUE)
testvars <- grep('var[0-9]{1}', names(mydat), value = TRUE)

lapply(xvars, function(x) lapply(testvars, function(y) t.test(mydat[,y] ~ mydat[,x], data = mydat) ))

This gives (output truncated because of its size):

[[1]]
[[1]][[1]]

  Welch Two Sample t-test

data:  mydat[, y] by mydat[, x]
t = -0.30246, df = 7.6648, p-value = 0.7703
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -35.45353  27.28686
sample estimates:
mean in group 0 mean in group 1 
       60.16667        64.25000 


[[1]][[2]]

  Welch Two Sample t-test

data:  mydat[, y] by mydat[, x]
t = 0.98709, df = 7.9696, p-value = 0.3526
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -24.07911  60.07911
sample estimates:
mean in group 0 mean in group 1 
           62.5            44.5
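
Because the formula above is built from mydat[, y] and mydat[, x], every result prints the same "data:" line and the nested list has no names, so it is hard to tell which test is which. A minimal sketch of one way around that, assuming the sample data from the question (rebuilt here so the block runs on its own), uses reformulate() to build each formula from the column names and sapply(..., simplify = FALSE) to name the list elements. The name `results` and the anchored regular expressions are choices made for this illustration, not part of the original answer.

```r
# Rebuild the question's sample data (x1..x7 are identical in the excerpt).
x <- rep(c(0L, 0L, 1L, 1L, 0L), 2)
mydat <- data.frame(
  city = factor(rep(c("Washington", "New-York"), each = 5)),
  x1 = x, x2 = x, x3 = x, x4 = x, x5 = x, x6 = x, x7 = x,
  var1 = c(10L, 71L, 49L, 70L, 79L, 46L, 87L, 57L, 81L, 68L),
  var2 = c(34L, 17L, 28L, 63L, 95L, 99L, 40L, 63L, 24L, 90L),
  var3 = c(21L, 89L, 81L, 26L, 59L, 87L, 84L, 24L, 27L, 83L),
  var4 = c(86L, 70L, 45L, 40L, 95L, 94L, 39L, 97L, 89L, 30L)
)

# Anchored patterns so that only x1..x7 and var1..var4 are picked up.
xvars    <- grep("^x[0-9]+$",   names(mydat), value = TRUE)
testvars <- grep("^var[0-9]+$", names(mydat), value = TRUE)

# sapply(simplify = FALSE) returns a list named after its character input,
# giving a nested list indexed as results[[grouping]][[test variable]].
results <- sapply(xvars, function(g) {
  sapply(testvars, function(v) {
    # reformulate("x1", response = "var1") builds the formula var1 ~ x1
    t.test(reformulate(g, response = v), data = mydat)
  }, simplify = FALSE)
}, simplify = FALSE)

results$x1$var1           # the "data:" line now reads "var1 by x1"
results$x1$var1$p.value   # p-value of that single comparison
```

Individual tests can then be looked up by name, for example results[["x3"]][["var2"]], instead of by position.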

Answer 1: (score: 1)

First, you can use expand.grid to generate every required combination of variables.

combinations = expand.grid(colnames(mydat)[9:12],colnames(mydat)[2:8],stringsAsFactors = FALSE)
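
To see what this grid contains, here is a small sketch that builds the same combinations from literal names. The vectors `test_names` and `group_names` are stand-ins for colnames(mydat)[9:12] and colnames(mydat)[2:8], introduced only so the block runs without the data.

```r
# Stand-ins for the column names selected by position in the answer.
test_names  <- paste0("var", 1:4)  # var1 .. var4
group_names <- paste0("x", 1:7)    # x1 .. x7

# Unnamed arguments become columns Var1 and Var2; the first one varies fastest.
combinations <- expand.grid(test_names, group_names, stringsAsFactors = FALSE)

nrow(combinations)     # 4 test variables x 7 grouping variables = 28 rows
head(combinations, 5)  # var1..var4 paired with x1, then var1 paired with x2
```

Each row is one (test variable, grouping variable) pair, so iterating over the rows covers all comparisons exactly once.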

Then use mapply over the corresponding pairs of variables and apply t.test() to each pair:

mapply(function(x,y){t.test(formula=as.formula(paste0(x,"~",y)),data=mydat)},combinations$Var1,combinations$Var2,SIMPLIFY = FALSE,USE.NAMES = FALSE)

The output is a list of 28 comparisons:

[[1]]

Welch Two Sample t-test

data:  var1 by x1
t = -0.30246, df = 7.6648, p-value = 0.7703
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-35.45353  27.28686
sample estimates:
mean in group 0 mean in group 1 
   60.16667        64.25000 


[[2]]

Welch Two Sample t-test

data:  var2 by x1
t = 0.98709, df = 7.9696, p-value = 0.3526
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-24.07911  60.07911
sample estimates:
mean in group 0 mean in group 1 
       62.5            44.5 
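
A list of 28 printed tests is awkward to scan. As a rough sketch, assuming the same sample data (rebuilt here so the block is self-contained), the key numbers can be pulled out of each "htest" object into one data frame. The names `tests` and `summary_table` are made up for this illustration.

```r
# Rebuild the question's sample data (x1..x7 are identical in the excerpt).
x <- rep(c(0L, 0L, 1L, 1L, 0L), 2)
mydat <- data.frame(
  city = factor(rep(c("Washington", "New-York"), each = 5)),
  x1 = x, x2 = x, x3 = x, x4 = x, x5 = x, x6 = x, x7 = x,
  var1 = c(10L, 71L, 49L, 70L, 79L, 46L, 87L, 57L, 81L, 68L),
  var2 = c(34L, 17L, 28L, 63L, 95L, 99L, 40L, 63L, 24L, 90L),
  var3 = c(21L, 89L, 81L, 26L, 59L, 87L, 84L, 24L, 27L, 83L),
  var4 = c(86L, 70L, 45L, 40L, 95L, 94L, 39L, 97L, 89L, 30L)
)

combinations <- expand.grid(colnames(mydat)[9:12], colnames(mydat)[2:8],
                            stringsAsFactors = FALSE)

# Same mapply call as in the answer, stored instead of printed.
tests <- mapply(function(x, y) {
  t.test(formula = as.formula(paste0(x, "~", y)), data = mydat)
}, combinations$Var1, combinations$Var2, SIMPLIFY = FALSE, USE.NAMES = FALSE)

# One row per comparison: t statistic, Welch degrees of freedom, p-value.
summary_table <- data.frame(
  combinations,
  statistic = vapply(tests, function(res) unname(res$statistic), numeric(1)),
  df        = vapply(tests, function(res) unname(res$parameter), numeric(1)),
  p.value   = vapply(tests, function(res) res$p.value, numeric(1))
)

head(summary_table, 4)
```

The first two rows should agree with the printed output above (var1 by x1 and var2 by x1).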

Edit (as requested in the comments), running the same tests separately for each city:

library(plyr)

combinations = expand.grid(colnames(mydat)[9:12],colnames(mydat)[2:8],stringsAsFactors = FALSE)

myfun<-function(dat){mapply(function(x,y){t.test(formula=as.formula(paste0(x,"~",y)),data=dat)},combinations$Var1,combinations$Var2,SIMPLIFY = FALSE,USE.NAMES = FALSE)}

dlply(.data = mydat,.variables = "city",.fun = "myfun")
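
If plyr is not available, the same per-city split can be sketched in base R with split() and lapply(). This is an alternative to the dlply call rather than the answer's own method, and the name `by_city` is made up for this illustration. The sample data is rebuilt so the block runs on its own.

```r
# Rebuild the question's sample data (x1..x7 are identical in the excerpt).
x <- rep(c(0L, 0L, 1L, 1L, 0L), 2)
mydat <- data.frame(
  city = factor(rep(c("Washington", "New-York"), each = 5)),
  x1 = x, x2 = x, x3 = x, x4 = x, x5 = x, x6 = x, x7 = x,
  var1 = c(10L, 71L, 49L, 70L, 79L, 46L, 87L, 57L, 81L, 68L),
  var2 = c(34L, 17L, 28L, 63L, 95L, 99L, 40L, 63L, 24L, 90L),
  var3 = c(21L, 89L, 81L, 26L, 59L, 87L, 84L, 24L, 27L, 83L),
  var4 = c(86L, 70L, 45L, 40L, 95L, 94L, 39L, 97L, 89L, 30L)
)

combinations <- expand.grid(colnames(mydat)[9:12], colnames(mydat)[2:8],
                            stringsAsFactors = FALSE)

# split() cuts mydat into one data frame per city; lapply() then runs
# all 28 t-tests on each piece and keeps the city names on the result.
by_city <- lapply(split(mydat, mydat$city), function(dat) {
  mapply(function(x, y) {
    t.test(formula = as.formula(paste0(x, "~", y)), data = dat)
  }, combinations$Var1, combinations$Var2, SIMPLIFY = FALSE, USE.NAMES = FALSE)
})

names(by_city)               # one element per city
by_city[["Washington"]][[1]] # var1 by x1, Washington rows only
```

Note that each city has only 5 rows in this excerpt, so the per-city tests compare groups of 3 and 2 observations and are shown only to illustrate the mechanics.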