I have a dataset mydat (excerpt):
mydat <- structure(list(city = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L), .Label = c("New-York", "Washington"), class = "factor"),
x1 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x2 = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x3 = c(0L, 0L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 0L), x4 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L,
1L, 1L, 0L), x5 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L
), x6 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x7 = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), var1 = c(10L, 71L, 49L,
70L, 79L, 46L, 87L, 57L, 81L, 68L), var2 = c(34L, 17L, 28L,
63L, 95L, 99L, 40L, 63L, 24L, 90L), var3 = c(21L, 89L, 81L,
26L, 59L, 87L, 84L, 24L, 27L, 83L), var4 = c(86L, 70L, 45L,
40L, 95L, 94L, 39L, 97L, 89L, 30L)), .Names = c("city", "x1",
"x2", "x3", "x4", "x5", "x6", "x7", "var1", "var2", "var3", "var4"
), class = "data.frame", row.names = c(NA, -10L))
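This dataset has 7 binary grouping variables (the real data has more groups and more ratio-scale variables). I need to compare the groups on 4 ratio-scale variables. I don't want to write each comparison by hand, one variable at a time, like this:
t.test(var1~x1,data=mydat)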
t.test(var2~x1,data=mydat)
t.test(var3~x1,data=mydat)
t.test(var4~x1,data=mydat)
t.test(var1~x2,data=mydat)
t.test(var2~x2,data=mydat)
t.test(var3~x2,data=mydat)
t.test(var4~x2,data=mydat)
How can I write a loop that compares all grouping variables on var1, then all grouping variables on var2, and so on?
Answer 0 (score: 3)
Another possible solution:
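# Names of the binary grouping variables (x1, x2, ...) and of the test variables (var1, var2, ...)
xvars <- grep('x[0-9]{1}', names(mydat), value = TRUE)
testvars <- grep('var[0-9]{1}', names(mydat), value = TRUE)
# For every grouping variable, run a t-test on every test variable (returns a nested list)
lapply(xvars, function(x) lapply(testvars, function(y) t.test(mydat[,y] ~ mydat[,x], data = mydat) ))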
This gives (output truncated for length):
[[1]]
[[1]][[1]]
Welch Two Sample t-test
data: mydat[, y] by mydat[, x]
t = -0.30246, df = 7.6648, p-value = 0.7703
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-35.45353 27.28686
sample estimates:
mean in group 0 mean in group 1
60.16667 64.25000
[[1]][[2]]
Welch Two Sample t-test
data: mydat[, y] by mydat[, x]
t = 0.98709, df = 7.9696, p-value = 0.3526
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-24.07911 60.07911
sample estimates:
mean in group 0 mean in group 1
62.5 44.5
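Two small variations on this answer may make the result easier to read. The sketch below (an illustration, not the answerer's code) swaps the loops so that all grouping variables are tested on var1 first, then on var2, as the question asks, and builds each formula with reformulate() so the printed header names the actual variables instead of mydat[, y] and mydat[, x]. The list is named with setNames(), so a single test can be pulled out as res$var1$x1. The data frame is the excerpt above rebuilt compactly (in this excerpt all seven x columns are identical); the names res, g, xvars and testvars are only illustrative.

```r
# Rebuild the excerpt of mydat from the question (the seven x columns are identical here)
g <- c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L)
mydat <- data.frame(
  city = factor(rep(c("Washington", "New-York"), each = 5)),
  x1 = g, x2 = g, x3 = g, x4 = g, x5 = g, x6 = g, x7 = g,
  var1 = c(10L, 71L, 49L, 70L, 79L, 46L, 87L, 57L, 81L, 68L),
  var2 = c(34L, 17L, 28L, 63L, 95L, 99L, 40L, 63L, 24L, 90L),
  var3 = c(21L, 89L, 81L, 26L, 59L, 87L, 84L, 24L, 27L, 83L),
  var4 = c(86L, 70L, 45L, 40L, 95L, 94L, 39L, 97L, 89L, 30L)
)

xvars <- grep("^x[0-9]+$", names(mydat), value = TRUE)
testvars <- grep("^var[0-9]+$", names(mydat), value = TRUE)

# Outer loop over the test variables, inner loop over the grouping variables,
# so every grouping for var1 comes first, then var2, and so on.
# setNames() labels each list element with its variable name.
res <- lapply(setNames(testvars, testvars), function(y) {
  lapply(setNames(xvars, xvars), function(x) {
    # reformulate(x, response = y) builds the formula y ~ x,
    # so the printed header reads e.g. "data: var1 by x1"
    t.test(reformulate(x, response = y), data = mydat)
  })
})

res$var1$x1
```

The outer level of res is indexed by test variable and the inner level by grouping variable, so res$var2 holds all seven comparisons on var2.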
Answer 1 (score: 1)
First, you can use expand.grid to generate every combination of the variables you need.
combinations = expand.grid(colnames(mydat)[9:12],colnames(mydat)[2:8],stringsAsFactors = FALSE)
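Note that expand.grid varies its first argument fastest, so the call above produces the order var1~x1, var2~x1, var3~x1, var4~x1, var1~x2, and so on. The sketch below illustrates this with literal name vectors standing in for colnames(mydat)[9:12] and colnames(mydat)[2:8] (an assumption that matches the column layout of the excerpt), and shows that swapping the arguments groups the rows by test variable instead, giving var1 against x1 to x7 first. The name by_test is hypothetical.

```r
# Stand-ins for colnames(mydat)[9:12] and colnames(mydat)[2:8]
testvars <- paste0("var", 1:4)
xvars <- paste0("x", 1:7)

# First argument varies fastest: var1~x1, var2~x1, var3~x1, var4~x1, var1~x2, ...
combinations <- expand.grid(testvars, xvars, stringsAsFactors = FALSE)
head(combinations, 5)

# Swapping the arguments: var1 with x1..x7 first, then var2 with x1..x7, ...
by_test <- expand.grid(x = xvars, y = testvars, stringsAsFactors = FALSE)
head(by_test, 8)
```

With by_test, the mapply call below would take by_test$y as the response and by_test$x as the grouping variable.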
Then use mapply to apply t.test() to each pair of variables.
mapply(function(x,y){t.test(formula=as.formula(paste0(x,"~",y)),data=mydat)},combinations$Var1,combinations$Var2,SIMPLIFY = FALSE,USE.NAMES = FALSE)
The output is a list of 28 comparisons (4 test variables × 7 grouping variables):
[[1]]
Welch Two Sample t-test
data: var1 by x1
t = -0.30246, df = 7.6648, p-value = 0.7703
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-35.45353 27.28686
sample estimates:
mean in group 0 mean in group 1
60.16667 64.25000
[[2]]
Welch Two Sample t-test
data: var2 by x1
t = 0.98709, df = 7.9696, p-value = 0.3526
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-24.07911 60.07911
sample estimates:
mean in group 0 mean in group 1
62.5 44.5
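With 28 (or, in the real data, many more) printed tests, it can help to collect the key numbers into one table. The sketch below is an addition, not part of the original answer: it reruns the mapply call on the rebuilt excerpt and extracts the statistic, parameter (degrees of freedom) and p.value components that every htest object returned by t.test() carries. The names tests and summary_tab are illustrative.

```r
# Rebuild the excerpt of mydat from the question (the seven x columns are identical here)
g <- c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L)
mydat <- data.frame(
  city = factor(rep(c("Washington", "New-York"), each = 5)),
  x1 = g, x2 = g, x3 = g, x4 = g, x5 = g, x6 = g, x7 = g,
  var1 = c(10L, 71L, 49L, 70L, 79L, 46L, 87L, 57L, 81L, 68L),
  var2 = c(34L, 17L, 28L, 63L, 95L, 99L, 40L, 63L, 24L, 90L),
  var3 = c(21L, 89L, 81L, 26L, 59L, 87L, 84L, 24L, 27L, 83L),
  var4 = c(86L, 70L, 45L, 40L, 95L, 94L, 39L, 97L, 89L, 30L)
)

combinations <- expand.grid(colnames(mydat)[9:12], colnames(mydat)[2:8],
                            stringsAsFactors = FALSE)
# One Welch t-test per (response, group) pair, as in the answer above
tests <- mapply(function(x, y) t.test(as.formula(paste0(x, "~", y)), data = mydat),
                combinations$Var1, combinations$Var2,
                SIMPLIFY = FALSE, USE.NAMES = FALSE)

# Pull the t statistic, degrees of freedom and p-value out of each htest object
summary_tab <- data.frame(
  response = combinations$Var1,
  group    = combinations$Var2,
  t        = vapply(tests, function(tt) unname(tt$statistic), numeric(1)),
  df       = vapply(tests, function(tt) unname(tt$parameter), numeric(1)),
  p.value  = vapply(tests, function(tt) tt$p.value, numeric(1))
)
head(summary_tab)
```

Each row of summary_tab corresponds to the element of tests at the same position, so the full printout of any row is still available as tests[[i]].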
Edit (as requested in the comments), running the same comparisons separately for each city:
library(plyr)
combinations = expand.grid(colnames(mydat)[9:12],colnames(mydat)[2:8],stringsAsFactors = FALSE)
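# Run all response ~ group t-tests on one subset of the data (note the closing brace of myfun)
myfun<-function(dat){mapply(function(x,y){t.test(formula=as.formula(paste0(x,"~",y)),data=dat)},combinations$Var1,combinations$Var2,SIMPLIFY = FALSE,USE.NAMES = FALSE)}
# Split mydat by city and apply myfun to each subset; returns one list of tests per city
dlply(.data = mydat,.variables = "city",.fun = "myfun")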
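If you would rather not depend on plyr, the same per-city split can be sketched in base R with split() and lapply(). This is a swapped-in alternative to dlply, not the answerer's code; it assumes the rebuilt excerpt of mydat, and run_tests and by_city are hypothetical names.

```r
# Rebuild the excerpt of mydat from the question (the seven x columns are identical here)
g <- c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L)
mydat <- data.frame(
  city = factor(rep(c("Washington", "New-York"), each = 5)),
  x1 = g, x2 = g, x3 = g, x4 = g, x5 = g, x6 = g, x7 = g,
  var1 = c(10L, 71L, 49L, 70L, 79L, 46L, 87L, 57L, 81L, 68L),
  var2 = c(34L, 17L, 28L, 63L, 95L, 99L, 40L, 63L, 24L, 90L),
  var3 = c(21L, 89L, 81L, 26L, 59L, 87L, 84L, 24L, 27L, 83L),
  var4 = c(86L, 70L, 45L, 40L, 95L, 94L, 39L, 97L, 89L, 30L)
)

combinations <- expand.grid(colnames(mydat)[9:12], colnames(mydat)[2:8],
                            stringsAsFactors = FALSE)

# Hypothetical helper: run every response ~ group t-test on one data subset
run_tests <- function(dat) {
  mapply(function(y, x) t.test(as.formula(paste(y, "~", x)), data = dat),
         combinations$Var1, combinations$Var2,
         SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

# split() returns one data frame per city (named by factor level);
# lapply() runs all 28 tests on each of them
by_city <- lapply(split(mydat, mydat$city), run_tests)
names(by_city)
by_city[["Washington"]][[1]]
```

Keep in mind that each city has far fewer rows than the full data, so the per-city tests rest on very small groups.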