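I am trying to loop over several columns and, for each one, subset the data to the rows where that column equals a given value.
See below.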
White <- rep(0:1, 50)
Latino <- rep(0:1, 50)
Black <- rep(0:1, 50)
Asian <- rep(0:1, 50)
DV <- seq(1: length(rep(0:1, 50)))
x <- data.frame(cbind(White, Latino, Black, Asian, DV))
race <- c("White", "Latino", "Black", "Asian")
for(j in race){
for (i in race){
df_1 <- subset(x, i == 1)
df_2 <- subset(x, j == 1)
print(paste(i, j, sep = " "))
print(t.test(df_1$DV, df_2$DV) )
}
}
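Unfortunately, R does not like i or j standing on their own. If anyone knows a better way to loop over columns and subset on the same value, it would be much appreciated. Thanks.
Answer 0 (score: 2)
Note that i and j in your code are strings, whereas what you actually want is to extract the column with that name, e.g.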
for(j in race){
for (i in race){
df_1 <- subset(x, x[,i] == 1)
df_2 <- subset(x, x[,j] == 1)
print(paste(i, j, sep = " "))
print(t.test(df_1$DV, df_2$DV) )
}
}
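To see why the original loop fails, here is a minimal sketch. The data frame toy is hypothetical and made up purely for illustration. Inside subset, i == 1 compares the string itself with 1, which is FALSE, so no rows are kept; indexing with toy[[i]] compares the values of the column instead.

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(White = c(1, 0, 1, 0), DV = 1:4)
i <- "White"

# The string "White" is compared with 1, which is FALSE, so no rows are kept
empty <- subset(toy, i == 1)
nrow(empty)

# Indexing by name extracts the column, so the comparison is done row by row
kept <- toy[toy[[i]] == 1, ]
kept$DV
```

With an empty subset, t.test has no observations to work with, which is why the original loop stops with an error.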
As for a better way to loop: the dummy variables White, Latino, Black and Asian appear to be mutually exclusive, so perhaps we can rearrange the data into the form
race DV
------------
1 Black 1
2 White 2
3 Latino 3
4 Black 4
5 Asian 5
and call t.test with a formula, e.g.
# generate synthetic data: 100 rows, so the dummy columns and DV have the same length
rnd.race <- sample(1:4, 100, replace = TRUE)
x <- data.frame(
  White = as.integer(rnd.race == 1),
  Latino = as.integer(rnd.race == 2),
  Black = as.integer(rnd.race == 3),
  Asian = as.integer(rnd.race == 4),
  DV = seq_len(100)
)
race <- c("White", "Latino", "Black", "Asian")
# rearrange data, gather columns of dummy variables
x.cleaned = data.frame(
race = race[apply(x[,1:4], 1, which.max)],
DV = x$DV
)
t.test(DV ~ race, data = x.cleaned, subset = race %in% c("White", "Black"))
#
# Welch Two Sample t-test
#
# data: DV by race
# t = -0.91517, df = 42.923, p-value = 0.3652
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -25.241536 9.483961
# sample estimates:
# mean in group Black mean in group White
# 47.66667 55.54545
#
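The benefit of using t.test with a formula is readability. For example, the report from t.test shows mean in group Black and mean in group White instead of mean of x and mean of y, and the formula itself states which variable is being tested against which grouping variable.
To run the t-test repeatedly over all pairs, we can use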
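The difference in labelling can be checked directly. The following is a small sketch on hypothetical two-group data (the data frame d and the seed are assumptions made for illustration, not part of the question); it compares the names of the estimates returned by the formula interface and by the two-vector interface.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
# Hypothetical long-format data with two groups
d <- data.frame(
  race = rep(c("Black", "White"), each = 10),
  DV   = rnorm(20)
)

# Formula interface: the estimates are labelled with the group names
by.formula <- t.test(DV ~ race, data = d)
names(by.formula$estimate)

# Two-vector interface: the estimates are labelled generically
by.vectors <- t.test(d$DV[d$race == "Black"], d$DV[d$race == "White"])
names(by.vectors$estimate)
```

Both calls run the same Welch test, so only the labels differ, not the result.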
run.test = function(race.pair) {
list(t.test(DV ~ race, data=x.cleaned, race %in% race.pair) )
}
combn(race, 2, FUN = run.test)
# [[1]]
#
# Welch Two Sample t-test
#
# data: DV by race
# t = -0.30892, df = 41.997, p-value = 0.7589
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -21.22870 15.59233
# sample estimates:
# mean in group Latino mean in group White
# 52.72727 55.54545
#
#
# [[2]]
#
# Welch Two Sample t-test
#
# data: DV by race
# t = -0.91517, df = 42.923, p-value = 0.3652
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -25.241536 9.483961
# sample estimates:
# mean in group Black mean in group White
# 47.66667 55.54545
#
# ...
where combn(x, m, FUN = NULL, simplify = TRUE, ...) is a built-in function that generates all combinations of the elements of x taken m at a time. For an approach using outer, see @askrun's answer.
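If only the p-values are of interest, the pairs from combn can be collected into a named vector. This is a minimal sketch under assumptions: the data frame d, the seed and the "A vs B" naming scheme are made up for illustration, and simplify = FALSE is used so that each pair comes back as one list element.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
race <- c("White", "Latino", "Black", "Asian")
# Hypothetical long-format data: 25 rows per group
d <- data.frame(race = rep(race, each = 25), DV = rnorm(100))

# All unordered pairs, one per list element (6 pairs for 4 groups)
pairs <- combn(race, 2, simplify = FALSE)

# Welch t-test p-value for each pair, computed on the rows of that pair only
p.values <- sapply(pairs, function(p) {
  t.test(DV ~ race, data = d[d$race %in% p, ])$p.value
})
names(p.values) <- sapply(pairs, paste, collapse = " vs ")
p.values
```

Note that these p-values are not adjusted for multiple comparisons.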
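Finally, when comparing means across three or more groups, IMHO ANOVA is probably the better-known tool than the t-test (which may also explain why repeatedly running t-tests on pairs of groups feels "inconvenient").
With x.cleaned, we can easily run an ANOVA in R, e.g.: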
aov.out = aov(DV ~ race, data=x.cleaned)
summary(aov.out)
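Note that after a one-way ANOVA (which tests whether any of the group means differ), we can also run a post-hoc test (e.g. TukeyHSD(aov.out)) to find the specific pairs of groups whose means differ. A formal report would also include some checks of the assumptions. Here is a related lecture note, and this is a related question on Cross Validated (which may answer further questions about which test to choose).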
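The ANOVA followed by the post-hoc test can be sketched end to end as follows. The data frame d and the seed are assumptions made for illustration; the grouping column is stored as a factor because TukeyHSD works on factor terms.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
race <- c("White", "Latino", "Black", "Asian")
# Hypothetical long-format data: 25 rows per group, grouping column as a factor
d <- data.frame(race = factor(rep(race, each = 25)), DV = rnorm(100))

# One-way ANOVA: overall test of whether any group mean differs
aov.out <- aov(DV ~ race, data = d)
summary(aov.out)

# Post-hoc Tukey test: one row per pair of groups,
# with columns diff, lwr, upr and p adj
tukey <- TukeyHSD(aov.out)
tukey$race
```

Unlike the repeated t-tests, the p adj column is already adjusted for the number of pairwise comparisons.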
Answer 1 (score: 2)
In R, we can also use outer
f1 <- function(u, v) list(t.test(x$DV[x[[u]] ==1], x$DV[x[[v]] == 1]))
out <- outer(race, race, FUN = Vectorize(f1))
out[1,1]
#[[1]]
# Welch Two Sample t-test
#data: x$DV[x[[u]] == 1] and x$DV[x[[v]] == 1]
#t = 0, df = 98, p-value = 1
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -11.57133 11.57133
#sample estimates:
#mean of x mean of y
# 51 51
It can be converted to a list output
lst1 <- setNames(lapply(out, I), outer(race, race, FUN = paste))
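As a rough sketch of how the named list might then be used, the following rebuilds a small data set and pulls out the p-values by name. The data (grp, the seed, 25 rows per group) are assumptions made for illustration, and as.vector is added here only to make the names an explicit plain character vector.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
race <- c("White", "Latino", "Black", "Asian")
# Hypothetical data: mutually exclusive dummy columns, 25 rows per group
grp <- rep(race, each = 25)
x <- data.frame(sapply(race, function(r) as.integer(grp == r)), DV = rnorm(100))

# t-test for every ordered pair of columns, as in the answer above
f1 <- function(u, v) list(t.test(x$DV[x[[u]] == 1], x$DV[x[[v]] == 1]))
out <- outer(race, race, FUN = Vectorize(f1))

# Flatten the 4 x 4 result into a list named "<row group> <column group>"
lst1 <- setNames(lapply(out, I), as.vector(outer(race, race, FUN = paste)))

# Extract all p-values, then look one up by name
p <- sapply(lst1, function(h) h$p.value)
p[["White Black"]]
```

Because outer covers every ordered pair, each comparison appears twice and each group is also compared with itself.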
Answer 2 (score: 1)
You may need to add get
for(j in race){
for (i in race){
df_1 <- subset(x, get(i) == 1)
df_2 <- subset(x, get(j) == 1)
print(paste(i, j, sep = " "))
print(t.test(df_1$DV, df_2$DV) )
}
}