I have a data.frame similar to this one:
# the five group labels are recycled over the 100 rows (20 rows per group)
cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))
Structure of the actual data frame:
str(cb)
'data.frame':	66936 obs. of  89 variables:
 $ group: Factor w/ 5 levels "A","B","C",..
 $ WC   : int  19 28 35 92 10 23 ...
 $ Ana  : num  17.2 48 35.4 84.2 ...
 $ Clo  : num  37.2 12.1 45.4 38.9 ...
....
mean <- colMeans(cb[,2:89])
mean
WC Ana Clo ...
52.45 37.23 50.12 ...
I want to run a one-sample t.test for each group and each variable.
To do this, I did the following:
A <- subset(cb, cb$group == "A")
B <- subset(cb, cb$group == "B")
...
t_A_WC <- t.test(A$WC, mu = mean[1], alternative = "two.sided")
t_B_WC <- t.test(B$WC, mu = mean[1], alternative = "two.sided")
....
t_A_Ana <- t.test(A$Ana, mu = mean[2], alternative = "two.sided")
t_B_Ana <- t.test(B$Ana, mu = mean[2], alternative = "two.sided")
....
t_A_Clo <- t.test(A$Clo, mu = mean[3], alternative = "two.sided")
t_B_Clo <- t.test(B$Clo, mu = mean[3], alternative = "two.sided")
....
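The calls above differ only in the group label and the variable, so one way to cut down the typing is to wrap that pattern in a small function. This is a minimal sketch, assuming simulated data shaped like the example (seeded with `set.seed(1)`); `one_sample_t` is a hypothetical helper name, and it recomputes each variable's overall mean with `mean()` rather than indexing the named vector `mean`.

```r
set.seed(1)
cb <- data.frame(group = c("A", "B", "C", "D", "E"),
                 WC  = runif(100, 0, 100),
                 Ana = runif(100, 0, 100),
                 Clo = runif(100, 0, 100))

# Hypothetical helper: one-sample t-test of column `var` restricted to
# group `grp`, tested against that column's mean over all groups.
one_sample_t <- function(data, grp, var) {
  x <- data[[var]][data$group == grp]
  t.test(x, mu = mean(data[[var]]), alternative = "two.sided")
}

t_A_WC  <- one_sample_t(cb, "A", "WC")
t_B_Ana <- one_sample_t(cb, "B", "Ana")
```

Each call still returns a full `htest` object, so `t_A_WC$statistic` or `t_A_WC$p.value` work exactly as with the hand-written calls.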
The results are correct (or seem to be), but typing all of this out is very time-consuming.
Is there a smarter way?
I tried the following, taken from here:
results <- lapply(mydf, t.test)
resultsmatrix <- do.call(cbind, results)
resultsmatrix[c("statistic","estimate","p.value"),]
But the results are somehow very wrong, and they don't match the values I calculated before.
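A likely reason the `lapply` attempt disagrees: `t.test()` uses `mu = 0` by default, and `lapply(mydf, t.test)` runs each test on a whole column (all groups pooled) rather than on each group against the column mean. A minimal sketch that keeps the `lapply` idea but splits by group first and passes each column's overall mean as `mu`, assuming simulated data like the example; `vars`, `overall` and `by_group` are illustrative names.

```r
set.seed(1)
cb <- data.frame(group = c("A", "B", "C", "D", "E"),
                 WC  = runif(100, 0, 100),
                 Ana = runif(100, 0, 100),
                 Clo = runif(100, 0, 100))
vars    <- names(cb)[-1]          # every column except `group`
overall <- colMeans(cb[vars])     # the mu for each variable

# split() yields one data frame per group; within each group, test every
# variable against that variable's overall mean and keep three numbers.
by_group <- lapply(split(cb, cb$group), function(d) {
  sapply(vars, function(v) {
    tt <- t.test(d[[v]], mu = overall[[v]], alternative = "two.sided")
    c(statistic = unname(tt$statistic),
      estimate  = unname(tt$estimate),
      p.value   = tt$p.value)
  })
})

by_group[["A"]]   # 3 x 3 matrix: rows statistic/estimate/p.value, columns WC/Ana/Clo
```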
Edit:
Here is a link to a 10,000-row sample from the actual dataset.
Answer 0 (score: 1)
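This approach may be a bit verbose, but I think it captures all the combinations you are looking for ("A" with "WC", "Ana", "Clo"; "B" with "WC", "Ana", "Clo"; and so on), so 5 groups × 3 variables = 15 t-test results in total.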
cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))
mean <- colMeans(cb[,2:4])
varNames <- names(cb)[-1] # drop the group column, keeping only the variable names
# t-test results are stored in a list of lists: master[[group]][[variable index]]
master <- list()
## the for loop subsets by group; lapply computes the t-test for every variable in that subset
for (group in unique(cb$group)){
# list of t-test results for the current "group" subset
results <- lapply(seq_along(varNames), FUN = function(x, subset = cb[cb$group == group,]) {
t.test(subset[varNames[x]], mu = mean[x], alternative = "two.sided")
})
master[[group]] <- results
}
# so for example, if you want to find the results from group "A" and "WC" you say
master[["A"]][[1]] # index 1 because "WC" is the first element of varNames
# One Sample t-test
#
# data: subset[varNames[x]]
# t = -0.417, df = 19, p-value = 0.6813
# alternative hypothesis: true mean is not equal to 46.5857
# 95 percent confidence interval:
# 30.27709 57.47510
# sample estimates:
# mean of x
# 43.87609
# from there you can just find your relevant statistic, for example
master[["A"]][[1]]$statistic # the t-statistic; other components include $p.value, $estimate, $conf.int
# t
# -0.4170353
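To get all 15 results side by side instead of indexing `master` one test at a time, the list of lists can be flattened into a data frame. A minimal sketch, assuming simulated data like the example; it rebuilds `master` as in the answer, but stores the overall means as `overall` (instead of `mean`) and extracts columns with `[[`. `summary_df` is an illustrative name.

```r
set.seed(1)
cb <- data.frame(group = c("A", "B", "C", "D", "E"),
                 WC  = runif(100, 0, 100),
                 Ana = runif(100, 0, 100),
                 Clo = runif(100, 0, 100))
overall  <- colMeans(cb[, 2:4])
varNames <- names(cb)[-1]

master <- list()
for (group in unique(cb$group)) {
  sub <- cb[cb$group == group, ]
  master[[group]] <- lapply(seq_along(varNames), function(x)
    t.test(sub[[varNames[x]]], mu = overall[x], alternative = "two.sided"))
}

# One row per (group, variable) pair, pulling components out by name.
summary_df <- do.call(rbind, lapply(names(master), function(g) {
  data.frame(group     = g,
             variable  = varNames,
             statistic = sapply(master[[g]], function(tt) unname(tt$statistic)),
             estimate  = sapply(master[[g]], function(tt) unname(tt$estimate)),
             p.value   = sapply(master[[g]], function(tt) tt$p.value),
             stringsAsFactors = FALSE)
}))
```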
Answer 1 (score: 1)
First, let's initialize the result matrix and the group levels.
res <- matrix(NA, ncol=5,
dimnames=list(NULL, c("group", "col", "statistic", "estimate", "p.value")))
gr <- levels(factor(cb$group)) # factor() also covers a character group column (the R >= 4.0 data.frame default)
Then we loop over all the data columns and, for every available group, run t.test on that column's values within the group.
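for(cl in 2:ncol(cb)){
  for(grp in gr){
    temp <- cb[cb$group == grp, cl]
    tt <- t.test(temp, mu = mean(cb[,cl]), alternative="two.sided")
    # pick components by name: position 5 of unlist(tt) is the upper
    # confidence limit, not the estimate
    res <- rbind(res, c(grp, colnames(cb)[cl],
                        tt$statistic, tt$estimate, tt$p.value))
  }
}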
Finally, we reformat the result table.
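res <- data.frame(res[-1, , drop = FALSE], stringsAsFactors = FALSE) # drop the initial all-NA row
res[3:5] <- lapply(res[3:5], as.numeric) # statistic, estimate, p.value came back as character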
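Growing `res` with `rbind()` turns every cell into character, which is why the numeric columns need converting afterwards. A minimal alternative sketch, assuming simulated data like the example: enumerate all (group, column) pairs with `expand.grid()` and run one test per pair with `Map()`, building numeric columns directly. `combos`, `rows` and `res2` are illustrative names. Because `expand.grid()` varies its first argument fastest, the rows come out in the same order as the nested loop above (all groups for `WC`, then for `Ana`, then for `Clo`).

```r
set.seed(1)
cb <- data.frame(group = c("A", "B", "C", "D", "E"),
                 WC  = runif(100, 0, 100),
                 Ana = runif(100, 0, 100),
                 Clo = runif(100, 0, 100))

# All (group, column) combinations: 5 groups x 3 variables = 15 rows.
combos <- expand.grid(group = unique(as.character(cb$group)),
                      col   = names(cb)[-1],
                      stringsAsFactors = FALSE)

# One t-test per combination; each call returns a one-row data frame.
rows <- Map(function(g, v) {
  tt <- t.test(cb[[v]][cb$group == g], mu = mean(cb[[v]]),
               alternative = "two.sided")
  data.frame(group = g, col = v,
             statistic = unname(tt$statistic),
             estimate  = unname(tt$estimate),
             p.value   = tt$p.value,
             stringsAsFactors = FALSE)
}, combos$group, combos$col)

res2 <- do.call(rbind, unname(rows))
rownames(res2) <- NULL
```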