Multiple one-sample t-tests in R

Time: 2016-07-28 14:09:36

Tags: r

I have a data.frame similar to this one:

cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))

Structure of the actual data frame:

str(cb)
data.frame: 66936 obs of 89 variables: 
$group: Factor w/ 5 levels "A", "B", "C" ...
$WC: int 19 28 35 92 10 23...
$Ana: num 17.2 48 35.4 84.2
$ Clo: num 37.2 12.1 45.4 38.9
....

mean <- colMeans(cb[,2:89])
mean
WC     Ana    Clo    ...
52.45  37.23  50.12  ...

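I want to run a one-sample t-test for every group and every variable, testing each group's values against the overall mean of that variable.

To do that, I did the following: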

A <- subset(cb, cb$group == "A")
B <- subset(cb, cb$group == "B")
...

t_A_WC <- t.test(A$WC, mu = mean[1], alternative = "two.sided")
t_B_WC <- t.test(B$WC, mu = mean[1], alternative = "two.sided")
....

t_A_Ana <- t.test(A$Ana, mu = mean[2], alternative = "two.sided")
t_B_Ana <- t.test(B$Ana, mu = mean[2], alternative = "two.sided")
....

t_A_Clo <- t.test(A$Clo, mu = mean[3], alternative = "two.sided")
t_B_Clo <- t.test(B$Clo, mu = mean[3], alternative = "two.sided")
....

The results are correct (or seem to be), but typing all of this out is very time-consuming.

Is there a smarter way?

I tried this (taken from here):

results <- lapply(mydf, t.test)
resultsmatrix <- do.call(cbind, results)
resultsmatrix[c("statistic","estimate","p.value"),]

But the results are clearly wrong and do not match the values I calculated before.
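A likely reason for the mismatch, assuming `mydf` stands for the whole data frame: `t.test` defaults to `mu = 0`, and `lapply` over a data frame tests each full column rather than each group. The sketch below uses a made-up toy data frame (`toy`) and hypothetical variable names, only to show the difference between the two calls.

```r
set.seed(1)
# Hypothetical toy data, not the asker's dataset
toy <- data.frame(group = rep(c("A", "B"), each = 10),
                  WC = runif(20, 0, 100))

# Default call: tests H0 "mean = 0" on the whole column, ignoring groups
default_test <- t.test(toy$WC)

# Intended call: group A tested against the overall column mean
grand_mean <- mean(toy$WC)
group_test <- t.test(toy$WC[toy$group == "A"], mu = grand_mean)

default_test$null.value  # 0
group_test$null.value    # the overall mean of WC
```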

Edit:

Here is a link to a 10,000-row sample from the actual dataset

2 answers:

Answer 0 (score: 1)

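This approach may be a bit verbose, but I think it captures all the combinations you are looking for ("A" with "WC", "Ana", "Clo"; "B" with "WC", "Ana", "Clo"; and so on), for a total of 5 groups * 3 variables = 15 t-test results.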

cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))

mean <- colMeans(cb[,2:4])
varNames <- names(cb)[-1]   # removing group variable from list of variables


# t-test results are stored in a list of lists, one inner list per group
master <- list()

# the for loop subsets by group; lapply runs the t-test for every variable in that subset
for (group in unique(cb$group)) {
  # list of t-test results for the current "group" subset
  results <- lapply(seq_along(varNames), FUN = function(x, subset = cb[cb$group == group, ]) {
    t.test(subset[varNames[x]], mu = mean[x], alternative = "two.sided")
  })

  master[[group]] <- results
}

# so for example, if you want to find the results from group "A" and "WC" you say
master[["A"]][[1]]   # index 1 because "WC" is the first element of varNames

#   One Sample t-test
# 
# data:  subset[varNames[x]]
# t = -0.417, df = 19, p-value = 0.6813
# alternative hypothesis: true mean is not equal to 46.5857
# 95 percent confidence interval:
#  30.27709 57.47510
# sample estimates:
# mean of x 
#  43.87609 

# from there you can just find your relevant statistic, for example

master[["A"]][[1]]$statistic   # gives the t-statistic (eg. $statistic, $p.value, etc.)

#         t 
# -0.4170353
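If a flat table is more convenient than a nested list, the same 15 tests can be collected into one data frame. This is a minimal sketch, not part of the answer above: the names `cb_demo`, `var_names`, `col_means` and `summary_table` are illustrative, and the data are random.

```r
set.seed(42)
# Hypothetical example data with the same layout as `cb`
cb_demo <- data.frame(group = rep(c("A", "B", "C", "D", "E"), times = 20),
                      WC = runif(100, 0, 100),
                      Ana = runif(100, 0, 100),
                      Clo = runif(100, 0, 100))
var_names <- names(cb_demo)[-1]
col_means <- colMeans(cb_demo[var_names])

# One row per group/variable combination
combos <- expand.grid(group = unique(cb_demo$group), variable = var_names,
                      stringsAsFactors = FALSE)

rows <- lapply(seq_len(nrow(combos)), function(i) {
  g <- combos$group[i]
  v <- combos$variable[i]
  # test this group's values against the overall mean of the variable
  tt <- t.test(cb_demo[[v]][cb_demo$group == g], mu = col_means[[v]])
  data.frame(group = g, variable = v,
             statistic = unname(tt$statistic),
             estimate = unname(tt$estimate),
             p.value = tt$p.value)
})
summary_table <- do.call(rbind, rows)
head(summary_table)
```

Picking the components by name (`tt$statistic`, `tt$estimate`, `tt$p.value`) keeps the columns numeric and avoids relying on the position of elements inside the test object.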

Answer 1 (score: 1)

First, let's initialize the results matrix and the group levels.

res <- matrix(NA, ncol=5, 
    dimnames=list(NULL, c("group", "col", "statistic", "estimate", "p.value")))
gr <- levels(factor(cb$group))  # factor() so this also works when group is a character column

Then we loop over all the value columns and, within each column, run t.test for every available group.

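for(cl in 2:ncol(cb)){
    for(grp in gr){
        temp <- cb[cb$group == grp, cl]
        tt <- t.test(temp, mu = mean(cb[,cl]), alternative="two.sided")
        # pick components by name: in unlist(tt), position 5 is the upper confidence bound, not the estimate
        res <- rbind(res, c(grp, colnames(cb)[cl],
            tt$statistic, tt$estimate, tt$p.value))
    }
}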

Finally, we reformat the results table.

res <- data.frame(res[-1,])
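Because `res` is built with `rbind` from vectors that mix group names and numbers, every cell is stored as text, so the statistic, estimate and p-value columns probably need converting before they can be sorted or filtered. A minimal sketch, using a hypothetical stand-in `res_demo` with made-up values rather than real results:

```r
# Hypothetical stand-in for the `res` built above: every column is character
res_demo <- data.frame(group = c("A", "B"),
                       col = c("WC", "WC"),
                       statistic = c("-0.417", "1.203"),
                       estimate = c("43.876", "55.120"),
                       p.value = c("0.6813", "0.2437"),
                       stringsAsFactors = FALSE)

num_cols <- c("statistic", "estimate", "p.value")
# as.character() first so the conversion is also safe if the columns are factors
res_demo[num_cols] <- lapply(res_demo[num_cols],
                             function(x) as.numeric(as.character(x)))
str(res_demo)
```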