R: t-test over all subsets over all columns

Time: 2012-03-12 14:56:17

Tags: r

This is a follow-up question to R: t-test over all columns

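Suppose I have a huge dataset, and I create a number of subsets from it based on certain conditions. The subsets all have the same number of columns. I then want to run t-tests on two subsets at a time (outer loop), and, for each combination of subsets, go through all the columns one at a time (inner loop).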

Here is what I came up with, based on the previous answers. It stops with an error.

C <- c("c1","c1","c1","c1","c1",
   "c2","c2","c2","c2","c2",
   "c3","c3","c3","c3","c3",
   "c4","c4","c4","c4","c4",
   "c5","c5","c5","c5","c5",
   "c6","c6","c6","c6","c6",
   "c7","c7","c7","c7","c7",
   "c8","c8","c8","c8","c8",
   "c9","c9","c9","c9","c9",
   "c10","c10","c10","c10","c10")
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
Data <- data.frame(C, X, Y, Z)

Data.c1 = subset(Data, C == "c1",select=X:Z)
Data.c2 = subset(Data, C == "c2",select=X:Z)
Data.c3 = subset(Data, C == "c3",select=X:Z)
Data.c4 = subset(Data, C == "c4",select=X:Z)
Data.c5 = subset(Data, C == "c5",select=X:Z)

Data.Subsets = c("Data.c1",
                 "Data.c2",
                 "Data.c3",
                 "Data.c4",
                 "Data.c5") 

library(plyr)

combo1 <- combn(length(Data.Subsets),1)
adply(combo1, 1, function(x) {

  combo2 <- combn(ncol(Data.Subsets[x]),2)
  adply(combo2, 2, function(y) {

      test <- t.test( Data.Subsets[x][, y[1]], Data.Subsets[x][, y[2]], na.rm=TRUE)

      out <- data.frame("Subset" = rownames(Data.Subsets[x]),
                    , "Row" = colnames(x)[y[1]]
                    , "Column" = colnames(x[y[2]])
                    , "t.value" = round(test$statistic,3)
                    ,  "df"= test$parameter
                    ,  "p.value" = round(test$p.value, 3)
                    )
      return(out)
  } )
} )

2 Answers:

Answer 0 (score: 5)

First, you can define the dataset more easily with gl, and avoid creating a separate variable for each column.

Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)
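As a quick check (a sketch, not part of the original answer), gl() produces the same labels as the hand-typed C vector in the question, but keeps the levels in numeric order; calling factor() on the character vector would sort the levels alphabetically instead, putting c10 right after c1. The names C.manual and C.gl are illustrative only.

```r
# Hand-typed labels as in the question, built compactly with rep()
C.manual <- rep(paste("c", 1:10, sep = ""), each = 5)

# gl(n, k, labels): n levels, each repeated k times in consecutive blocks
C.gl <- gl(10, 5, labels = paste("c", 1:10, sep = ""))

identical(as.character(C.gl), C.manual)  # same labels in the same order
levels(C.gl)                             # "c1" "c2" ... "c10" (numeric order)
levels(factor(C.manual))                 # alphabetical: "c1" "c10" "c2" ...
```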

Use melt from the reshape package to convert it to "long" format. (You could also use the base reshape function.)

library(reshape)  # provides melt(); reshape2::melt works the same way here
longData <- melt(Data, id.vars = "C")
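For the base-R route mentioned in parentheses, here is a sketch using stats::reshape(). It regenerates Data with an arbitrary seed so it runs on its own, and the timevar/v.names values are chosen (an assumption, for convenience) so the columns match melt()'s variable/value names.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

# Wide -> long: stack X, Y and Z into one "value" column, with a
# "variable" column recording which original column each value came from.
longBase <- reshape(Data,
                    direction = "long",
                    varying   = c("X", "Y", "Z"),
                    v.names   = "value",
                    timevar   = "variable",
                    times     = c("X", "Y", "Z"))

# reshape() also adds an "id" column (the original row number), because
# C repeats and cannot identify rows on its own.
head(longBase)
```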

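Now use pairwise.t.test to run t-tests between every pair of C/variable groups. This covers every X/Y/Z pair within each level of C (and, as a side effect, every comparison across levels as well).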

with(longData, pairwise.t.test(value, interaction(C, variable)))
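To see what the call returns, here is a self-contained sketch. It builds the long table with base stack() rather than melt() (an assumption, purely to avoid the package dependency; the resulting C/variable/value layout is the same). With 10 levels of C and 3 variables there are 30 groups, so the result holds 435 pairwise p-values in a lower-triangular matrix, and a within-level comparison such as X vs Y in c1 is read off by its interaction labels.

```r
set.seed(1)  # arbitrary seed for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

# Long format via base stack(): X values, then Y, then Z
s <- stack(Data[c("X", "Y", "Z")])
longData <- data.frame(C = rep(Data$C, 3), variable = s$ind, value = s$values)

res <- with(longData, pairwise.t.test(value, interaction(C, variable)))

res$p.adjust.method          # Holm correction is the default
dim(res$p.value)             # 30 groups -> 29 x 29 lower-triangular matrix
res$p.value["c1.Y", "c1.X"]  # X vs Y within level c1 (adjusted p-value)
```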

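Note that it is important to use pairwise.t.test rather than calling t.test on each pair yourself, because when you run a large number of tests the p-values need to be adjusted for multiple comparisons. (See, for example, xkcd for an explanation.)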
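A sketch of what that adjustment amounts to: running pairwise.t.test with p.adjust.method = "none" and then applying p.adjust(..., "holm") to the raw p-values should reproduce the default output, and counting p-values below 0.05 before and after shows how many "hits" the correction removes. The 0.05 cutoff and the variable names are illustrative choices.

```r
set.seed(1)  # arbitrary seed for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)
s <- stack(Data[c("X", "Y", "Z")])
longData <- data.frame(C = rep(Data$C, 3), variable = s$ind, value = s$values)
g <- with(longData, interaction(C, variable))

raw <- pairwise.t.test(longData$value, g, p.adjust.method = "none")  # unadjusted
adj <- pairwise.t.test(longData$value, g)                            # default: Holm

keep <- lower.tri(raw$p.value, diag = TRUE)  # the cells that hold a comparison
manual <- p.adjust(raw$p.value[keep], method = "holm")

all.equal(manual, adj$p.value[keep])  # default output = Holm applied to raw p-values
c(raw  = sum(raw$p.value[keep] < 0.05),   # "significant" before adjustment
  holm = sum(adj$p.value[keep] < 0.05))   # and after
```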

In general, pairwise t-tests are inferior to regression, so be careful how you use them.
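For comparison, a minimal regression sketch on the same long data (the formula value ~ C * variable is an assumption about how one might model it): a single linear model uses all 150 observations and one pooled residual variance, and anova() then gives one F-test per term instead of hundreds of separate t-tests.

```r
set.seed(1)  # arbitrary seed for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)
s <- stack(Data[c("X", "Y", "Z")])
longData <- data.frame(C = rep(Data$C, 3), variable = s$ind, value = s$values)

# One model: main effects of C and variable plus their interaction
fit <- lm(value ~ C * variable, data = longData)

tab <- anova(fit)  # sequential F-tests for C, variable and C:variable
tab
```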

Answer 1 (score: 1)

You can use get(Data.Subsets[x]) to pick out the relevant data frame, but I don't think that's necessary.
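A minimal sketch of the get() route, keeping the question's idea of storing subset names as strings and testing every pair of columns within each subset, as the question's loop attempts. It swaps plyr's adply for base lapply()/do.call(rbind, ...) (an assumption, to keep the example dependency-free) and also fixes the doubled comma and the colnames(x) lookups on the loop index. Only two subsets are built here to keep it short.

```r
set.seed(1)  # arbitrary seed for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)
Data.c1 <- subset(Data, C == "c1", select = X:Z)
Data.c2 <- subset(Data, C == "c2", select = X:Z)
Data.Subsets <- c("Data.c1", "Data.c2")  # names (strings), not data frames

results <- do.call(rbind, lapply(Data.Subsets, function(name) {
  d <- get(name)              # turn the stored name into the actual data frame
  pairs <- combn(ncol(d), 2)  # column index pairs (1,2), (1,3), (2,3)
  do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    # t.test() already drops NAs; an na.rm argument would be silently ignored
    test <- t.test(d[, i], d[, j])
    data.frame(Subset  = name,
               Row     = colnames(d)[i],
               Column  = colnames(d)[j],
               t.value = round(unname(test$statistic), 3),
               df      = unname(test$parameter),
               p.value = round(test$p.value, 3))
  }))
}))
results
```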

Explicitly subsetting many times isn't necessary either. You can create them with something like:

conditions = c("c1", "c2", "c3", "c4", "c5")
dfs <- lapply(conditions, function(x){subset(Data, C==x, select=X:Z)})

That should (untested) return a list of data frames, one for each condition you pass it.
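Building on that list, here is a sketch of the nested loop from the question, assuming "t-test two subsets at a time, one column at a time" means comparing the same column between each pair of subsets. With 5 subsets and 3 columns that is 10 × 3 = 30 Welch t-tests, so the p-values are Holm-adjusted at the end; the result column names are illustrative.

```r
set.seed(1)  # arbitrary seed for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

conditions <- c("c1", "c2", "c3", "c4", "c5")
dfs <- lapply(conditions, function(x) subset(Data, C == x, select = X:Z))
names(dfs) <- conditions  # index the list by condition name

# Outer loop: every pair of subsets; inner loop: every column
subsetPairs <- combn(conditions, 2)  # 2 x 10 character matrix
results <- do.call(rbind, lapply(seq_len(ncol(subsetPairs)), function(k) {
  a <- subsetPairs[1, k]
  b <- subsetPairs[2, k]
  do.call(rbind, lapply(names(dfs[[a]]), function(col) {
    test <- t.test(dfs[[a]][[col]], dfs[[b]][[col]])  # Welch t-test
    data.frame(Subset1 = a, Subset2 = b, Column = col,
               t.value = unname(test$statistic),
               df      = unname(test$parameter),
               p.value = test$p.value)
  }))
}))

# 30 tests, so adjust the p-values before interpreting them
results$p.adj <- p.adjust(results$p.value, method = "holm")
head(results)
```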

However, as @Richie Cotton pointed out, it would be better to reshape your data frame and use pairwise t-tests.

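I should also point out that running this many t-tests doesn't seem wise, even with a multiple-testing correction, whether FDR, permutation, or something else. You would be better off working out whether you can use some type of ANOVA, since that is more or less what it is designed for.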
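One way the ANOVA suggestion could look, as a sketch under the assumption that the long-format data and a two-way model value ~ C * variable fit the problem: aov() tests each term once, and TukeyHSD() then gives family-wise-corrected X/Y/Z comparisons in place of many separate t-tests.

```r
set.seed(1)  # arbitrary seed for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)
s <- stack(Data[c("X", "Y", "Z")])
longData <- data.frame(C = rep(Data$C, 3), variable = s$ind, value = s$values)

fit <- aov(value ~ C * variable, data = longData)
summary(fit)  # one F-test per term

# Tukey's honest significant differences for the X/Y/Z factor
tk <- TukeyHSD(fit, which = "variable")
tk$variable   # rows Y-X, Z-X, Z-Y with diff, lwr, upr and adjusted p
```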