Question

I use the following function to compute the t-stat for the data in a data frame (x):

    # Wilcoxon rank-sum statistic for one vector x;
    # s1 and s2 index the two groups of samples
    wilcox.test.all.genes <- function(x, s1, s2) {
      x1 <- as.numeric(x[s1])
      x2 <- as.numeric(x[s2])
      wilcox.out <- wilcox.test(x1, x2, exact = FALSE,
                                alternative = "two.sided", correct = TRUE)
      as.numeric(wilcox.out$statistic)
    }

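I need to write a for loop that runs for a given number of iterations. On each iteration it should shuffle the columns, run the function above, and save the maximum t-stat value to a list.

I know I can use sample() to shuffle the columns of the data frame and max() to find the largest t-stat value, but I can't figure out how to put them together into working code.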

Answer 1

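You are trying to generate an empirical p-value that corrects for the multiple comparisons you make because your data have multiple columns. First, let's simulate an example data set: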

    # Simulate data
    n.row = 100
    n.col = 10

    set.seed(12345)
    group = factor(sample(2, n.row, replace=TRUE))
    data  = data.frame(matrix(rnorm(n.row*n.col), nrow=n.row))

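Compute the Wilcoxon test for each column, but replicate this many times while permuting the class membership of the observations. This gives us an empirical null distribution of the test statistic.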

    # Re-calculate columnwise test statistics many times while permuting class labels
    # (shuffling the rows against a fixed `group` is equivalent to shuffling the labels)
    perms = replicate(500, apply(data[sample(nrow(data)), ], 2, function(x)
      wilcox.test(x[group==1], x[group==2], exact=FALSE,
                  alternative="two.sided", correct=TRUE)$statistic))

Compute the null distribution of the maximum test statistic by collapsing across the multiple comparisons.

    # For each permuted replication, calculate the max test statistic across the multiple comparisons
    perms.max = apply(perms, 2, max)
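Since the question asked specifically for a for loop, here is a minimal sketch of the same two steps written as one. The names `col.stat`, `n.perm`, `g.perm` and `max.stats` are illustrative and do not appear in the original code, and the number of permutations is reduced only to keep the example quick.

```r
# Simulated data, as in the answer above
set.seed(12345)
n.row <- 100
n.col <- 10
group <- factor(sample(2, n.row, replace = TRUE))
data  <- data.frame(matrix(rnorm(n.row * n.col), nrow = n.row))

# Hypothetical helper: Wilcoxon statistic for one column x, given group labels g
col.stat <- function(x, g) {
  out <- wilcox.test(x[g == 1], x[g == 2], exact = FALSE,
                     alternative = "two.sided", correct = TRUE)
  as.numeric(out$statistic)
}

n.perm    <- 200              # number of permutations (assumed value)
max.stats <- numeric(n.perm)  # preallocate the result vector

for (i in seq_len(n.perm)) {
  g.perm <- sample(group)                       # shuffle the class labels
  stats  <- sapply(data, col.stat, g = g.perm)  # one statistic per column
  max.stats[i] <- max(stats)                    # keep the largest one
}
```

`max.stats` plays the role of `perms.max`. A numeric vector is usually more convenient than a list here; `as.list(max.stats)` converts it if a list is really needed.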

By simply sorting the results, we can now determine the p = 0.05 critical value.

    # Identify critical value
    crit = sort(perms.max)[round((1-0.05)*length(perms.max))]
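As a cross-check, the same cutoff can be read off with `quantile()`. The sketch below uses stand-in values for `perms.max` (random numbers, not real permutation output) so that it runs on its own; `crit.sorted` and `crit.quant` are illustrative names. With `type = 1`, `quantile()` returns an order statistic instead of interpolating, so it should land on the same value as the sort-and-index approach, or on a neighbouring one.

```r
set.seed(1)
# Stand-in for the permutation maxima (assumption: any numeric vector works here)
perms.max <- rnorm(500, mean = 1500, sd = 80)

# Sort-and-index approach from the answer
crit.sorted <- sort(perms.max)[round((1 - 0.05) * length(perms.max))]

# quantile() with type = 1 (inverse of the empirical CDF, no interpolation)
crit.quant <- quantile(perms.max, probs = 0.95, type = 1, names = FALSE)

print(c(sorted = crit.sorted, quantile = crit.quant))
```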

We can also plot our distribution along with the critical value.

    # Plot
    dev.new(width=4, height=4)
    hist(perms.max)
    abline(v=crit, col='red')


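Finally, comparing an actual test statistic against this distribution gives you an empirical p-value that is corrected for multiple comparisons by controlling the family-wise error rate at p < 0.05. For example, suppose a real test statistic is 1600. We can then compute the p-value as: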

    > length(which(perms.max>1600))/length(perms.max)
    [1] 0.074
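In practice the observed statistics come from the unpermuted data rather than a made-up value. The sketch below computes one adjusted p-value per column; `col.stat`, `obs` and `p.adj` are illustrative names, and the permutation count is kept small for speed. Two details differ from the one-liner above and are my assumptions, not part of the original answer: the comparison uses `>=` instead of `>`, and 1 is added to the numerator and denominator so that an empirical p-value can never be exactly zero. Like the original, it only looks at large values of the statistic.

```r
set.seed(12345)
n.row <- 100
n.col <- 10
group <- factor(sample(2, n.row, replace = TRUE))
data  <- data.frame(matrix(rnorm(n.row * n.col), nrow = n.row))

# Hypothetical helper: Wilcoxon statistic for one column x, given group labels g
col.stat <- function(x, g) {
  as.numeric(wilcox.test(x[g == 1], x[g == 2],
                         exact = FALSE, correct = TRUE)$statistic)
}

# Observed statistics from the real (unpermuted) labels
obs <- sapply(data, col.stat, g = group)

# Null distribution of the maximum statistic under permuted labels
perms.max <- replicate(200, max(sapply(data, col.stat, g = sample(group))))

# Adjusted empirical p-value per column
p.adj <- sapply(obs, function(s) {
  (1 + sum(perms.max >= s)) / (1 + length(perms.max))
})
print(round(p.adj, 3))
```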
