How to apply kruskal.test iteratively over a data frame

Date: 2014-04-12 05:38:07

Tags: r

I am new to R. I want to run kruskal.test on my data frame, which has 50 rows and 76 columns. Part of the data frame looks like this:

status  -1  Actinomyces Parascardovia   Corynebacterium Rothia  Bifidobacterium
KnownDiabeetic  0.313151767 0.000101245 0   0   0   0.055077453
KnownDiabeetic  0.549817041 0   0   0   0.000104548 0.018609514
KnownDiabeetic  0.176596177 0   0   0   0   0.036498577
KnownDiabeetic  0.100851409 0.000405433 0   0   0.000101358 0.04054328
KnownDiabeetic  0.073431511 0.000100867 0   0   0   0.070808957
KnownDiabeetic  0.335514698 0   0   0.000103875 0   0.089539836
KnownDiabeetic  0.307456901 0   0   0   0   0.007242681
KnownDiabeetic  0.090503247 0.000202922 0   0   0   0.002029221
KnownDiabeetic  0.401858774 0   0   0   0   0.00323265
KnownDiabeetic  0.256320658 0.000513875 0   0   0.002980473 0.028057554
KnownDiabeetic  0.02540743  0.00020245  0   0   0.000404899 0.120558761
KnownDiabeetic  0.191452468 0.001631987 0   0   0.000101999 0.374745002
KnownDiabeetic  0.230440533 0.002645233 0   0   0.001017397 0.274086886
KnownDiabeetic  0.328139322 0.001425807 0.000203687 0   0.000407373 0.319890009
KnownDiabeetic  0.026437135 0.000307409 0   0   0.00215186  0.22625269
KnownDiabeetic  0.273827688 0   0   0   0   0.009154715
NewlyDiagnosed  0.57150086  0   0   0   0.000101204 0.001012043
NewlyDiagnosed  0.565323565 0   0   0   0.00010175  0.089336589
NewlyDiagnosed  0.355542096 0   0   0   0   0.001312336
NewlyDiagnosed  0.446341716 0.000206975 0   0   0   0.050191452

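I am trying to apply kruskal.test iteratively to determine whether each bacterial genus (columns 2:76) differs significantly across the levels of the grouping variable (status). I am using the following R script: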

mydf<-Kruskal_genus_open_test 
kruskal.wallis.table <- data.frame()
for(i in seq(along=mydf[,1]))  {
    ## Run the KW test on one gene
    x <- as.vector(as.matrix(Kruskal_genus_open_test[i,]))
    ks.test <- kruskal.test(x, g=PCS_map$Description)
    ## Store the result in the data frame
    kruskal.wallis.table <- rbind(kruskal.wallis.table,
                                  data.frame(id=training.filtered.probe.names[i],
                                             p.value=ks.test$p.value
                                  ))
    ## Report number of genes tested
    verbose(paste("Kruskal-Wallis test for gene ", i, "/", 
                  training.filtered.probe.nb, "; p-value=", ks.test$p.value, sep=""))
}

But I get this error:

Error in kruskal.test.default(x, g = PCS_map$Description) :
  'x' and 'g' must have the same length

Please help me solve this problem.

Thanks,

1 Answer:

Answer 0: (score: 2)

If you only want the p-value of each test, the following should work:

apply(mydf[,-1], 2, function(x) kruskal.test(x,mydf[,1])$p.value)
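
For context, the error in the question most likely arises because the loop passes a whole row (one value per column) as x, while the grouping vector has one entry per sample, so the two lengths differ. Testing column by column avoids this, since each column has exactly one value per sample. The following is a minimal sketch of that idea on made-up data. The data frame `toy`, its values, and the result table `kw_table` are hypothetical stand-ins, not the asker's actual data.

```r
# Hypothetical toy data standing in for mydf: the first column is the
# grouping variable, the remaining columns are numeric abundances.
set.seed(1)
toy <- data.frame(
  status = rep(c("KnownDiabeetic", "NewlyDiagnosed"), each = 10),
  Actinomyces = runif(20),
  Rothia = runif(20),
  # Deliberately shifted between groups so the test has something to detect
  Bifidobacterium = c(runif(10, 0, 0.1), runif(10, 0.5, 1))
)

group <- factor(toy$status)

# One test per column (not per row): each column has one value per sample,
# so its length always matches the length of the grouping vector.
p_values <- sapply(toy[, -1], function(x) kruskal.test(x, group)$p.value)

# Collect the results in a data frame, one row per genus
kw_table <- data.frame(genus = names(p_values), p.value = unname(p_values))
print(kw_table)
```

With the data from the question, `toy[, -1]` and `toy$status` would presumably be replaced by `mydf[, -1]` and `mydf[, 1]`, assuming the first column holds the status labels as shown in the excerpt.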