I'm new to R. I want to run kruskal.test on my data frame, which has 50 rows and 76 columns. Part of the data frame looks like this:
status -1 Actinomyces Parascardovia Corynebacterium Rothia Bifidobacterium
KnownDiabeetic 0.313151767 0.000101245 0 0 0 0.055077453
KnownDiabeetic 0.549817041 0 0 0 0.000104548 0.018609514
KnownDiabeetic 0.176596177 0 0 0 0 0.036498577
KnownDiabeetic 0.100851409 0.000405433 0 0 0.000101358 0.04054328
KnownDiabeetic 0.073431511 0.000100867 0 0 0 0.070808957
KnownDiabeetic 0.335514698 0 0 0.000103875 0 0.089539836
KnownDiabeetic 0.307456901 0 0 0 0 0.007242681
KnownDiabeetic 0.090503247 0.000202922 0 0 0 0.002029221
KnownDiabeetic 0.401858774 0 0 0 0 0.00323265
KnownDiabeetic 0.256320658 0.000513875 0 0 0.002980473 0.028057554
KnownDiabeetic 0.02540743 0.00020245 0 0 0.000404899 0.120558761
KnownDiabeetic 0.191452468 0.001631987 0 0 0.000101999 0.374745002
KnownDiabeetic 0.230440533 0.002645233 0 0 0.001017397 0.274086886
KnownDiabeetic 0.328139322 0.001425807 0.000203687 0 0.000407373 0.319890009
KnownDiabeetic 0.026437135 0.000307409 0 0 0.00215186 0.22625269
KnownDiabeetic 0.273827688 0 0 0 0 0.009154715
NewlyDiagnosed 0.57150086 0 0 0 0.000101204 0.001012043
NewlyDiagnosed 0.565323565 0 0 0 0.00010175 0.089336589
NewlyDiagnosed 0.355542096 0 0 0 0 0.001312336
NewlyDiagnosed 0.446341716 0.000206975 0 0 0 0.050191452
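I'm trying to apply kruskal.test iteratively to determine whether each bacterial genus (columns 2:76) differs significantly across the levels of the grouping variable (status). I'm using the following R script: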
mydf <- Kruskal_genus_open_test
kruskal.wallis.table <- data.frame()
for (i in seq(along = mydf[, 1])) {
  ## Run the KW test on one gene
  x <- as.vector(as.matrix(Kruskal_genus_open_test[i, ]))
  ks.test <- kruskal.test(x, g = PCS_map$Description)
  ## Store the result in the data frame
  kruskal.wallis.table <- rbind(kruskal.wallis.table,
                                data.frame(id = training.filtered.probe.names[i],
                                           p.value = ks.test$p.value))
  ## Report number of genes tested
  verbose(paste("Kruskal-Wallis test for gene ", i, "/",
                training.filtered.probe.nb, "; p-value=", ks.test$p.value, sep = ""))
}
But I get this error:
Error in kruskal.test.default(x, g = PCS_map$Description) : 'x' and 'g' must have the same length
Please help me solve this problem.
Thanks,
Answer 0 (score: 2)
If you only want the p-value from each test, the following should work:
apply(mydf[,-1], 2, function(x) kruskal.test(x,mydf[,1])$p.value)
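As for why the original loop fails: it indexes rows (`[i,]`), so each `x` is one sample's 76 values (status included), while the grouping vector presumably has one entry per sample. The test needs one column at a time, so that `x` and `g` both have as many values as there are rows. Below is a minimal sketch of the same idea that also collects the results into a table. The toy data frame is made up for illustration (only the column names and status labels are copied from the question), and the Benjamini-Hochberg adjustment is an optional extra I'm assuming you may want when running 75 tests, not something the question asked for.

```r
# Hypothetical toy data standing in for the real 50 x 76 data frame:
# first column is the grouping variable, the rest are genus abundances.
set.seed(1)
mydf <- data.frame(
  status = rep(c("KnownDiabeetic", "NewlyDiagnosed"), each = 10),  # labels as in the question
  Actinomyces = runif(20),
  Rothia = runif(20),
  Bifidobacterium = runif(20)
)

group <- factor(mydf$status)

# One Kruskal-Wallis test per genus column (not per row),
# so x and group both have nrow(mydf) values.
p.values <- vapply(mydf[, -1],
                   function(x) kruskal.test(x, group)$p.value,
                   numeric(1))

# Collect the results; p.adjust() with method "BH" is an optional
# multiple-testing correction (an assumption, not part of the question).
kruskal.wallis.table <- data.frame(
  id = names(p.values),
  p.value = unname(p.values),
  p.adjusted = unname(p.adjust(p.values, method = "BH"))
)
print(kruskal.wallis.table)
```

With the real data you would skip the toy `mydf` and use your own data frame, as long as its first column is the status and all remaining columns are numeric.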