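Performing statistical tests on specific data in ggplot2

Asked: 2011-03-29 22:28:47

Tags: r ggplot2

I wrote a script that generates plots with ggplot2. Each plot has several x-axis values, and each of those has several y-axis values, one for each of several variables.

Let me put the question another way: I have several subsets of a data frame, generated inside a for loop. How can I use the loop to build another data frame that contains, in each row, the values of the first column of each subset?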

for (x in phy) {
    print(x)

    test<-subset(t, Phylum==x)
    dat <- melt(test, measure=c("A","C","G","T","(A-T)/(A+T)","(G-C)/(G+T)",
                                "(A+T)/(G+C)"))
    unitest <- unique(c(test$Class))
    #print(nrow(test))
    i <- 1
    for(y in unitest) {
        towork <- subset(test, Class==y)

        # Here I want to create a data frame that contains, in each row, the
        # values of the first column of the towork subset for each y.

        # atest=wilcox.test(towork$A,towork$A, correct=FALSE)
        # print(paste(paste(y,towork$A),towork$A))
    }
}



Input:

    e.g.
    class1:
    0.268912    0.158921    0.214082    0.358085
    1.680946    0.314681    0.210526    0.166895
    0.286945    0.322006    0.147361    0.243688
    class2:
    0.293873    0.327516    0.156235    0.222376
    0.327430    0.308667    0.135710    0.227695
    0.301488    0.326511    0.125865    0.246022
    0.310980    0.308730    0.148861    0.231429

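I want the new data frame to contain, in each row, the first column of each class.

Output:
    e.g.
    1st row: 0.268912 1.680946 0.286945
    2nd row: 0.293873 0.327430 0.301488 0.310980

...and so on. Then another data frame whose rows contain the second column of each class, etc.

Then I want to run a statistical test (e.g. the Wilcoxon rank sum test) on each pair of rows of the new data frame and get the results.

Any help would be appreciated.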
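For reference, the reshaping and pairwise testing described here can be sketched in base R. This is only a minimal sketch: the data frame `towork` and its column names `Class` and `A` are assumptions made for illustration, filled with the first column of the example input. Because the classes have different numbers of rows, the sketch keeps one vector per class in a list rather than forcing them into a rectangular data frame.

```r
# Toy data copied from the first column of the example input.
# The names towork, Class and A are assumptions for illustration.
towork <- data.frame(
    Class = c(rep("class1", 3), rep("class2", 4)),
    A = c(0.268912, 1.680946, 0.286945,
          0.293873, 0.327430, 0.301488, 0.310980)
)

# One numeric vector per class. A list copes with unequal group sizes,
# which a rectangular data frame cannot do without padding.
byClass <- split(towork$A, towork$Class)

# Every unordered pair of class names, as a list of length-2 vectors.
classPairs <- combn(names(byClass), 2, simplify = FALSE)

# Wilcoxon rank sum test for each pair of classes.
results <- lapply(classPairs, function(p) {
    wilcox.test(byClass[[p[1]]], byClass[[p[2]]], correct = FALSE)
})
names(results) <- vapply(classPairs, paste, character(1), collapse = " vs ")

# Inspect one comparison.
print(results[["class1 vs class2"]])
```

The same steps could be repeated for the other columns (C, G, T) by swapping the column passed to `split()`.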

Hello, I came up with an idea, but I need your help to implement it.
First, the data is in a large text file, which I can upload if you want. My idea is to create a function that takes two arguments:
1. the name of the column used to group the data (e.g. phylum or class)
2. the name of the column containing the data to test (e.g. A, C, G, T)
I will test the data for each phylum first and, if needed, for each class within each phylum.
That means I will take the A column for the first phylum and the A column for the second phylum and run wilcox.test on them, repeating this for each common column in each phylum. Then I will use the subset function to test the classes inside each phylum.
What do you think of this approach?
Thanks in advance.
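One way the two-argument function described above might look is sketched below. It is an illustration under stated assumptions, not a tested solution for the real data: the helper name `testByGroup`, the toy data frame `dat`, and its columns `Phylum`, `Class` and `A` are all hypothetical. It relies on `pairwise.wilcox.test()` from the stats package, which runs a Wilcoxon test for every pair of groups and adjusts the p-values for multiple comparisons (Holm's method by default).

```r
# Hypothetical helper: groupCol and valueCol are column names given as strings.
testByGroup <- function(data, groupCol, valueCol, ...) {
    stopifnot(groupCol %in% names(data), valueCol %in% names(data))
    # Wilcoxon rank sum tests between all pairs of groups,
    # with p-values adjusted for multiple testing.
    pairwise.wilcox.test(data[[valueCol]], data[[groupCol]], ...)
}

# Toy data standing in for the real file (assumed layout).
set.seed(1)
dat <- data.frame(
    Phylum = rep(c("p1", "p2", "p3"), each = 5),
    Class = rep(c("c1", "c2"), length.out = 15),
    A = rnorm(15)
)

# Compare column A between phyla.
byPhylum <- testByGroup(dat, "Phylum", "A")
print(byPhylum$p.value)

# Compare column A between classes within each phylum.
perPhylum <- lapply(split(dat, dat$Phylum), testByGroup,
                    groupCol = "Class", valueCol = "A")
```

Passing the column names as strings keeps the function usable for any of A, C, G or T without rewriting it.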

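1 answer:

Answer 0 (score: 0)

I think this will take care of what you're after. We don't necessarily need to create new data.frames for the four variables of interest; we can extract the columns of interest from their respective positions in class1 and class2. The code has been updated to look for the columns common to class1 and class2, and it computes the Wilcoxon test only for those common columns.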

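# Example data: 3 and 4 observations of four variables each.
# The column names are illustrative; only "c" and "d" occur in both matrices.
class1 <- matrix(rnorm(12), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
class2 <- matrix(rnorm(16), ncol = 4, dimnames = list(NULL, c("c", "d", "e", "f")))

computeWilcox <- function(x, y, correct = FALSE, ...) {

    if (!is.numeric(x)) stop("x must be numeric.")
    if (!is.numeric(y)) stop("y must be numeric.")

    # Only columns present in both inputs are tested.
    commonCols <- intersect(colnames(x), colnames(y))

    ret <- vector("list", length(commonCols))
    names(ret) <- commonCols

    # Index by column name so the same variable is compared in x and y.
    for (col in commonCols) {
        ret[[col]] <- wilcox.test(x[, col], y[, col], correct = correct, ...)
    }

    return(ret)
}


zz <- computeWilcox(class1, class2)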

The structure of zz is as follows:

> str(zz)
List of 2
 $ c:List of 7
  ..$ statistic  : Named num 0
  .. ..- attr(*, "names")= chr "W"
  ..$ parameter  : NULL
  ..$ p.value    : num 0.0571
  ..$ null.value : Named num 0
  .. ..- attr(*, "names")= chr "location shift"
  ..$ alternative: chr "two.sided"
  ..$ method     : chr "Wilcoxon rank sum test"
  ..$ data.name  : chr "x[, col] and y[, col]"
  ..- attr(*, "class")= chr "htest"
 $ d:List of 7
  ..$ statistic  : Named num 2
  .. ..- attr(*, "names")= chr "W"
  ..$ parameter  : NULL
  ..$ p.value    : num 0.229
  ..$ null.value : Named num 0
  .. ..- attr(*, "names")= chr "location shift"
  ..$ alternative: chr "two.sided"
  ..$ method     : chr "Wilcoxon rank sum test"
  ..$ data.name  : chr "x[, col] and y[, col]"
  ..- attr(*, "class")= chr "htest"

You can extract the parameters or p-values from the returned list object like this:

> zz$c$p.value
[1] 0.05714286
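To pull every p-value out at once rather than one element at a time, something along these lines could work. The sketch is self-contained: it rebuilds a small list of test results in the same shape as zz, and the names `x`, `y`, `tests`, `pvals` and `adjusted`, as well as the column names, are assumptions for illustration. The final `p.adjust()` step is an optional addition for when many columns are tested together.

```r
# Stand-in data with two shared column names ("c" and "d"); names are assumed.
set.seed(42)
x <- matrix(rnorm(12), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
y <- matrix(rnorm(16), ncol = 4, dimnames = list(NULL, c("c", "d", "e", "f")))

# A named list of htest objects, one per common column.
common <- intersect(colnames(x), colnames(y))
tests <- lapply(setNames(common, common), function(col) {
    wilcox.test(x[, col], y[, col], correct = FALSE)
})

# Collect the p-values into a named numeric vector.
pvals <- vapply(tests, function(res) res$p.value, numeric(1))

# Optionally adjust for multiple comparisons (Holm's method).
adjusted <- p.adjust(pvals, method = "holm")

print(pvals)
print(adjusted)
```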