Running t-tests on multiple boxplots in R

Time: 2014-07-02 21:46:14

Tags: r ggplot2

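I have a faceted plot with 40 boxplots. For each boxplot in the plot, I want to compute an (adjusted) p-value for whether it differs from zero. So basically a simple t-test. Is there a way to do this for the whole dataset at once?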

Sample data:

   Pos    Pair Fold_Change
    1     Yes     0.36170477
    1     Yes     0.63926759
    1     No     -0.26791834
    1     No      0.06245854
    2     Yes     0.95403940
    2     Yes     0.45182453
    2     No      0.95403940
    2     No      0.45182453
    ....

ggplot(Pairing, aes(Pair, Fold_Change, fill = Pair)) + geom_boxplot() + facet_grid(~Pos)

1 answer:

Answer 0: (score: 0)

You can split the dataset by Pos and Pair, and then run t.test on Fold_Change for every group in one "batch" using lapply.

Using an example dataset like the following:

> Pairings <- expand.grid(Pos=1:4, Pair=rep(c('Yes','No'),2))
> Pairings$Fold_Change <- runif(16)
> Pairings
   Pos Pair Fold_Change
1    1  Yes  0.31076451
2    2  Yes  0.73035471
3    3  Yes  0.40153490
4    4  Yes  0.34368815
5    1   No  0.75398667
6    2   No  0.34578325
7    3   No  0.93771945
8    4   No  0.60309476
9    1  Yes  0.55861350
10   2  Yes  0.75467734
11   3  Yes  0.55299688
12   4  Yes  0.81453028
13   1   No  0.39110782
14   2   No  0.04561982
15   3   No  0.71373404
16   4   No  0.79267332

Here is a one-liner that does what you are looking for:

> lapply(split(Pairings,Pairings[,1:2]),function(x)t.test(x$Fold_Change)$p.value)
$`1.Yes`
[1] 0.1768022

$`2.Yes`
[1] 0.01042596

$`3.Yes`
[1] 0.1001815

$`4.Yes`
[1] 0.2458096

$`1.No`
[1] 0.1953704

$`2.No`
[1] 0.4164919

$`3.No`
[1] 0.08582059

$`4.No`
[1] 0.08594222
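If a named numeric vector is more convenient than a list (for example, to pass the p-values on to another function), the same call can be written with vapply. This is only a sketch under assumptions: the seed and the variable name pvals are hypothetical and not part of the original answer, so the numbers will differ from the output above.

```r
# Rebuild the example data; the seed is a hypothetical choice for reproducibility
set.seed(1)
Pairings <- expand.grid(Pos = 1:4, Pair = rep(c("Yes", "No"), 2))
Pairings$Fold_Change <- runif(16)

# Same split as in the one-liner, but vapply returns a named numeric vector
# and checks that every group yields exactly one number
pvals <- vapply(split(Pairings, Pairings[, 1:2]),
                function(x) t.test(x$Fold_Change)$p.value,
                numeric(1))
pvals
```

The names of the vector are the same `Pos.Pair` labels that lapply produced as list names.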

Explanation: let's go through it step by step.

Step 1: split the dataset by Pos and Pair:

> split(Pairings,Pairings[,1:2])
$`1.Yes`
  Pos Pair Fold_Change
1   1  Yes   0.3107645
9   1  Yes   0.5586135

$`2.Yes`
   Pos Pair Fold_Change
2    2  Yes   0.7303547
10   2  Yes   0.7546773

$`3.Yes`
   Pos Pair Fold_Change
3    3  Yes   0.4015349
11   3  Yes   0.5529969

$`4.Yes`
   Pos Pair Fold_Change
4    4  Yes   0.3436882
12   4  Yes   0.8145303

$`1.No`
   Pos Pair Fold_Change
5    1   No   0.7539867
13   1   No   0.3911078

$`2.No`
   Pos Pair Fold_Change
6    2   No  0.34578325
14   2   No  0.04561982

$`3.No`
   Pos Pair Fold_Change
7    3   No   0.9377195
15   3   No   0.7137340

$`4.No`
   Pos Pair Fold_Change
8    4   No   0.6030948
16   4   No   0.7926733

Step 2: run t.test on every element of the list:

> lapply(split(Pairings,Pairings[,1:2]),function(x)t.test(x$Fold_Change))
$`1.Yes`

    One Sample t-test

data:  x$Fold_Change 
t = 3.5077, df = 1, p-value = 0.1768
alternative hypothesis: true mean is not equal to 0 
95 percent confidence interval:
 -1.139921  2.009299 
sample estimates:
mean of x 
 0.434689 


$`2.Yes`

    One Sample t-test

data:  x$Fold_Change 
t = 61.0556, df = 1, p-value = 0.01043
alternative hypothesis: true mean is not equal to 0 
95 percent confidence interval:
 0.5879919 0.8970401 
sample estimates:
mean of x 
 0.742516 

...

Step 3: keep only the p-value by using function(x) t.test(x$Fold_Change)$p.value.

If you want the result to be a bit tidier, you can modify it as follows:
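With real data, some Pos/Pair groups may be empty, contain a single observation, or contain identical values, and t.test stops with an error in those cases. A minimal sketch of a guard, assuming you would rather get NA for such groups than have the whole batch fail; the helper name safe_p is hypothetical and not part of the original answer:

```r
# Hypothetical helper: return NA instead of an error when a one-sample
# t-test cannot be computed for a group
safe_p <- function(v) {
  v <- v[!is.na(v)]                       # drop missing values first
  if (length(v) < 2 || sd(v) == 0) {      # too few or constant observations
    return(NA_real_)
  }
  t.test(v)$p.value                       # one-sample test against mu = 0
}

# Example with the two "1.Yes" values from the example dataset
safe_p(c(0.31076451, 0.55861350))
safe_p(0.5)   # a single observation gives NA
```

It can replace the anonymous function in the lapply call, for example function(x) safe_p(x$Fold_Change).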

> do.call(rbind,
          lapply(split(Pairings,Pairings[,1:2]),   
                 function(x)data.frame(Pos=x$Pos[1],
                                       Pair=x$Pair[1],
                                       p.value=t.test(x$Fold_Change)$p.value)))
      Pos Pair    p.value
1.Yes   1  Yes 0.17680222
2.Yes   2  Yes 0.01042596
3.Yes   3  Yes 0.10018152
4.Yes   4  Yes 0.24580956
1.No    1   No 0.19537040
2.No    2   No 0.41649186
3.No    3   No 0.08582059
4.No    4   No 0.08594222
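The question asks for adjusted p-values, and the values above are unadjusted. One possible way to add a multiple-testing correction is base R's p.adjust. The sketch below assumes the Benjamini-Hochberg method ("BH"); that choice, the seed, and the names res and p.adjusted are assumptions rather than part of the original answer, and other methods such as "bonferroni" or "holm" may suit your data better.

```r
# Rebuild the example data; the seed is a hypothetical choice for reproducibility
set.seed(1)
Pairings <- expand.grid(Pos = 1:4, Pair = rep(c("Yes", "No"), 2))
Pairings$Fold_Change <- runif(16)

# One row per Pos/Pair group with its raw one-sample t-test p-value
res <- do.call(rbind,
               lapply(split(Pairings, Pairings[, 1:2]),
                      function(x) data.frame(Pos = x$Pos[1],
                                             Pair = x$Pair[1],
                                             p.value = t.test(x$Fold_Change)$p.value)))

# Adjust across all groups at once (assumed method: Benjamini-Hochberg)
res$p.adjusted <- p.adjust(res$p.value, method = "BH")
res
```

The adjustment is applied over all groups together, which matches treating the 40 boxplots as one family of tests.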