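I have a faceted plot with 40 boxplots. For each boxplot, I want to compute the (adjusted) p-value for whether it differs from zero. So basically a simple t-test. Is there a way to do this for the whole dataset at once?
Example data: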
Pos Pair Fold_Change
1 Yes 0.36170477
1 Yes 0.63926759
1 No -0.26791834
1 No 0.06245854
2 Yes 0.95403940
2 Yes 0.45182453
2 No 0.95403940
2 No 0.45182453
....
ggplot(Pairing,aes(Pairing,Fold_Change,fill=Pairing)) + geom_boxplot() + facet_grid(~Position)
Answer 0 (score: 0)
You can split the dataset by Pos and Pair, then run t.test on Fold_Change for every group in one batch using lapply.
Using an example dataset like this:
> Pairings <- expand.grid(Pos=1:4, Pair=rep(c('Yes','No'),2))
> Pairings$Fold_Change <- runif(16)
> Pairings
Pos Pair Fold_Change
1 1 Yes 0.31076451
2 2 Yes 0.73035471
3 3 Yes 0.40153490
4 4 Yes 0.34368815
5 1 No 0.75398667
6 2 No 0.34578325
7 3 No 0.93771945
8 4 No 0.60309476
9 1 Yes 0.55861350
10 2 Yes 0.75467734
11 3 Yes 0.55299688
12 4 Yes 0.81453028
13 1 No 0.39110782
14 2 No 0.04561982
15 3 No 0.71373404
16 4 No 0.79267332
Here is a one-liner that does what you are looking for:
> lapply(split(Pairings,Pairings[,1:2]),function(x)t.test(x$Fold_Change)$p.value)
$`1.Yes`
[1] 0.1768022
$`2.Yes`
[1] 0.01042596
$`3.Yes`
[1] 0.1001815
$`4.Yes`
[1] 0.2458096
$`1.No`
[1] 0.1953704
$`2.No`
[1] 0.4164919
$`3.No`
[1] 0.08582059
$`4.No`
[1] 0.08594222
Explanation
Let's go through it step by step.
Step 1: split the dataset by Pos and Pair:
> split(Pairings,Pairings[,1:2])
$`1.Yes`
Pos Pair Fold_Change
1 1 Yes 0.3107645
9 1 Yes 0.5586135
$`2.Yes`
Pos Pair Fold_Change
2 2 Yes 0.7303547
10 2 Yes 0.7546773
$`3.Yes`
Pos Pair Fold_Change
3 3 Yes 0.4015349
11 3 Yes 0.5529969
$`4.Yes`
Pos Pair Fold_Change
4 4 Yes 0.3436882
12 4 Yes 0.8145303
$`1.No`
Pos Pair Fold_Change
5 1 No 0.7539867
13 1 No 0.3911078
$`2.No`
Pos Pair Fold_Change
6 2 No 0.34578325
14 2 No 0.04561982
$`3.No`
Pos Pair Fold_Change
7 3 No 0.9377195
15 3 No 0.7137340
$`4.No`
Pos Pair Fold_Change
8 4 No 0.6030948
16 4 No 0.7926733
Step 2: run t.test on each element of the list:
> lapply(split(Pairings,Pairings[,1:2]),function(x)t.test(x$Fold_Change))
$`1.Yes`
One Sample t-test
data: x$Fold_Change
t = 3.5077, df = 1, p-value = 0.1768
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-1.139921 2.009299
sample estimates:
mean of x
0.434689
$`2.Yes`
One Sample t-test
data: x$Fold_Change
t = 61.0556, df = 1, p-value = 0.01043
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.5879919 0.8970401
sample estimates:
mean of x
0.742516
...
Step 3: keep only the p-value by using function(x)t.test(x$Fold_Change)$p.value.
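With 40 facets, some Pos/Pair groups may contain fewer than two observations or constant values, and t.test stops with an error in both cases. The following is a minimal sketch of one way to guard against that. It assumes that returning NA for such groups is acceptable; the helper name p_or_na and the small data frame df are hypothetical and only serve the illustration.

```r
# Hypothetical helper: return the one-sample t-test p-value,
# or NA if t.test() fails (e.g. fewer than 2 observations).
p_or_na <- function(x) {
  tryCatch(t.test(x)$p.value, error = function(e) NA_real_)
}

# Made-up illustration data: group 1.Yes has 3 values, group 2.No only 1.
df <- data.frame(Pos = c(1, 1, 1, 2),
                 Pair = c("Yes", "Yes", "Yes", "No"),
                 Fold_Change = c(0.3, 0.6, 0.5, 0.9))

# Split only the Fold_Change column; drop = TRUE skips empty combinations.
groups <- split(df$Fold_Change, list(df$Pos, df$Pair), drop = TRUE)

# sapply() collapses the results into a named numeric vector.
p_values <- sapply(groups, p_or_na)
p_values
```

Splitting just the Fold_Change vector instead of the whole data frame keeps each list element a plain numeric vector, which is all t.test needs here.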
If you want the one-liner's output to be a bit tidier, you can modify it as follows:
> do.call(rbind,
lapply(split(Pairings,Pairings[,1:2]),
function(x)data.frame(Pos=x$Pos[1],
Pair=x$Pair[1],
p.value=t.test(x$Fold_Change)$p.value)))
Pos Pair p.value
1.Yes 1 Yes 0.17680222
2.Yes 2 Yes 0.01042596
3.Yes 3 Yes 0.10018152
4.Yes 4 Yes 0.24580956
1.No 1 No 0.19537040
2.No 2 No 0.41649186
3.No 3 No 0.08582059
4.No 4 No 0.08594222
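The question asks for adjusted p-values, while the table above contains raw ones. A possible follow-up step is to pass the p.value column to p.adjust from base R's stats package. The sketch below rebuilds the same kind of table and adds an adjusted column. The choice of method = "BH" (Benjamini-Hochberg) and the set.seed call are assumptions made for the illustration; other methods such as "holm" or "bonferroni" may suit the analysis better.

```r
set.seed(1)  # assumption: fixed seed so the random example is reproducible

# Same kind of example dataset as above.
Pairings <- expand.grid(Pos = 1:4, Pair = rep(c("Yes", "No"), 2))
Pairings$Fold_Change <- runif(16)

# One row per Pos/Pair group with the raw one-sample t-test p-value.
res <- do.call(rbind,
               lapply(split(Pairings, Pairings[, 1:2]),
                      function(x) data.frame(Pos = x$Pos[1],
                                             Pair = x$Pair[1],
                                             p.value = t.test(x$Fold_Change)$p.value)))

# Correct for multiple testing across all groups at once.
# "BH" is an assumed choice, not something the question specifies.
res$p.adjusted <- p.adjust(res$p.value, method = "BH")
res
```

The adjustment has to be applied to the full vector of p-values in a single call, since the correction depends on how many tests were run.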