ANOVA on subsets of data

Time: 2016-12-07 12:52:26

Tags: r statistics subset anova

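My dataset consists of gene expression values recorded at 3 time points. I am trying to use an ANOVA with Tukey correction to find genes that are differentially expressed across time points. So for each gene I want to make the following comparisons: time point 1 vs. 2, time point 2 vs. 3, and time point 3 vs. 1.

My data are in the following format: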

    > head(rf)
                        gene      expn timepoint rep
    2  EG620009  // EG620009  8.428851      x0hr   0
    3                 LYPLA1 10.386500      x0hr   0
    21 EG620009  // EG620009  8.582346      x0hr   1
    31                LYPLA1 10.379710      x0hr   1
    22 EG620009  // EG620009  8.566248      x0hr   2
    32                LYPLA1 10.399080      x0hr   2
    > tail(rf)
                        gene      expn timepoint rep
    23 EG620009  // EG620009  8.561409     x24hr   0
    33                LYPLA1 10.233400     x24hr   0
    24 EG620009  // EG620009  8.750639     x24hr   1
    34                LYPLA1 10.023780     x24hr   1
    25 EG620009  // EG620009  8.560267     x24hr   2
    35                LYPLA1 10.025980     x24hr   2

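If I do this:

    TukeyHSD(aov(rf$expn ~ rf$timepoint * rf$gene))

it gives me comparisons between every combination of time point and gene, i.e. it also includes comparisons such as gene A at time point 1 vs. gene B at time point 2.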
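To make the problem with the interaction model concrete, here is a sketch on a made-up stand-in for rf (hypothetical genes geneA and geneB, an assumed middle time point x12hr, random values; not the asker's data). With 2 genes and 3 time points there are 6 timepoint:gene cells, so TukeyHSD reports choose(6, 2) = 15 comparisons, of which only 2 x choose(3, 2) = 6 compare time points within the same gene. Filtering those rows out is possible, but their p-values are still adjusted over all 15 comparisons and use a residual variance pooled across genes, so they are not the same as a separate per-gene ANOVA.

```r
# Hypothetical stand-in for `rf`: 2 made-up genes x 3 time points x 3 replicates,
# with random expression values (x12hr is an assumed middle time point).
set.seed(7)
rf <- expand.grid(rep = 0:2,
                  timepoint = c("x0hr", "x12hr", "x24hr"),
                  gene = c("geneA", "geneB"))
rf$expn <- rnorm(nrow(rf), mean = 10)

# The interaction model from the question, written with a data argument
fit <- aov(expn ~ timepoint * gene, data = rf)
tk  <- TukeyHSD(fit, which = "timepoint:gene")$`timepoint:gene`

# All pairs of the 2 x 3 = 6 timepoint:gene cells: choose(6, 2) = 15 rows
nrow(tk)

# Row names look like "x12hr:geneA-x0hr:geneA"; keep rows where both cells
# belong to the same gene (assumes names contain no "-" or ":")
same_gene <- vapply(strsplit(rownames(tk), "-"), function(p) {
  sub(".*:", "", p[1]) == sub(".*:", "", p[2])
}, logical(1))
sum(same_gene)   # 2 genes x choose(3, 2) = 6 within-gene comparisons
tk[same_gene, ]
```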

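I have been struggling to work out how to apply the aov function to subsets of the data, one gene at a time. I defined a function that returns the p-values and tried to apply it to each gene separately with the by function: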

    > gene.aov = function(x) {TukeyHSD(aov(expn ~ timepoint, data = x))}
    > aov.pval = function(y) {y$timepoint[,4]}
    > gene.pval = function(z) {aov.pval(gene.aov(z))}
    > pvals = by(rf$expn,list(rf$gene),gene.pval)
    Error in eval(predvars, data, env) : 
       numeric 'envir' arg not of length one 

Any hints as to why this doesn't work? Or should I approach this in a completely different way? Thanks!

1 answer:

Answer 0 (score: 1)

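It doesn't work because by is meant to split a data frame (or matrix), and you are passing rf$expn, a bare numeric vector. Each per-gene piece your function receives therefore contains only the expression values and no timepoint column, so aov cannot evaluate the formula expn ~ timepoint. Pass the whole data frame instead and it works fine (for readability I collapsed your helper functions into a single anonymous function):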

    by(rf, rf$gene, function(x) {TukeyHSD(aov(expn ~ timepoint, data = x))}, simplify = FALSE)

    rf$gene: EG620009
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = expn ~ timepoint, data = x)

    $timepoint
                  diff       lwr      upr     p adj
    x24hr-x0hr 0.09829 -0.123391 0.319971 0.2857424

    --------------------------------------------------------------------------- 
    rf$gene: LYPLA1
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = expn ~ timepoint, data = x)

    $timepoint
                     diff        lwr       upr     p adj
    x24hr-x0hr -0.2940433 -0.4876756 -0.100411 0.0135193
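If only the adjusted p-values are needed, the helper functions from the question also work once by receives the whole data frame, and split() plus sapply() can collect the same values into a genes x comparisons matrix. The sketch below runs on a made-up stand-in for rf (hypothetical genes geneA, geneB and geneC, an assumed middle time point x12hr, random values), not on the asker's data.

```r
# Hypothetical stand-in for `rf`: 3 made-up genes x 3 time points x 3 replicates,
# filled with random values (x12hr is an assumed middle time point).
set.seed(42)
rf <- expand.grid(rep = 0:2,
                  timepoint = c("x0hr", "x12hr", "x24hr"),
                  gene = c("geneA", "geneB", "geneC"))
rf$expn <- rnorm(nrow(rf), mean = 10)

# The helper functions from the question, unchanged in behaviour
gene.aov  <- function(x) TukeyHSD(aov(expn ~ timepoint, data = x))
aov.pval  <- function(y) y$timepoint[, 4]   # column 4 is "p adj"
gene.pval <- function(z) aov.pval(gene.aov(z))

# They work once `by` receives the whole data frame, not rf$expn
pvals <- by(rf, rf$gene, gene.pval)

# split() gives one data frame per gene; sapply() binds the named p-value
# vectors as columns, and t() turns that into genes x comparisons
padj <- t(sapply(split(rf, rf$gene), gene.pval))
round(padj, 4)
```

Because the p-value vectors keep the row names of the $timepoint matrix, the columns of padj are labelled with the comparisons (x12hr-x0hr and so on), which makes the result easy to filter by gene or by comparison.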