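My dataset consists of gene expression values recorded at 3 time points. I am trying to use an ANOVA with Tukey correction to find genes that are differentially expressed across time points. So for each gene I want the following comparisons: gene a time point 1 vs 2, gene a time point 2 vs 3, gene a time point 3 vs 1.
My data is in the following format: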
> head(rf)
gene expn timepoint rep
2 EG620009 // EG620009 8.428851 x0hr 0
3 LYPLA1 10.386500 x0hr 0
21 EG620009 // EG620009 8.582346 x0hr 1
31 LYPLA1 10.379710 x0hr 1
22 EG620009 // EG620009 8.566248 x0hr 2
32 LYPLA1 10.399080 x0hr 2
> tail(rf)
gene expn timepoint rep
23 EG620009 // EG620009 8.561409 x24hr 0
33 LYPLA1 10.233400 x24hr 0
24 EG620009 // EG620009 8.750639 x24hr 1
34 LYPLA1 10.023780 x24hr 1
25 EG620009 // EG620009 8.560267 x24hr 2
35 LYPLA1 10.025980 x24hr 2
If I do this:
TukeyHSD(aov(rf$expn ~ rf$timepoint * rf$gene))
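This gives me comparisons of every time point across all genes, i.e. it includes comparisons such as gene a time point 1 vs gene b time point 2.
I have been struggling to work out how to apply the aov function to per-gene subsets of the data. I have defined a function that returns the p-values and tried to apply it to each gene separately with the by function: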
> gene.aov = function(x) {TukeyHSD(aov(expn ~ timepoint, data = x))}
> aov.pval = function(y) {y$timepoint[,4]}
> gene.pval = function(z) {aov.pval(gene.aov(z))}
> pvals = by(rf$expn,list(rf$gene),gene.pval)
> Error in eval(predvars, data, env) :
numeric 'envir' arg not of length one
Any hints as to why this isn't working? Or should I be approaching this in a completely different way? Thanks!
Answer 0 (score: 1)
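It doesn't work because by expects its first argument to be a data.frame or matrix, and you are passing rf$expn, a numeric vector. Each gene's subset therefore reaches your function as a plain numeric vector with no expn or timepoint columns, and aov(..., data = x) fails. Pass the whole data frame instead and it works fine (I dropped the multiple functions for readability).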
by(rf, rf$gene, function(x) {TukeyHSD(aov(expn ~ timepoint, data = x))}, simplify = FALSE)
rf$gene: EG620009
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = expn ~ timepoint, data = x)
$timepoint
diff lwr upr p adj
x24hr-x0hr 0.09829 -0.123391 0.319971 0.2857424
---------------------------------------------------------------------------
rf$gene: LYPLA1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = expn ~ timepoint, data = x)
$timepoint
diff lwr upr p adj
x24hr-x0hr -0.2940433 -0.4876756 -0.100411 0.0135193
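If you only want the adjusted p-values, as in your original gene.pval attempt, something along these lines might work. This is a minimal sketch, not a tested solution for your data: rf_toy, its made-up expression values, the x6hr level, and the names tukey_by_gene and pvals are all assumptions for illustration. It uses split and lapply rather than by, which is just one way to get a flat table.

```r
# Toy data in the same shape as rf (values are simulated, purely illustrative).
set.seed(1)
rf_toy <- expand.grid(
  rep = 0:2,
  timepoint = c("x0hr", "x6hr", "x24hr"),   # "x6hr" is a hypothetical middle time point
  gene = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)
rf_toy$expn <- rnorm(nrow(rf_toy), mean = 9, sd = 0.2)
rf_toy$timepoint <- factor(rf_toy$timepoint, levels = c("x0hr", "x6hr", "x24hr"))

# One Tukey HSD per gene; keep only the comparison matrix for `timepoint`.
tukey_by_gene <- lapply(split(rf_toy, rf_toy$gene), function(x) {
  TukeyHSD(aov(expn ~ timepoint, data = x))$timepoint
})

# Stack the per-gene matrices into one data frame:
# one row per gene and pairwise time point comparison.
pvals <- do.call(rbind, lapply(names(tukey_by_gene), function(g) {
  tab <- tukey_by_gene[[g]]
  data.frame(
    gene = g,
    comparison = rownames(tab),
    diff = tab[, "diff"],
    p_adj = tab[, "p adj"],
    row.names = NULL
  )
}))

print(pvals)
```

Note that the Tukey adjustment here is applied only across the time point comparisons within each gene, not across genes.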