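I am analyzing an mRNA-seq dataset and want to run a GO enrichment analysis on the differentially expressed genes with the GOstats Bioconductor package. I have a target gene set of 75 genes (the DE genes) and a universe of 17596 genes (all genes measured by mRNA-seq). I want to run a hypergeometric test with the hyperGTest() function. In the past I have used this approach successfully with the org.Mm.eg.db database recommended in the manual. The difference now is that all genes are listed by their Ensembl gene IDs.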
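For context, here is a minimal base-R sketch of the over-representation test as I understand it for a single GO term. The universe and target sizes are the ones from my data; the two per-term counts (`n_term_universe`, `n_term_de`) are made-up values for illustration only, and I am assuming that the unconditional test in hyperGTest() boils down to this upper-tail hypergeometric probability.

```r
# Sizes taken from my data
n_universe <- 17596  # all genes measured by mRNA-seq
n_de       <- 75     # differentially expressed (target) genes

# Hypothetical counts for one GO term (illustration only, not real results)
n_term_universe <- 400  # universe genes annotated to the term
n_term_de       <- 8    # DE genes annotated to the term

# Over-representation p-value: P(X >= n_term_de) when drawing n_de genes
# without replacement from the universe
p_over <- phyper(
  q = n_term_de - 1,
  m = n_term_universe,
  n = n_universe - n_term_universe,
  k = n_de,
  lower.tail = FALSE
)

# The same test written as a one-sided Fisher's exact test on the 2x2 table
# (rows: DE / not DE, columns: in term / not in term)
tab <- matrix(
  c(n_term_de, n_de - n_term_de,
    n_term_universe - n_term_de,
    n_universe - n_term_universe - (n_de - n_term_de)),
  nrow = 2
)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

Both p-values should agree, which is why the gene IDs in the target set, the universe, and the annotation all have to be of the same type: the counts in the table come from matching those IDs against each other.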
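At first I thought this would be a simple fix, so I tried setting the annotation argument to an Ensembl-based package (EnsDb.Mmusculus.v79) from the Bioconductor annotation page https://www.bioconductor.org/packages/release/data/annotation/ instead of the Entrez-based one (org.Mm.eg.db). It did not work!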
library(GOstats)
library(EnsDb.Mmusculus.v79)
library(readxl)  # read_excel()
library(dplyr)   # %>% and pull()

# read DE genes (Ensembl gene IDs)
targ <- read_excel('INPUT/DESeq2_DE_KO_WT_1.5.xlsx') %>%
  pull(ENSG)
# read universe (Ensembl gene IDs)
univ <- read_excel('INPUT/DESeq2_KO_WT.xlsx') %>%
  pull(ENSG)

# set up parameters
paramsGO <- new(
  "GOHyperGParams",
  geneIds = targ,
  universeGeneIds = univ,
  annotation = 'EnsDb.Mmusculus.v79',  # annotation argument changed from org.Mm.eg.db
  ontology = 'CC',                     # can be CC, MF or BP
  pvalueCutoff = 0.001,
  conditional = FALSE,
  testDirection = 'over'
)

# run test
Over.GO <- hyperGTest(paramsGO)
Over.GO_summary <- summary(Over.GO)
Error message:
Loading required package: EnsDb.Mmusculus.v79.db
Error in DatPkgFactory(annotation) :
annotation package 'EnsDb.Mmusculus.v79.db' not available
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'EnsDb.Mmusculus.v79.db'
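Is it possible to run this test on my gene set with GOstats without converting the Ensembl gene IDs to Entrez gene IDs? Should I use a different database instead of EnsDb.Mmusculus.v79.db? The conversion is easy to do with biomaRt, but if I do that I lose a lot of genes, which I would like to avoid.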
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.6.8 AnnotationFilter_1.6.0
[4] GenomicFeatures_1.34.8 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[7] GO.db_3.7.0 xlsx_0.6.1 readxl_1.3.1
[10] GOstats_2.48.0 graph_1.60.0 Category_2.48.1
[13] Matrix_1.2-15 AnnotationDbi_1.44.0 IRanges_2.16.0
[16] S4Vectors_0.20.1 Biobase_2.42.0 BiocGenerics_0.28.0
[19] biomaRt_2.38.0 forcats_0.4.0 stringr_1.4.0
[22] dplyr_0.8.3 purrr_0.3.2 readr_1.3.1
[25] tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0
[28] tidyverse_1.2.1