Question

I am new to GO analysis and I am a bit confused about how to do it for a list of genes.

I have a list of genes (n = 10):

gene_list

    SYMBOL ENTREZID                              GENENAME
1    AFAP1    60312   actin filament associated protein 1
2  ANAPC11    51529 anaphase promoting complex subunit 11
3   ANAPC5    51433  anaphase promoting complex subunit 5
4     ATL2    64225                     atlastin GTPase 2
5    AURKA     6790                       aurora kinase A
6    CCNB2     9133                             cyclin B2
7    CCND2      894                             cyclin D2
8    CDCA2   157313      cell division cycle associated 2
9    CDCA7    83879      cell division cycle associated 7
10  CDCA7L    55536 cell division cycle associated 7-like

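I simply want to find their functions, and it was suggested that I use GO analysis tools. I am not sure whether this is the right way to do it. Here is my solution:

library(org.Hs.eg.db)

x <- org.Hs.egGO

# Get the entrez gene identifiers that are mapped to a GO ID
xx <- as.list(x[gene_list$ENTREZID])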

So I end up with a list keyed by Entrez ID, with several GO terms assigned to each gene. For example:

> xx$`60312`
$`GO:0009966`
$`GO:0009966`$GOID
[1] "GO:0009966"

$`GO:0009966`$Evidence
[1] "IEA"

$`GO:0009966`$Ontology
[1] "BP"


$`GO:0051493`
$`GO:0051493`$GOID
[1] "GO:0051493"

$`GO:0051493`$Evidence
[1] "IEA"

$`GO:0051493`$Ontology
[1] "BP"

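My question is: how can I find the function of each gene in a simpler way, and am I doing this correctly? I would like to add the functions to gene_list as a function/GO column.

Thanks in advance,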

Answer 1

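Edit: there is now a new Bioinformatics SE (currently in beta).

I hope I have understood your goal.

BTW, for bioinformatics-related topics you can also have a look at biostar, which serves the same purpose as SO but for bioinformatics.

If you just want a list of every function associated with each gene, you can query a database such as Ensembl through the Bioconductor package biomaRt, which is an API for querying BioMart databases. You will need internet access to run the query.

Bioconductor provides packages for bioinformatics studies, and these packages generally come with good vignettes that walk you through the different steps of an analysis (and even highlight how you should design your data, or what some of the pitfalls are).

In your case, this comes straight from the biomaRt vignette, task 2 in particular:

Note: the way I report it below is slightly quicker: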

# load the library
library("biomaRt")

# I prefer Ensembl, so that is the one I will query, but you can
# query other databases; try out: listMarts()
ensembl=useMart("ensembl")

# as it seems that you are looking for human genes:
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
# if you want other model organisms have a look at:
#listDatasets(ensembl)

You need to build your query (your list of ENTREZ IDs). To see which filters you can query on:

filters = listFilters(ensembl)

Then you choose the attributes you want to retrieve: your GO ID and its description. To see the list of available attributes:

attributes = listAttributes(ensembl)
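The attribute table is long, so it can help to filter it by keyword. Here is a minimal sketch, assuming the table has `name` and `description` columns as returned by listAttributes(); the `attributes_demo` rows are a hand-made stand-in so the snippet runs without an internet connection:

```r
# Hand-made stand-in for listAttributes(ensembl); swap in the real table when online.
attributes_demo <- data.frame(
  name = c("ensembl_gene_id", "entrezgene", "go_id", "name_1006", "chromosome_name"),
  description = c("Gene stable ID", "EntrezGene ID", "GO term accession",
                  "GO term name", "Chromosome/scaffold name"),
  stringsAsFactors = FALSE
)

# Keep only rows whose description mentions a GO term or Entrez (case-insensitive).
hits <- attributes_demo[grepl("go term|entrez", attributes_demo$description,
                              ignore.case = TRUE), ]
print(hits)
```

The same grepl() call should work on the filter table from listFilters(), since it has the same two columns.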

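For you, the query would look like:

goids = getBM(
        # you want entrezgene so you know which is which, then the GO ID;
        # name_1006 is actually the identifier of 'GO term name'
        # (recent Ensembl releases call the Entrez attribute and filter
        # 'entrezgene_id'; check listAttributes() and listFilters())
        attributes=c('entrezgene','go_id', 'name_1006'),
        filters='entrezgene',
        values=gene_list$ENTREZID,
        mart=ensembl)

The query itself may take a while.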

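Then, if you insist, you can collapse the information into two columns (but I would not recommend it for anything other than reporting purposes).

Go.collapsed <- Reduce(rbind, lapply(gene_list$ENTREZID, function(x) {
    # rows of the query result that belong to this gene
    tempo <- goids[goids$entrezgene == x, ]
    return(
        data.frame('ENTREZGENE' = x,
                   'Go.ID' = paste(tempo$go_id, collapse = ' ; '),
                   'GO.term' = paste(tempo$name_1006, collapse = ' ; '))
    )
}))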
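As an alternative to the Reduce/lapply loop, the collapsed table can probably also be built with base R's aggregate() and then attached to the gene list with merge(), which gives the function/GO column asked for in the question. This is only a sketch: the two data frames below are tiny hypothetical stand-ins for gene_list and the getBM() result, and the GO IDs and term names are placeholders, not real annotations.

```r
# Hypothetical stand-ins; the GO IDs and term names are placeholders.
gene_list <- data.frame(SYMBOL = c("AURKA", "CCNB2"),
                        ENTREZID = c("6790", "9133"),
                        stringsAsFactors = FALSE)
goids <- data.frame(entrezgene = c("6790", "6790", "9133"),
                    go_id = c("GO:A", "GO:B", "GO:C"),
                    name_1006 = c("term A", "term B", "term C"),
                    stringsAsFactors = FALSE)

# One row per gene: paste all GO IDs and all term names into single strings.
collapsed <- aggregate(goids[c("go_id", "name_1006")],
                       by = list(ENTREZID = goids$entrezgene),
                       FUN = paste, collapse = " ; ")

# Left join, so genes without any GO annotation are kept (with NA).
annotated <- merge(gene_list, collapsed, by = "ENTREZID", all.x = TRUE)
print(annotated)
```

Using all.x = TRUE is a deliberate choice here: a gene with no GO hit stays in the table instead of silently disappearing.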

Edit

If you want to query a past version of the Ensembl database:

ens82<-useMart(host='sep2015.archive.ensembl.org', biomart='ENSEMBL_MART_ENSEMBL', dataset='hsapiens_gene_ensembl')

Then the query would be:

goids = getBM(attributes=c('entrezgene','go_id', 'name_1006'), filters='entrezgene',values=gene_list$ENTREZID, mart=ens82)

However, if you want to perform a GO enrichment analysis, your gene list is far too short.
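To give a feel for why, recall that a common form of enrichment analysis is an over-representation test based on the hypergeometric distribution. The sketch below computes one such p-value for a single GO term; every count in it is made up purely for illustration and does not come from any real annotation.

```r
# All counts are hypothetical and only illustrate the calculation.
universe_size <- 20000  # annotated genes in the background
term_size     <- 200    # background genes annotated to one GO term
list_size     <- 10     # genes in the list of interest
overlap       <- 2      # listed genes annotated to that term

# P(X >= overlap) when drawing list_size genes without replacement.
p_value <- phyper(overlap - 1, term_size, universe_size - term_size,
                  list_size, lower.tail = FALSE)
print(p_value)
```

With only 10 genes, the overlap with any term can take very few values, so there is little room for a result to stand out once you correct for the many GO terms being tested.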

Gene Ontology (GO) analysis for a list of genes (with ENTREZID) in R?
