Retrieving gene annotations across multiple datasets with biomaRt

Time: 2015-04-20 16:54:33

Tags: r

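I have a list of genes (Entrez IDs; link: https://www.dropbox.com/s/phjxkutm3xv2fi6/Ids.csv?dl=0). I would like to retrieve some annotations for these genes across many datasets (e.g. human, zebrafish, etc.). I am trying to use biomaRt.

The code I tried is: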

library(biomaRt)

ensembl=useMart("ensembl")
datasets <- listDatasets(ensembl)

genes <- read.csv(file = file.choose(), header = TRUE, sep = ",")

GOannotations <- list()

for (i in 1:nrow(datasets)) {
       for (j in (genes)) {
            values <- genes$genes[j]
            GOannotations[[i]] <- getBM(attributes = c("ensembl_gene_id", "name_1006", "peptide"),
                          filters = "ensembl_gene_id",
                          values = values,
                          mart = useMart(biomart = "ensembl", dataset = datasets$dataset[i])
           }
     }

The error I get is:

Error in checkAtAssignment("Mart", "dataset", "AsIs") : 
assignment of an object of class “AsIs” is not valid for @‘dataset’ in an object of class “Mart”; is(value, "character") is not TRUE

Am I doing something wrong?

Is there another way to do this?

1 Answer:

Answer 0 (score: 0)

I set up biomaRt like this:

library(biomaRt)
ensembl <- useMart("ensembl")   # defined here because it is passed to fun() later
datasets <- listDatasets(ensembl)

Then I read in the data, making sure that strings are not interpreted as factors:

## file <- file.choose()
file <- "~/Downloads/Ids.csv"
genes <- read.csv(file = file, header = TRUE, sep = ",", stringsAsFactors=FALSE)
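As a small illustration of why stringsAsFactors matters here, the sketch below reads the same CSV text twice. The column name `genes` matches the code above, but the two IDs are only placeholder values, not the contents of Ids.csv.

```r
# Placeholder CSV content; the real file is Ids.csv from the question.
csv_text <- "genes\nENSG00000139618\nENSG00000141510"

# With stringsAsFactors = TRUE the column becomes a factor ...
as_factor <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

# ... with stringsAsFactors = FALSE it stays a plain character vector,
# which is the form suitable for passing as `values` to a query.
as_char <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

class(as_factor$genes)
class(as_char$genes)
```

The first class() call reports a factor and the second a character vector. Recent versions of R default to stringsAsFactors = FALSE, so setting it explicitly mainly protects against older defaults.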

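Then I wrote a function that takes a dataset, the genes of interest, and the mart, and queries biomart. I put getBM() inside tryCatch(), which means the query may fail but the function will still return a value (NULL).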

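fun <- function(dataset, values, ensembl)
{
    ## `ensembl` is accepted but not used: a mart is created per dataset below.
    ## Check the `values` argument rather than the global `genes` data frame;
    ## is.atomic() allows ids that were read as character or as integer.
    stopifnot(is.character(dataset), is.atomic(values))
    message(dataset)
    tryCatch({
        getBM(attributes = c("ensembl_gene_id", "name_1006", "peptide"),
              filters = "ensembl_gene_id", values = values,
              mart = useMart(biomart = "ensembl", dataset = dataset))
    }, error=function(err) {
        ## report the failure, then return NULL so the caller can carry on
        message("data set '", dataset, "' failed: ", conditionMessage(err))
        NULL
    })
}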
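The tryCatch() pattern used in fun() can be tried without a network connection. The following is a minimal sketch under stated assumptions: `safe_query` and the two toy queries are hypothetical names introduced only for illustration and are not part of biomaRt.

```r
# Hypothetical helper: run `query()` and return NULL instead of stopping
# when it signals an error.
safe_query <- function(query, label) {
    tryCatch({
        query()
    }, error = function(err) {
        message("'", label, "' failed: ", conditionMessage(err))
        NULL
    })
}

# A toy query that succeeds and one that fails (stand-ins for getBM() calls).
ok  <- safe_query(function() data.frame(id = 1:2), "ok")
bad <- safe_query(function() stop("service unavailable"), "bad")

is.data.frame(ok)
is.null(bad)
```

Because the error handler returns NULL, a loop or lapply() over many datasets keeps going after a failure, and the failed entries can be identified afterwards with is.null().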

I tested my function, and it "works" in the sense that biomart is queried and a result is returned.

> fun("hsapiens_gene_ensembl", genes$genes, ensembl)
hsapiens_gene_ensembl
[1] ensembl_gene_id name_1006       peptide        
<0 rows> (or 0-length row.names)

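Obviously it does not work in the sense of returning meaningful results, but that is the subject of a different question (are genes$genes Ensembl gene ids, as advertised in the filters= argument?). The question describes the ids as Entrez IDs, which may explain the empty result.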

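To process many datasets, I wrote the following. Note the as.character(): the error message in the question indicates that datasets$dataset has class "AsIs", whereas the dataset slot of a Mart requires a plain character value.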

GOannotations <-
    lapply(as.character(datasets$dataset), fun, genes$genes, ensembl)
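Since fun() returns NULL for failed datasets and may return zero-row data frames, the resulting list usually needs tidying. The sketch below shows one way to do that on mock data: `mock_results` and its contents are made up for illustration, and "failed_dataset" is a hypothetical name, so this shows the shape of the clean-up rather than real biomart output.

```r
# Mock stand-in for GOannotations: one element per dataset, with NULL where
# the query failed and a zero-row data frame where nothing matched.
mock_results <- list(
    data.frame(ensembl_gene_id = "ENSG00000139618",
               name_1006 = "DNA repair",
               stringsAsFactors = FALSE),
    data.frame(ensembl_gene_id = character(0),
               name_1006 = character(0),
               stringsAsFactors = FALSE),
    NULL
)

# Label each element with the dataset it came from.
names(mock_results) <- c("hsapiens_gene_ensembl", "drerio_gene_ensembl",
                         "failed_dataset")

# Drop failed queries (NULL), then drop datasets with no matching rows.
succeeded <- Filter(Negate(is.null), mock_results)
non_empty <- Filter(function(df) nrow(df) > 0, succeeded)

names(succeeded)
names(non_empty)
```

With the real results, the same names() assignment could use as.character(datasets$dataset), so that each remaining data frame can be traced back to its dataset.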