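I have a general problem that I think boils down to some kind of scoping issue.
Below is a snippet that uses biomaRt's getSequence() function. The user passes my custom function (1) one or more gene names and, optionally, (2) the number of base pairs upstream to retrieve.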
    # Load libraries
    library(biomaRt)

    # Let's make a custom "getSequence" function
    getUpstream <- function(x, bp.upstream = 50){
      bp.upstream <- bp.upstream
      ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
      upstream.master <- NULL
      for(i in x){
        upstream.i <- getSequence(id = i,
                                  type = "hgnc_symbol",
                                  seqType = "coding_gene_flank",
                                  upstream = bp.upstream,
                                  mart = ensembl
        )
        upstream.master <- rbind(upstream.master, upstream.i)
      }
      return(upstream.master)
    }
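Suppose I run a search with this function without specifying the number of upstream base pairs, for example:

    getUpstream("NOTCH4")

Surprisingly, the function does not work without this line:

    bp.upstream <- bp.upstream

Other lines, such as print(bp.upstream), also make the code work.
I assumed bp.upstream would be defined as soon as the function is called, so that upstream = 50 would already be set by the time getSequence is called. Can anyone help me understand why that is not the case?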
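For context, the two lines that make the function work (the self-assignment and the print call) have one thing in common: each evaluates bp.upstream inside getUpstream before getSequence is called. R evaluates function arguments lazily, so a default such as bp.upstream = 50 is only computed when it is first used. The sketch below shows that behavior using base R only. The names trace_env, note, and show_lazy are hypothetical, and whether this is the exact mechanism at work inside getSequence is an assumption that is not confirmed here.

```r
# Hypothetical helpers: record the order in which things happen.
trace_env <- new.env()
assign("events", character(0), envir = trace_env)
note <- function(msg) {
  assign("events", c(get("events", envir = trace_env), msg), envir = trace_env)
}

# The default for x is an unevaluated promise until x is first used.
show_lazy <- function(x = {note("default evaluated"); 50}) {
  note("body started")
  force(x)  # evaluates the promise here, much like `x <- x` or print(x) would
  note("after force")
  x
}

result <- show_lazy()
print(get("events", envir = trace_env))
# "body started" is recorded before "default evaluated"
```

In this sketch force(x) plays the role of the bp.upstream <- bp.upstream line, and it states the intent more directly than a self-assignment.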
Answer 0 (score: 1)
Here is a workaround that avoids the scoping issue.
    # Load libraries
    library(biomaRt)

    # Let's make a custom "getSequence" function
    getUpstream <- function(x, bp.upstream = 50){
      ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
      upstream.master <- lapply(x, function(i, stream)
        getSequence(id = i,
                    type = "hgnc_symbol",
                    seqType = "coding_gene_flank",
                    upstream = stream,
                    mart = ensembl),
        stream = bp.upstream)
      upstream.master
    }
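Note that lapply returns a list with one element per gene, whereas the function in the question returned a single object built up with rbind. If one combined result is preferred, the list can be bound together with do.call(rbind, ...). The following is a minimal sketch of that pattern which runs without network access: fetch_one is a hypothetical stand-in for getSequence(), its column names are placeholders rather than biomaRt's real output, and get_upstream_sketch is likewise an illustrative name.

```r
# Hypothetical stand-in for getSequence(): one row per gene, no network access.
# The columns are placeholders, not the real biomaRt output.
fetch_one <- function(id, upstream) {
  data.frame(hgnc_symbol = id, upstream = upstream, stringsAsFactors = FALSE)
}

get_upstream_sketch <- function(x, bp.upstream = 50) {
  force(bp.upstream)  # evaluate the argument before passing it on
  per_gene <- lapply(x, fetch_one, upstream = bp.upstream)
  do.call(rbind, per_gene)  # combine the per-gene results into one data frame
}

combined <- get_upstream_sketch(c("NOTCH4", "TP53"))
print(combined)
```

Collecting the pieces with lapply and binding them once at the end also avoids growing upstream.master inside a loop.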