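I am analyzing some ChIP-seq data and used a genome browser to retrieve the sequence elements associated with each chromosomal region. After parsing them and searching for specific motifs, I ended up with output like this: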
head(chr.reg)
     [,1]
[1,] "chr1:181030981-181032670"
[2,] "chr3:55709147-55709901"
[3,] "chr3:119813410-119814934"
[4,] "chr4:185201060-185205420"
[5,] "chr4:39610956-39611545"
[6,] "chr6:126253238-126253636"
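Each of these chromosomal regions contains the transcription factor motif I am interested in.
My question is: is there a way to retrieve the RefSeq gene names associated with these regions? I looked through the Bioconductor packages but could not find one, or I simply overlooked it. Does anyone know of a package that could help with this?
Thanks in advance :)
Answer 0 (score: 1)
I believe the answer lies in the ChIPpeakAnno package.
Here is some sample code: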
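library(ChIPpeakAnno)
# One peak: chromosome, start, end
peak <- RangedData(space = "chr4", IRanges(39610956, 39611545))
# TSS annotation for human GRCh37, shipped with ChIPpeakAnno
data(TSS.human.GRCh37)
# Annotate the peak with its nearest feature, measuring distance from the peak end
ap <- annotatePeakInBatch(peak, AnnotationData = TSS.human.GRCh37, PeakLocForDistance = "end")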
The output looks like this:
> ap
RangedData with 1 row and 9 value columns across 1 space
                     space               ranges |        peak      strand
                  <factor>            <IRanges> | <character> <character>
1 ENSG00000163683        4 [39610956, 39611545] |           1           -
                          feature start_position end_position insideFeature
                      <character>      <numeric>    <numeric>   <character>
1 ENSG00000163683 ENSG00000163683       39552535     39640513        inside
                  distancetoFeature shortestDistance fromOverlappingOrNearest
                          <numeric>        <numeric>              <character>
1 ENSG00000163683             28968            28968             NearestStart
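The distancetoFeature value in the output above can be reproduced by hand, which may help when reading these columns. The sketch below uses only numbers copied from that output. It assumes that the TSS of a minus-strand feature is its end_position and that the distance is taken from the peak end (because of PeakLocForDistance="end"); the variable names are made up for illustration.

```r
# Values copied from the annotatePeakInBatch output shown above
peak_end       <- 39611545   # end of the peak (PeakLocForDistance = "end")
feature_start  <- 39552535   # start_position of ENSG00000163683
feature_end    <- 39640513   # end_position of ENSG00000163683
feature_strand <- "-"

# Assumption: for a minus-strand feature the TSS is its end_position
tss <- if (feature_strand == "-") feature_end else feature_start

# Assumption: on the minus strand the distance is TSS minus peak position,
# on the plus strand it is peak position minus TSS
distance_to_feature <- if (feature_strand == "-") tss - peak_end else peak_end - tss

print(distance_to_feature)
```

Under these assumptions the result agrees with the 28968 reported in the distancetoFeature column.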
To retrieve the RefSeq ID or gene symbol for the Ensembl ID:
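library(org.Hs.eg.db)
# Map the Ensembl gene IDs in ap$feature to Entrez IDs and gene symbols;
# adding "REFSEQ" to columns also returns RefSeq accessions
gene.anno <- select(org.Hs.eg.db, keys = ap$feature, keytype = "ENSEMBL",
                    columns = c("ENSEMBL", "ENTREZID", "SYMBOL"))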
The retrieved gene:
> gene.anno
          ENSEMBL ENTREZID SYMBOL
1 ENSG00000163683   201895 SMIM14
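The example annotates a single hand-typed peak, whereas the question starts from a matrix of strings such as "chr4:39610956-39611545". To annotate every region, those strings first have to be split into chromosome, start, and end. A minimal base-R sketch of that step follows; the helper name parse_regions and the result name regions.df are hypothetical, and the matrix is rebuilt here from the six rows shown in the question.

```r
# Rebuild the six regions from the question as a one-column character matrix
chr.reg <- matrix(c("chr1:181030981-181032670",
                    "chr3:55709147-55709901",
                    "chr3:119813410-119814934",
                    "chr4:185201060-185205420",
                    "chr4:39610956-39611545",
                    "chr6:126253238-126253636"), ncol = 1)

# Hypothetical helper: split "chr:start-end" strings into a data frame
parse_regions <- function(regions) {
  regions <- as.character(regions)          # flatten the matrix to a vector
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  bad <- !grepl(pattern, regions)
  if (any(bad)) {
    stop("Unparseable region(s): ", paste(regions[bad], collapse = ", "))
  }
  data.frame(
    chr   = sub(pattern, "\\1", regions),
    start = as.integer(sub(pattern, "\\2", regions)),
    end   = as.integer(sub(pattern, "\\3", regions)),
    stringsAsFactors = FALSE
  )
}

regions.df <- parse_regions(chr.reg)
print(regions.df)
```

The chr, start, and end columns could then presumably be passed as vectors to the same range constructor used for the single peak, so that annotatePeakInBatch and select handle all regions in one call.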