How to retrieve UCSC RefSeq genes in R Bioconductor

Date: 2014-01-21 06:36:25

Tags: r bioconductor

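I am analyzing some ChIP-seq data, and I was able to use the genome browser to retrieve the sequence associated with each chromosomal region. After parsing the sequences and searching them for a specific motif, I end up with output like this: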

head(chr.reg)
 [,1]                      
 [1,] "chr1:181030981-181032670"
 [2,] "chr3:55709147-55709901"  
 [3,] "chr3:119813410-119814934"
 [4,] "chr4:185201060-185205420"
 [5,] "chr4:39610956-39611545"  
 [6,] "chr6:126253238-126253636"

Each of these chromosomal regions contains the transcription factor motif I am interested in.
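Any annotation tool will need these regions as separate chromosome, start and end values rather than "chr:start-end" strings. A minimal base-R sketch, assuming chr.reg is a one-column character matrix like the one printed above; the matrix is retyped here as a stand-in and parse_regions is a hypothetical helper name:

```r
# Stand-in for chr.reg: the one-column character matrix shown above
chr.reg <- matrix(c("chr1:181030981-181032670",
                    "chr3:55709147-55709901",
                    "chr3:119813410-119814934",
                    "chr4:185201060-185205420",
                    "chr4:39610956-39611545",
                    "chr6:126253238-126253636"), ncol = 1)

# Hypothetical helper: split each "chrN:start-end" string on ':' and '-'
parse_regions <- function(x) {
  parts <- do.call(rbind, strsplit(as.character(x), "[:-]"))
  data.frame(seqnames = parts[, 1],
             start    = as.integer(parts[, 2]),
             end      = as.integer(parts[, 3]),
             stringsAsFactors = FALSE)
}

regions <- parse_regions(chr.reg[, 1])
regions
```

The three columns can then be turned into a range object for all regions at once, for example GRanges(regions$seqnames, IRanges(regions$start, regions$end)) with the GenomicRanges package loaded.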

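My question is: is there a way to retrieve the RefSeq gene names associated with these regions? I looked through the Bioconductor packages but couldn't find one, or perhaps I just overlooked it! Does anyone know of a specific package that could help me with this?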

Thanks in advance :)

1 Answer:

Answer 0 (score: 1)

I believe the answer lies in the ChIPpeakAnno package. Here is example code that annotates one of the regions from the question:

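  library(ChIPpeakAnno)
  # One region of interest: chromosome, start, end
  peak <- RangedData(space = "chr4", IRanges(39610956, 39611545))
  # RangedData is defunct in current Bioconductor releases; there, use:
  # peak <- GRanges("chr4", IRanges(39610956, 39611545))
  data(TSS.human.GRCh37)  # Ensembl gene annotation (GRCh37) shipped with ChIPpeakAnno
  ap <- annotatePeakInBatch(peak, AnnotationData = TSS.human.GRCh37,
                            PeakLocForDistance = "end")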

The output is as follows:

> ap

RangedData with 1 row and 9 value columns across 1 space
                 space               ranges |        peak      strand
              <factor>            <IRanges> | <character> <character>
1 ENSG00000163683        4 [39610956, 39611545] |           1           -
                      feature start_position end_position insideFeature
                  <character>      <numeric>    <numeric>   <character>
1 ENSG00000163683 ENSG00000163683       39552535     39640513        inside
              distancetoFeature shortestDistance fromOverlappingOrNearest
                      <numeric>        <numeric>              <character>
1 ENSG00000163683             28968            28968             NearestStart
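The distancetoFeature value can be checked by hand. The feature is on the minus strand, so its TSS is end_position, and PeakLocForDistance = "end" represents the peak by its end coordinate. The sketch below assumes distance is TSS minus peak position on the minus strand and peak position minus TSS on the plus strand; that convention reproduces the 28968 above but is an assumption, not ChIPpeakAnno's source code, and tss_distance is a hypothetical name:

```r
# Hypothetical helper: signed distance from a peak to a feature's TSS.
# Assumed convention (matches the output above, not taken from ChIPpeakAnno):
#   + strand: TSS = feature start, distance = peak position - TSS
#   - strand: TSS = feature end,   distance = TSS - peak position
tss_distance <- function(peak_start, peak_end, feat_start, feat_end, strand,
                         peak_loc = c("start", "end")) {
  peak_loc <- match.arg(peak_loc)
  pos <- if (peak_loc == "start") peak_start else peak_end
  ifelse(strand == "-", feat_end - pos, pos - feat_start)
}

# Values copied from the annotatePeakInBatch output above
d <- tss_distance(39610956, 39611545, 39552535, 39640513, "-", peak_loc = "end")
d  # 39640513 - 39611545 = 28968
```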

To retrieve the RefSeq ID or gene symbol for the Ensembl ID:

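  library(org.Hs.eg.db)
  # Map Ensembl gene IDs to Entrez IDs and symbols; add "REFSEQ" to columns for RefSeq accessions
  gene.anno <- AnnotationDbi::select(org.Hs.eg.db, keys = ap$feature, keytype = "ENSEMBL",
                                     columns = c("ENTREZID", "SYMBOL"))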

The retrieved gene:

> gene.anno
      ENSEMBL     ENTREZID SYMBOL       
1 ENSG00000163683   201895 SMIM14
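select() can return several rows for one Ensembl ID (it reports a "1:many mapping" in that case), so it is safer to merge the result back onto the peaks by ID than to bind columns side by side. A base-R sketch; the two data frames are typed in by hand from the outputs above as stand-ins, and with real objects something like as.data.frame(ap) would take the place of peaks:

```r
# Stand-ins typed from the outputs above
peaks <- data.frame(peak = "1", feature = "ENSG00000163683",
                    distancetoFeature = 28968, stringsAsFactors = FALSE)
gene.anno <- data.frame(ENSEMBL = "ENSG00000163683", ENTREZID = "201895",
                        SYMBOL = "SMIM14", stringsAsFactors = FALSE)

# Left join on the Ensembl ID: peaks without a mapping are kept (SYMBOL = NA),
# and an ID that maps to several symbols yields one row per symbol
annotated <- merge(peaks, gene.anno, by.x = "feature", by.y = "ENSEMBL",
                   all.x = TRUE, sort = FALSE)
annotated
```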