GenomicFeatures: keeping track of reads and scores when mapping to transcripts

Time: 2015-12-03 10:04:05

Tags: r mapping bioconductor

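I have a BED file with genomic coordinates and scores. I want to map these coordinates to human exons and keep track of the scores.

Here is an example (just three lines of my BED file, stored in a GRanges object):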

   library(GenomicRanges)

   reads<-GRanges(
    seqnames = Rle(rep("chr1", 3)),
    ranges = IRanges(c(3044402,3044562,3044827),c(3044402,3044562,3044827)),
    strand = Rle(rep("*",3)),
    score = c(0.111111,-0.101128,-0.25)
    )
   reads

   GRanges object with 3 ranges and 1 metadata column:
         seqnames             ranges strand |     score
            <Rle>          <IRanges>  <Rle> | <numeric>
     [1]     chr1 [3044402, 3044402]      * |  0.111111
     [2]     chr1 [3044562, 3044562]      * | -0.101128
     [3]     chr1 [3044827, 3044827]      * |     -0.25
     -------
     seqinfo: 1 sequence from an unspecified genome; no seqlengths

After loading the necessary libraries...

   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
   library(BSgenome.Hsapiens.UCSC.hg19)

   txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
   exon_by_tx<-exonsBy(txdb, by="tx", use.names=TRUE)

...I can map these genomic coordinates to exons like this:

   library(GenomicFeatures)

   mapped_exon <- mapToTranscripts(reads, exon_by_tx,ignore.strand=FALSE)
   mapped_exon

   GRanges object with 1 range and 2 metadata columns:
           seqnames    ranges strand |     xHits transcriptsHits
              <Rle> <IRanges>  <Rle> | <integer>       <integer>
     [1] uc021oez.1  [24, 24]      + |         2             158

I know that the genomic coordinates of uc021oez.1 (NR_036215) are chr1:3044539-3044599. This means that chr1 [3044562, 3044562] maps to uc021oez.1 [24, 24], and its score is -0.101128.
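The position can be checked by hand. This is a minimal sketch, assuming the transcript consists of a single exon on the + strand (which the coordinates and the mapToTranscripts output above suggest), so that the transcript-relative position is simply the 1-based offset from the exon start. The variable names are illustrative only.

```r
# Genomic start of uc021oez.1 and genomic position of the second read
exon_start <- 3044539
read_pos   <- 3044562

# 1-based offset within a single-exon, plus-strand transcript (assumption)
tx_pos <- read_pos - exon_start + 1
tx_pos  # 24
```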

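How can I keep track of this information? In other words, how can I automatically add extra columns to mapped_exon with the corresponding entries from reads?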

1 answer:

Answer 0: (score: 0)

OK. I found a solution! :)

   mcols(mapped_exon)<-cbind(mcols(mapped_exon),DataFrame(reads[mapped_exon$xHits]))
   mapped_exon

   GRanges object with 1 range and 4 metadata columns:
           seqnames    ranges strand |     xHits transcriptsHits                         X     score
              <Rle> <IRanges>  <Rle> | <integer>       <integer>                 <GRanges> <numeric>
     [1] uc021oez.1  [24, 24]      + |         2             158 chr1:*:[3044562, 3044562] -0.101128
     -------
     seqinfo: 1 sequence from an unspecified genome; no seqlengths
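
If only the score is needed, a lighter variant is to index the score column directly with xHits, since xHits gives the position of each mapped range in reads. The sketch below is self-contained and makes one assumption: instead of the full TxDb, it uses a toy GRangesList holding a single one-exon transcript with the coordinates quoted in the question, so transcriptsHits will not match the value shown above.

```r
library(GenomicRanges)
library(GenomicFeatures)

# The three reads from the question
reads <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(c(3044402, 3044562, 3044827), width = 1),
  strand   = "*",
  score    = c(0.111111, -0.101128, -0.25)
)

# Toy stand-in (assumption) for exonsBy(txdb, by = "tx", use.names = TRUE):
# one single-exon transcript with the coordinates given in the question
exon_by_tx <- GRangesList(
  uc021oez.1 = GRanges("chr1", IRanges(3044539, 3044599), strand = "+")
)

mapped_exon <- mapToTranscripts(reads, exon_by_tx, ignore.strand = FALSE)

# xHits indexes into reads, so it can be used to look up the matching scores
mapped_exon$score <- reads$score[mapped_exon$xHits]
mapped_exon
```

The assignment line should work unchanged on the real reads and mapped_exon objects. It adds just the score column rather than the whole original GRanges.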