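I have a BED file containing genomic coordinates and scores. I want to map these coordinates to human exons and keep track of the scores.
Here is an example (just three lines of my BED file, stored in a GRanges object):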
library(GenomicRanges)
reads<-GRanges(
seqnames = Rle(rep("chr1", 3)),
ranges = IRanges(c(3044402,3044562,3044827),c(3044402,3044562,3044827)),
strand = Rle(rep("*",3)),
score = c(0.111111,-0.101128,-0.25)
)
reads
GRanges object with 3 ranges and 1 metadata column:
      seqnames             ranges strand |     score
         <Rle>          <IRanges>  <Rle> | <numeric>
  [1]     chr1 [3044402, 3044402]      * |  0.111111
  [2]     chr1 [3044562, 3044562]      * | -0.101128
  [3]     chr1 [3044827, 3044827]      * |     -0.25
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
After loading the necessary libraries...
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(BSgenome.Hsapiens.UCSC.hg19)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
exon_by_tx<-exonsBy(txdb, by="tx", use.names=TRUE)
...I can map these genomic coordinates to exons like this:
library(GenomicFeatures)
mapped_exon <- mapToTranscripts(reads, exon_by_tx,ignore.strand=FALSE)
mapped_exon
GRanges object with 1 range and 2 metadata columns:
        seqnames    ranges strand |     xHits transcriptsHits
           <Rle> <IRanges>  <Rle> | <integer>       <integer>
  [1] uc021oez.1  [24, 24]      + |         2             158
I know that the genomic coordinates of uc021oez.1 (NR_036215) are chr1:3044539-3044599. This means that chr1 [3044562, 3044562] maps to uc021oez.1 [24, 24], which has a score of -0.101128.
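As a sanity check, the offset can be reproduced by hand. This is only a sketch: it assumes the transcript is a single exon on the + strand, which is consistent with the coordinates and strand shown above but is not stated explicitly. The variable names are illustrative.

```r
# Assumption: single-exon transcript on the + strand
tx_start    <- 3044539   # genomic start of uc021oez.1
genomic_pos <- 3044562   # position of the second read

# 1-based position within the transcript
tx_pos <- genomic_pos - tx_start + 1
tx_pos
# [1] 24
```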
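How can I keep track of this information? In other words, how can I automatically add extra columns to mapped_exon with the corresponding entries from reads?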
Answer 0 (score: 0)
OK, I found the solution! :)
mcols(mapped_exon)<-cbind(mcols(mapped_exon),DataFrame(reads[mapped_exon$xHits]))
mapped_exon
GRanges object with 1 range and 4 metadata columns:
        seqnames    ranges strand |     xHits transcriptsHits                         X     score
           <Rle> <IRanges>  <Rle> | <integer>       <integer>                 <GRanges> <numeric>
  [1] uc021oez.1  [24, 24]      + |         2             158 chr1:*:[3044562, 3044562] -0.101128
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
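If only the score (and perhaps the original genomic position) is needed, a lighter variant is to index reads by xHits and assign plain metadata columns, rather than binding the whole GRanges in as a column. The sketch below is self-contained: instead of the full TxDb it uses toy_tx, a hypothetical one-transcript stand-in for exon_by_tx built from the coordinates quoted in the question and assumed to be a single exon on the + strand. The column name genomic_pos is also just an illustration.

```r
library(GenomicRanges)
library(GenomicFeatures)  # provides mapToTranscripts()

# Same three positions as in the question; strand defaults to "*"
reads <- GRanges("chr1",
                 IRanges(c(3044402, 3044562, 3044827), width = 1),
                 score = c(0.111111, -0.101128, -0.25))

# Hypothetical stand-in for exon_by_tx (assumption: single exon, + strand)
toy_tx <- GRangesList(
  uc021oez.1 = GRanges("chr1", IRanges(3044539, 3044599), strand = "+")
)

mapped <- mapToTranscripts(reads, toy_tx, ignore.strand = FALSE)

# xHits holds the index of the originating range in reads,
# so it can be used to look up any per-read value
mapped$score       <- reads$score[mapped$xHits]
mapped$genomic_pos <- start(reads)[mapped$xHits]
mapped
```

With the real exon_by_tx from the question, the same two assignment lines should apply unchanged to mapped_exon, since xHits always refers back to the first argument of mapToTranscripts().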