How to select only the overlapping regions?

Time: 2019-04-29 12:01:39

Tags: r dataframe overlap intersect genomicranges

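I have a table like gr1, and I want to keep only one overlapping region when there is overlap between all of the regions that share a border (for example, all three regions located on the chr10 border).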

library(GenomicRanges)

# Build the example GRanges; the "chr" column is picked up as seqnames
gr1 <- makeGRangesFromDataFrame(
  data.frame(
    chr   = c("1","1","1","1","1","10","10","10","2","2","2"),
    start = c(10,30,35,38,40,15,18,25,20,58,59),
    end   = c(20,50,43,49,50,20,25,30,60,70,75)
  )
)

I have tried:

hits <- findOverlaps(gr1, ignore.strand = TRUE, drop.self = TRUE, drop.redundant = TRUE)
ovpairs <- Pairs(gr1, gr1, hits = hits)
pint <- pintersect(ovpairs, ignore.strand = TRUE)

This partly works, but what I still need is the number of overlapping regions as a column, like this:

GRanges object with 11 ranges and 1 metadata column:
   seqnames    ranges strand |       hit
      <Rle> <IRanges>  <count> | <logical>
   [1]        1     35-43      2 |      TRUE
   [2]        1     38-49      2 |      TRUE
   [3]        1     40-50      2 |      TRUE
   [4]        1     38-43      3 |      TRUE
   [5]        1     40-43      4 |      TRUE
   [6]        1     40-49      3 |      TRUE
   [7]       10     18-20      2 |      TRUE
   [8]       10        25      2 |      TRUE
   [9]        2     58-60      2 |      TRUE
  [10]        2     59-60      3 |      TRUE
  [11]        2     59-70      2 |      TRUE
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
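
One possible way to get such a count, sketched under the assumption that "number of overlapping regions" means the number of original ranges in gr1 that fully contain each pairwise intersection: call countOverlaps() with type = "within" on the result of pintersect(). The column name count is an illustrative choice and does not come from the original post.

```r
library(GenomicRanges)

# Example data from the question
gr1 <- makeGRangesFromDataFrame(
  data.frame(
    chr   = c("1","1","1","1","1","10","10","10","2","2","2"),
    start = c(10,30,35,38,40,15,18,25,20,58,59),
    end   = c(20,50,43,49,50,20,25,30,60,70,75)
  )
)

# Pairwise intersections, as in the attempt above
hits    <- findOverlaps(gr1, ignore.strand = TRUE,
                        drop.self = TRUE, drop.redundant = TRUE)
ovpairs <- Pairs(gr1, gr1, hits = hits)
pint    <- pintersect(ovpairs, ignore.strand = TRUE)

# Assumption: the wanted count is the number of ranges in gr1 that
# completely contain each intersection (type = "within").
pint$count <- countOverlaps(pint, gr1, type = "within", ignore.strand = TRUE)

pint
```

On the example data this should give a count of 4 for the 40-43 intersection on chromosome 1, in line with the table above. If a different notion of count is wanted, for instance per non-overlapping segment, disjoin() followed by countOverlaps() may be a better fit.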

0 answers:

No answers