Any way to coerce a "list" to an S4 "List"?

Time: 2016-07-06 05:58:08

Tags: r data-manipulation s4 bioconductor coercion

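Is there any way to coerce a simple list-like object to an S4 "List" object? I need to vectorize some operations on my data. I use nested lapply calls in my function, and when I check the return type it is "list". I would like a "List"-like object instead. How can I do that? Thanks.

Here is a reproducible example to clarify the problem:

Data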

    library(GenomicRanges)

    foo <- GRanges(
      seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(3, 2, 1, 2)),
      ranges=IRanges(seq(1, by=9, len=8), seq(7, by=9, len=8)),
      rangeName=letters[seq(1:8)], score=sample(1:20, 8, replace = FALSE))

    bar <- GRanges(
      seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(4, 3, 1, 1)),
      ranges=IRanges(seq(2, by=5, len=9), seq(4, by=5, len=9)),
      rangeName=letters[seq(1:9)], score=sample(1:20, 9, replace = FALSE))

    moo <- GRanges(
      seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(3, 4, 2,1)),
      ranges=IRanges(seq(5, by=7, len=10), seq(8, by=7, len=10)),
      rangeName=letters[seq(1:10)], score=sample(1:20, 10, replace = FALSE))

Overlap hit indices

    grl <- GRangesList(bar, moo)
    res <- lapply(grl, function(ele_) {
        tmp <- as(findOverlaps(foo, ele_), "List")
      })

Illustration of the duplicated regions (the first list element corresponds to bar):

[[1]]
IntegerList of length 8
[[1]] 1 2    # 1st region from foo overlaps the 1st and 2nd regions from bar
[[2]] 3
[[3]] 4
[[4]] 6 7    # 4th region from foo overlaps the 6th and 7th regions from bar

The goal is to keep only one hit per region (i.e., drop the extra hits where a region intersects several), for example:

[[1]]
IntegerList of length 8
[[1]] 2   # only keep 2nd region from bar
[[2]] 3
[[3]] 4
[[4]] 6   # only keep 6th region from bar

Removing the duplicated regions

    obj.ov <- lapply(res, function(ele_) {
      re <- lapply(grl, function(obj) {
        id0 <- as(which.max(extractList(obj$score, ele_)), "List")
        id0 <- id0[!is.na(id0)]
      })
      re <- re[!duplicated(re)]
    })

Further steps

    as.obj.ov <- as(obj.ov, "List")  # if this coercion is not correct, it cannot be expanded the way obj.ov can

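Then as.obj.ov must be expandable into a vector of hit indices, just like obj.ov, while its type must be an S4 "List" object.

I need obj.ov as an S4 "List" object. Is such a coercion possible in R?

Any possible approach, solution, or idea is appreciated.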
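For illustration only, here is a minimal sketch of what such a coercion could look like on a toy input. It assumes the IRanges package is installed (it attaches S4Vectors); `hits_plain`, `hits_s4`, `nested`, and `nested_s4` are hypothetical names and are not part of the example above.

```r
library(IRanges)  # also attaches S4Vectors, which provides SimpleList()

# Hypothetical toy input: a plain list of integer hit indices
hits_plain <- list(c(1L, 2L), 3L, integer(0))

# IntegerList() accepts a single plain list and returns an S4 List subclass
hits_s4 <- IntegerList(hits_plain)
is(hits_s4, "List")

# A plain list holding one IntegerList per subject set can be wrapped
# in a SimpleList, which is also an S4 List
nested <- list(bar = hits_s4, moo = IntegerList(list(2L, c(4L, 5L))))
nested_s4 <- SimpleList(nested)
is(nested_s4, "List")

# Both objects still expand back to ordinary R structures
flat <- unlist(hits_s4)        # flat integer vector of hit indices
back <- as.list(nested_s4)     # plain list of IntegerList objects
```

Whether this fits the real obj.ov depends on what its elements actually contain; the typed constructor (IntegerList) only applies when every element is an integer vector.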

1 answer:

Answer 0 (score: 2):

We can use select = "first" to get the first match.

    lapply(grl, function(ele_) {
      ix <- findOverlaps(foo, ele_, select = "first")
      ele_[ix[!is.na(ix)]]
    })

[[1]]
GRanges object with 4 ranges and 2 metadata columns:
      seqnames    ranges strand |   rangeName     score
         <Rle> <IRanges>  <Rle> | <character> <integer>
  [1]     chr1  [ 2,  4]      * |           a        18
  [2]     chr1  [12, 14]      * |           c         2
  [3]     chr1  [17, 19]      * |           d        19
  [4]     chr2  [27, 29]      * |           f        15
  -------
  seqinfo: 4 sequences from an unspecified genome; no seqlengths

[[2]]
GRanges object with 6 ranges and 2 metadata columns:
      seqnames    ranges strand |   rangeName     score
         <Rle> <IRanges>  <Rle> | <character> <integer>
  [1]     chr1  [ 5,  8]      * |           a        11
  [2]     chr1  [12, 15]      * |           b        13
  [3]     chr1  [19, 22]      * |           c        14
  [4]     chr2  [26, 29]      * |           d        20
  [5]     chr2  [40, 43]      * |           f         8
  [6]     chr4  [68, 71]      * |           j         1
  -------
  seqinfo: 4 sequences from an unspecified genome; no seqlengths
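
Note that select = "first" keeps the first overlapping range, not the highest-scoring one, and lapply returns a plain list. If the best-scoring overlap is wanted (as the which.max attempt in the question suggests) and the result should be an S4 List, something along these lines may work. This is only a sketch on made-up toy data: `keep_best`, `query`, `subject_a`, and `subject_b` are hypothetical names, not part of the original post.

```r
library(GenomicRanges)

# Toy data (hypothetical): two query ranges and two small subject sets
query     <- GRanges("chr1", IRanges(c(1, 20), c(10, 30)))
subject_a <- GRanges("chr1", IRanges(c(2, 5, 25), c(4, 8, 28)),
                     score = c(3L, 9L, 5L))
subject_b <- GRanges("chr1", IRanges(c(3, 22), c(6, 24)),
                     score = c(1L, 2L))

# For each query range, keep the overlapping subject range with the top score
keep_best <- function(query, subject) {
  hits <- findOverlaps(query, subject)
  q <- queryHits(hits)
  s <- subjectHits(hits)
  # Sort hits by query index, then by descending score, so the first
  # hit seen for each query is its best-scoring one
  ord  <- order(q, -subject$score[s])
  best <- s[ord][!duplicated(q[ord])]
  subject[best]
}

# GRangesList() accepts a plain list of GRanges and yields an S4 List
subjects <- list(bar = subject_a, moo = subject_b)
res <- GRangesList(lapply(subjects, function(s) keep_best(query, s)))
is(res, "List")
```

Ties are resolved in favour of the earlier hit because order() is stable; queries with no overlap simply contribute nothing to the result.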