Combining IRanges objects while maintaining mcols

Date: 2019-04-08 22:51:29

Tags: r iranges

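I will start with an example and then describe the logic I want to apply.

I have two plain IRanges objects that span the same total range but may split it into different numbers of ranges. Each IRanges has one mcol, but the mcols differ between the two objects.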

a
#IRanges object with 1 range and 1 metadata column:
#          start       end     width | on_betalac
#      <integer> <integer> <integer> |  <logical>
#  [1]         1       167       167 |      FALSE
b
#IRanges object with 3 ranges and 1 metadata column:
#          start       end     width |  on_other
#      <integer> <integer> <integer> | <logical>
#  [1]         1       107       107 |     FALSE
#  [2]       108       112         5 |      TRUE
#  [3]       113       167        55 |     FALSE

You can see that both IRanges span 1 to 167, but a has one range while b has three. I want to combine them to get output like this:

my_great_function(a, b)
#IRanges object with 3 ranges and 2 metadata columns:
#          start       end     width | on_betalac  on_other
#      <integer> <integer> <integer> |  <logical> <logical>
#  [1]         1       107       107 |     FALSE     FALSE
#  [2]       108       112         5 |     FALSE      TRUE
#  [3]       113       167        55 |     FALSE     FALSE

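The output resembles a disjoin of the inputs, but it keeps the mcols and even spreads them, so that each output range carries the same mcol values as the input ranges that gave rise to it.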

2 answers:

Answer 0: (score: 1)

Option 1: using IRanges::findOverlaps

m <- findOverlaps(b, a)
c <- b[queryHits(m)]
mcols(c) <- cbind(mcols(c), mcols(a[subjectHits(m)]))
c
#IRanges object with 3 ranges and 2 metadata columns:
#          start       end     width |  on_other on_betacalc
#      <integer> <integer> <integer> | <logical>   <logical>
#  [1]         1       107       107 |     FALSE       FALSE
#  [2]       108       112         5 |      TRUE       FALSE
#  [3]       113       167        55 |     FALSE       FALSE

The resulting object c is an IRanges object with two metadata columns.
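Option 1 works here because every range of b lies inside a single range of a. When neither input is a refinement of the other, one possible generalisation is to disjoin the pooled ranges first and then look up each input's metadata with findOverlaps. The sketch below assumes that the ranges within each input do not overlap one another; the helper name combine_with_mcols is hypothetical and not part of IRanges.

```r
library(IRanges)

# Hypothetical helper (not part of IRanges): combine any number of IRanges
# objects and keep every metadata column.
# Assumption: the ranges inside each input do not overlap one another.
combine_with_mcols <- function(...) {
  inputs <- unname(list(...))

  # Pool the bare ranges (metadata stripped) and cut them into disjoint pieces.
  bare <- lapply(inputs, function(x) { mcols(x) <- NULL; x })
  pieces <- disjoin(do.call(c, bare))

  cols <- list()
  for (x in inputs) {
    # idx[k] is the index of the range of x that covers piece k, or NA if none.
    hits <- findOverlaps(pieces, x)
    idx <- rep(NA_integer_, length(pieces))
    idx[queryHits(hits)] <- subjectHits(hits)
    for (nm in names(mcols(x))) {
      cols[[nm]] <- mcols(x)[[nm]][idx]
    }
  }

  if (length(cols) > 0) mcols(pieces) <- do.call(DataFrame, cols)
  pieces
}

# Data from the question.
a <- IRanges(1, 167)
mcols(a)$on_betalac <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

res <- combine_with_mcols(a, b)
res
```

Because the lookup is done per input, the argument order does not matter beyond the order of the resulting metadata columns.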

Option 2: using IRanges::mergeByOverlaps

c <- mergeByOverlaps(b, a)
c
#DataFrame with 3 rows and 4 columns
#          b  on_other         a on_betacalc
#  <IRanges> <logical> <IRanges>   <logical>
#1     1-107     FALSE     1-167       FALSE
#2   108-112      TRUE     1-167       FALSE
#3   113-167     FALSE     1-167       FALSE

The result is a DataFrame in which the two IRanges are columns, with the original metadata columns alongside them.
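If an IRanges object is needed rather than a DataFrame, the query ranges can be pulled back out of the merged table and the metadata columns reattached. This is only a sketch: it assumes the column names shown above (mergeByOverlaps names the range columns after the variables passed in), and the name res is illustrative.

```r
library(IRanges)

# Sample data as in this answer (note the on_betacalc column name).
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

m <- mergeByOverlaps(b, a)

# The "b" column holds the query ranges; reattach both metadata columns to it.
res <- m[["b"]]
mcols(res) <- m[, c("on_other", "on_betacalc")]
res
```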

Option 3: using data.table::foverlaps

library(data.table)
a.dt <- as.data.table(cbind.data.frame(a, mcols(a)))[, width := NULL]
b.dt <- as.data.table(cbind.data.frame(b, mcols(b)))[, width := NULL]

setkey(b.dt, start, end)
foverlaps(a.dt, b.dt, type = "any")[, `:=`(i.start = NULL, i.end = NULL)][]
#   start end on_other on_betacalc
#1:     1 107    FALSE       FALSE
#2:   108 112     TRUE       FALSE
#3:   113 167    FALSE       FALSE

The result is a data.table.

Option 4: using fuzzyjoin::interval_left_join

library(fuzzyjoin)
a.df <- cbind.data.frame(a, mcols(a))
b.df <- cbind.data.frame(b, mcols(b))
interval_left_join(b.df, a.df, by = c("start", "end"))
#  start.x end.x width.x on_other start.y end.y width.y on_betacalc
#1       1   107     107    FALSE       1   167     167       FALSE
#2     108   112       5     TRUE       1   167     167       FALSE
#3     113   167      55    FALSE       1   167     167       FALSE

The result is a data.frame.


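Sample data

library(IRanges)
a <- IRanges(1, 167)
# Note: the column is named on_betacalc here, whereas the question calls it on_betalac.
mcols(a)$on_betacalc <- FALSE

b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)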

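Answer 1: (score: 0)

This is what I was able to come up with. It is not as elegant as Maurits Evers's answer, but it may be useful to others in some respects.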

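combine_exposures <- function(...) {

  # Pool all inputs; c() pads metadata columns missing from an input with NA.
  cd <- c(...)
  mc <- mcols(cd)
  # Split the pooled ranges into disjoint pieces; revmap records, for each
  # piece, the indices of the pooled ranges that cover it.
  dj <- disjoin(x = cd, with.revmap = TRUE)
  r <- mcols(dj)$revmap

  d <- as.data.frame(matrix(nrow = length(dj), ncol = ncol(mc)))
  names(d) <- names(mc)

  # Assumes the j-th covering range supplies column j, i.e. every input has
  # one metadata column and covers every piece exactly once.
  for (i in seq_along(dj)) {
    d[i,] <- sapply(X = seq_len(ncol(mc)), FUN = function(j) { mc[r[[i]][j], j] })
  }

  mcols(dj) <- d
  return(dj)
}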

This is dput(c(e1, e2, e3, e4)) (e1, e2, e3 and e4 are some example IRanges that all span 1 to 167):

new("IRanges", start = c(1L, 1L, 108L, 113L, 1L, 1L), width = c(167L, 
107L, 5L, 55L, 167L, 167L), NAMES = NULL, elementType = "ANY", 
    elementMetadata = new("DataFrame", rownames = NULL, nrows = 6L, 
        listData = list(on_betalac = c(FALSE, NA, NA, NA, NA, 
        NA), on_other = c(NA, FALSE, TRUE, FALSE, NA, NA), on_pen = c(NA, 
        NA, NA, NA, FALSE, NA), on_quin = c(NA, NA, NA, NA, NA, 
        FALSE)), elementType = "ANY", elementMetadata = NULL, 
        metadata = list()), metadata = list())
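The loop in combine_exposures depends on the position of each index within the revmap. A variant that instead picks, for every column, whichever covering range carries a non-NA value might look like the sketch below. The inputs e1 to e4 are rebuilt from the dput() output above, the function name combine_exposures2 is hypothetical, and the code relies on c() padding missing metadata columns with NA, as that output shows.

```r
library(IRanges)

# Inputs reconstructed from the dput() output; each spans 1 to 167.
e1 <- IRanges(1, 167)
mcols(e1)$on_betalac <- FALSE
e2 <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(e2)$on_other <- c(FALSE, TRUE, FALSE)
e3 <- IRanges(1, 167)
mcols(e3)$on_pen <- FALSE
e4 <- IRanges(1, 167)
mcols(e4)$on_quin <- FALSE

# Hypothetical variant of combine_exposures.
combine_exposures2 <- function(...) {
  cd <- c(...)                           # metadata columns are padded with NA
  mc <- mcols(cd)
  dj <- disjoin(cd, with.revmap = TRUE)
  rev <- mcols(dj)$revmap

  # Flatten the revmap: src[k] is a pooled range that covers piece piece[k].
  src <- unlist(rev, use.names = FALSE)
  piece <- rep(seq_along(dj), elementNROWS(rev))

  cols <- list()
  for (nm in names(mc)) {
    # Keep only the covering ranges that carry a value for this column.
    keep <- !is.na(mc[[nm]][src])
    idx <- rep(NA_integer_, length(dj))
    idx[piece[keep]] <- src[keep]
    cols[[nm]] <- mc[[nm]][idx]
  }

  mcols(dj) <- do.call(DataFrame, cols)  # replaces the revmap column
  dj
}

res <- combine_exposures2(e1, e2, e3, e4)
res
```

If several covering ranges have a value for the same column, the last one in the revmap wins; pieces with no value for a column get NA.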