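This is a follow-up to a question I asked yesterday, now extended to cover more than 2 inputs. I found answers to two related questions, but neither gave me enough information to solve the problem.
I want to merge a list of IRanges into a single IRanges. Here is an example input: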
[[1]]
IRanges object with 2 ranges and 1 metadata column:
start end width | on_betalac
<integer> <integer> <integer> | <logical>
[1] 1 21 21 | FALSE
[2] 22 22 1 | TRUE
[[2]]
IRanges object with 2 ranges and 1 metadata column:
start end width | on_other
<integer> <integer> <integer> | <logical>
[1] 1 21 21 | FALSE
[2] 22 22 1 | TRUE
[[3]]
IRanges object with 1 range and 1 metadata column:
start end width | on_pen
<integer> <integer> <integer> | <logical>
[1] 1 22 22 | FALSE
[[4]]
IRanges object with 3 ranges and 1 metadata column:
start end width | on_quin
<integer> <integer> <integer> | <logical>
[1] 1 3 3 | FALSE
[2] 4 13 10 | TRUE
[3] 14 22 9 | FALSE
For easy reproduction, the dput of this list is at the end of my post.
My desired output is:
IRanges object with 4 ranges and 4 metadata columns:
start end width | on_betalac on_other on_pen on_quin
<integer> <integer> <integer> | <logical> <logical> <logical> <logical>
[1] 1 3 3 | FALSE FALSE FALSE FALSE
[2] 4 13 10 | FALSE FALSE FALSE TRUE
[3] 14 21 8 | FALSE FALSE FALSE FALSE
[4] 22 22 1 | TRUE TRUE FALSE FALSE
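As you can see, the output is essentially the disjoin of the inputs, but with the mcols propagated, so each output row carries the mcols of the input rows that gave rise to it.
Here is my solution, which works but is slow.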
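library(IRanges)

combine_exposures <- function(exposures) {
  # Pool all ranges into one IRanges; mcols(cd) then has one column per input.
  cd <- do.call(what = c, args = exposures)
  mc <- mcols(cd)
  # revmap lists, for each disjoint range, the pooled ranges that overlap it.
  dj <- disjoin(x = cd, with.revmap = TRUE)
  r <- mcols(dj)$revmap
  d <- as.data.frame(matrix(nrow = length(dj), ncol = ncol(mc)))
  names(d) <- names(mc)
  # Assumes every input covers each disjoint range exactly once, so that
  # r[[i]][j] is the contributing range from input j.
  for (i in seq_along(dj)) {
    d[i, ] <- sapply(X = seq_len(ncol(mc)), FUN = function(j) { mc[r[[i]][j], j] })
  }
  mcols(dj) <- d
  return(dj)
}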
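To make the role of revmap concrete, here is a minimal sketch on a toy input (the ranges below are made up for illustration and carry no metadata). For each disjoint range, revmap holds the indices of the pooled ranges that overlap it, which is what the loop above relies on to look up the metadata.

```r
library(IRanges)

# Toy pooled input: the first two ranges mimic one source, the last three another.
x <- IRanges(start = c(1L, 22L, 1L, 4L, 14L),
             end   = c(21L, 22L, 3L, 13L, 22L))

# Split the pooled ranges into non-overlapping pieces and keep the mapping
# from each piece back to the ranges of x that overlap it.
dj <- disjoin(x, with.revmap = TRUE)

# One integer vector per disjoint piece, indexing into x.
rev_idx <- as.list(mcols(dj)$revmap)

print(dj)
print(rev_idx)
```

Because each toy source tiles the whole span 1 to 22, every disjoint piece maps back to exactly one range per source.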
Here is the dput of the sample input:
list(new("IRanges", start = c(1L, 22L), width = c(21L, 1L), NAMES = NULL,
elementType = "ANY", elementMetadata = new("DataFrame", rownames = NULL,
nrows = 2L, listData = list(on_betalac = c(FALSE, TRUE
)), elementType = "ANY", elementMetadata = NULL, metadata = list()),
metadata = list()), new("IRanges", start = c(1L, 22L), width = c(21L,
1L), NAMES = NULL, elementType = "ANY", elementMetadata = new("DataFrame",
rownames = NULL, nrows = 2L, listData = list(on_other = c(FALSE,
TRUE)), elementType = "ANY", elementMetadata = NULL, metadata = list()),
metadata = list()), new("IRanges", start = 1L, width = 22L,
NAMES = NULL, elementType = "ANY", elementMetadata = new("DataFrame",
rownames = NULL, nrows = 1L, listData = list(on_pen = FALSE),
elementType = "ANY", elementMetadata = NULL, metadata = list()),
metadata = list()), new("IRanges", start = c(1L, 4L, 14L),
width = c(3L, 10L, 9L), NAMES = NULL, elementType = "ANY",
elementMetadata = new("DataFrame", rownames = NULL, nrows = 3L,
listData = list(on_quin = c(FALSE, TRUE, FALSE)), elementType = "ANY",
elementMetadata = NULL, metadata = list()), metadata = list()))
Answer 0 (score: 0)
I came up with a more efficient version, but I still suspect it could be faster.
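new_combine <- function(exposures) {
  cd <- do.call(what = c, args = exposures)
  mc <- mcols(cd)
  dj <- disjoin(x = cd, with.revmap = TRUE)
  r <- mcols(dj)$revmap
  # Two-column (row, column) index into the metadata matrix: the rows come
  # from revmap, and the column index cycles through the inputs once per
  # disjoint range. Assumes each revmap entry holds one index per input.
  idx <- cbind(unlist(r), rep(seq_len(ncol(mc)), times = length(dj)))
  m <- as.matrix(mc)[idx]
  mcols(dj) <- setNames(as.data.frame(matrix(m, nrow = length(dj), byrow = TRUE)),
                        nm = names(mc))
  return(dj)
}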
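The speed-up comes from indexing the metadata matrix with a two-column matrix of (row, column) pairs instead of looping. A minimal base-R sketch of that trick follows, using made-up stand-ins for as.matrix(mcols(cd)) and for revmap (the names mc, r, on_a and on_b are hypothetical). Note that the column index cycles over the metadata columns once per disjoint range, which matters whenever the number of disjoint ranges differs from the number of inputs.

```r
# Hypothetical stand-in for as.matrix(mcols(cd)): 5 pooled ranges, 2 inputs.
# Input "a" contributed rows 1-2, input "b" contributed rows 3-5.
mc <- matrix(c(FALSE, TRUE, NA, NA, NA,
               NA, NA, FALSE, TRUE, FALSE),
             ncol = 2, dimnames = list(NULL, c("on_a", "on_b")))

# Hypothetical revmap: 3 disjoint pieces, each overlapped by one row per input.
r <- list(c(1L, 3L), c(1L, 4L), c(2L, 5L))

# (row, column) pairs: rows from revmap, columns cycling 1, 2, 1, 2, ...
idx <- cbind(unlist(r), rep(seq_len(ncol(mc)), times = length(r)))

# Indexing a matrix with a two-column matrix picks one cell per pair.
picked <- mc[idx]

# Reshape so that each disjoint piece becomes one row.
out <- matrix(picked, nrow = length(r), byrow = TRUE,
              dimnames = list(NULL, colnames(mc)))
print(out)
```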
I ran bench::mark and found this version to be about 3 times faster. That is probably good enough for my application, but I have a feeling I'm not using IRanges properly.
expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time
<chr> <bch:> <bch:> <bch:> <bch:t> <dbl> <bch:byt> <dbl> <int> <bch:tm>
1 old 77.9ms 83.9ms 81.3ms 138.1ms 11.9 35.6KB 74 40 3.36s
2 new 27.6ms 29.1ms 28.9ms 34.2ms 34.4 10.6KB 73 252 7.32s
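One possible alternative, offered as a sketch and not as a tested replacement: disjoin the pooled ranges once, then call findOverlaps with select = "first" for each input separately to find which of its ranges covers each disjoint piece. The function name combine_by_overlap is hypothetical. The sketch assumes each input carries exactly one metadata column and that the ranges within one input do not overlap one another; pieces not covered by an input get NA. It has not been benchmarked against the versions above.

```r
library(IRanges)

combine_by_overlap <- function(exposures) {
  # Drop metadata before pooling; it is looked up per input below.
  strip <- function(x) { mcols(x) <- NULL; x }
  dj <- disjoin(do.call(c, unname(lapply(exposures, strip))))

  # For each input, find the range covering each disjoint piece (NA if none)
  # and pull that range's value from the input's single metadata column.
  flags <- lapply(exposures, function(x) {
    hit <- findOverlaps(dj, x, select = "first")
    mcols(x)[[1]][hit]
  })
  names(flags) <- vapply(exposures, function(x) names(mcols(x))[1], character(1))

  mcols(dj) <- do.call(DataFrame, flags)
  dj
}

# Rebuild the sample input from the question.
e1 <- IRanges(start = c(1L, 22L), end = c(21L, 22L))
mcols(e1) <- DataFrame(on_betalac = c(FALSE, TRUE))
e2 <- IRanges(start = c(1L, 22L), end = c(21L, 22L))
mcols(e2) <- DataFrame(on_other = c(FALSE, TRUE))
e3 <- IRanges(start = 1L, end = 22L)
mcols(e3) <- DataFrame(on_pen = FALSE)
e4 <- IRanges(start = c(1L, 4L, 14L), end = c(3L, 13L, 22L))
mcols(e4) <- DataFrame(on_quin = c(FALSE, TRUE, FALSE))

res <- combine_by_overlap(list(e1, e2, e3, e4))
print(res)
```

Unlike the revmap-based versions, this does not depend on every revmap entry holding exactly one index per input, at the cost of one findOverlaps call per input.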