I have a data.frame that contains linear intervals for each id:
df <- data.frame(id = c(rep("a", 3), rep("b", 4), rep("d", 4)),
                 start = c(3, 4, 10, 5, 6, 9, 12, 8, 12, 15, 27),
                 end = c(7, 8, 12, 8, 9, 13, 13, 10, 15, 26, 30))
I'm looking for an efficient function that merges all overlapping intervals within each id. For df, the result would be:
res.df <- data.frame(id = c("a", "a", "b", "d", "d", "d"),
                     start = c(3, 10, 5, 8, 12, 27),
                     end = c(8, 12, 13, 10, 26, 30))
Ultimately, I want to sum the lengths of the merged intervals for each id to get their total length:
sapply(unique(res.df$id), function(x) sum(res.df$end[which(res.df$id == x)]-res.df$start[which(res.df$id == x)]+1))
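The same per-id total can also be computed a little more compactly. The following is only a sketch, assuming the intervals are closed integer ranges (hence the `+ 1`, as in the line above); the `len` column and the `totals` name are introduced here for illustration and are not part of the original question.

```r
# Merged intervals from the question
res.df <- data.frame(id = c("a", "a", "b", "d", "d", "d"),
                     start = c(3, 10, 5, 8, 12, 27),
                     end = c(8, 12, 13, 10, 26, 30))

# Length of each closed interval (both end points are counted)
res.df$len <- res.df$end - res.df$start + 1

# Sum the interval lengths within each id
totals <- tapply(res.df$len, res.df$id, sum)
print(totals)
#  a  b  d
#  9  9 22
```

`tapply` groups the lengths by id in a single pass, which avoids repeating the `which(res.df$id == x)` lookup for every id.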
Answer 0 (score: 3)
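# Install IRanges from Bioconductor if it is not available yet
# (newer Bioconductor releases use BiocManager::install("IRanges") instead of biocLite):
#source("https://bioconductor.org/biocLite.R")
#biocLite("IRanges")
library(IRanges)
# Note: RangedData has been deprecated in newer IRanges releases, so this coercion may fail there
df1 <- as(df, "RangedData")
# Merge overlapping ranges within each id
as.data.frame(reduce(df1, by = "id", min.gapwidth = 0.5))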
#  space start end width id
#1     1     3   8     6  a
#2     1    10  12     3  a
#3     1     5  13     9  b
#4     1     8  10     3  d
#5     1    12  26    15  d
#6     1    27  30     4  d
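If installing a Bioconductor package is not an option, the same merge can be sketched in base R. This is not the method from the answer above, just one possible alternative under stated assumptions: the input is non-empty, `start <= end` on every row, and intervals that merely touch at integer neighbours (such as 12-26 and 27-30) should stay separate, as in the expected result. The function name `merge_intervals` is hypothetical.

```r
# Hypothetical helper: merge overlapping closed intervals within each id.
merge_intervals <- function(df) {
  pieces <- lapply(split(df, df$id), function(g) {
    # Sort so that overlapping intervals become neighbours
    g <- g[order(g$start, g$end), ]
    # Largest end point seen among all earlier rows (-Inf for the first row)
    prev_max_end <- c(-Inf, head(cummax(g$end), -1))
    # A new merged interval begins when a start lies strictly beyond every earlier end
    grp <- cumsum(g$start > prev_max_end)
    data.frame(id = g$id[1],
               start = as.numeric(tapply(g$start, grp, min)),
               end = as.numeric(tapply(g$end, grp, max)))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Input data from the question
df <- data.frame(id = c(rep("a", 3), rep("b", 4), rep("d", 4)),
                 start = c(3, 4, 10, 5, 6, 9, 12, 8, 12, 15, 27),
                 end = c(7, 8, 12, 8, 9, 13, 13, 10, 15, 26, 30))

merged <- merge_intervals(df)
print(merged)
```

The running maximum (`cummax`) matters because a long early interval can cover several later ones, so comparing each start only with the previous row's end would split such runs incorrectly.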