我有这四个向量,数字是以秒为单位的时间:
maxtime = 240.0333
mintime = 181.1333
times1 = c(179.1333, 183.8000, 192.3000, 194.0000, 196.2500, 198.8333, 203.4333, 217.8167)
times2 = c(183.1333, 187.8000, 196.3000, 198.0000, 200.2500, 202.8333, 207.4333, 221.8167)
您会注意到times1
和times2
的长度相同。每个相应的元素相隔4秒。也就是说,times1
比times2
提前4秒。
说明我的问题的最好方法是绘制这样的数据:
library(ggplot2)
library(dplyr)
dfplot<- data.frame(ymin=times1, ymax=times2)
data.frame(x=c(rep("min",length(times1)), rep("max",length(times1))),
y=c(times1,times2),
id=1:length(times1)) %>%
ggplot(., aes(id,y,group=id)) +
geom_path(lwd=2) +
coord_flip() +
geom_hline(yintercept = as.numeric(mintime), lty=2,color='red', lwd=1)+
geom_hline(yintercept = as.numeric(maxtime), lty=2,color='red', lwd=1)+
geom_rect(data=dfplot,aes(xmin=0,ymin=ymin,xmax=length(times1),ymax=ymax,fill="red"),alpha=0.2,inherit.aes=FALSE) +
theme(panel.background = element_blank(), plot.background = element_blank())
我想要做的是计算times1
和times2
中每对元素之间的时间间隔所涵盖的时间。它们由黑色水平线和红色矩形表示。如您所见,其中一些可能会重叠。实际上,我想计算两条红色虚线被黑线/红色矩形覆盖的时间比例以及不匹配的比例(即白色间隙)。
我希望这是有道理的。
答案 0 :(得分:2)
以下代码似乎对我有用。
代码的想法是将重叠的片段聚合成更大的片段,计算它们,然后计算它们的长度。
library(dplyr)
library(assertthat)
# Make sure times1 is sorted
assert_that(identical(sort(times1), times1))
# Create segments from specified times
segments <- lapply(seq_along(times1),
function(x) {
assert_that( times1[[x]] < times2[[x]] )
list(c(times1[[x]],times2[[x]]))
})
DASHED_BEGIN <- mintime
DASHED_END <- maxtime
# Cut off the segments based on dashed lines
segments_cut_off <- lapply(segments, function(xx) {
x <- xx[[1]]
# Is it within dashed interval?
if ((x[2] < DASHED_BEGIN) || (x[1] > DASHED_END))
return (NULL) # No
# Yes
(list(c(max(DASHED_BEGIN,x[1]),
min(DASHED_END,x[2]))))
}) %>% Filter(f = Negate(is.null))
# Function for determining the union of two segments
seg_union <- function(xx,yy) {
prev_x <- xx[1:length(xx)-1]
x <- xx[[length(xx)]]
y <- yy[[1]]
# Do they intersect
if (x[[2]]<y[[1]] || y[[2]] < x[[1]]) {
# No. Return each separately as well as
# attach the previous segments
return (c(prev_x,list(x,y)))
}
# Yes. Calculate the union
# (and attach the previous segments too)
(c(prev_x,
list(c(min(x[[1]],y[[1]]),
max(x[[2]],y[[2]])))
))
}
# Create the full list of conglomerated segments
union_lst <- Reduce(f = seg_union, x = segments_cut_off)
union_lst
给了我:
[[1]]
[1] 181.1333 183.1333
[[2]]
[1] 183.8 187.8
[[3]]
[1] 192.3000 202.8333
[[4]]
[1] 203.4333 207.4333
[[5]]
[1] 217.8167 221.8167
现在我们加上他们的长度
vapply(union_lst, function(x) (x[2] - x[1]),
FUN.VALUE = numeric(1)) %>%
sum
答案 1 :(得分:0)
使用BioConductor中的GenomicRanges
库,附上前一个答案:https://stackoverflow.com/a/27576114/496803
由于它只处理整数数据,因此必须将值乘以小数点后的部分。
df <- data.frame(times1=times1*10000, times2=times2*10000, id=1)
total <- data.frame(times1=mintime*10000,times2=maxtime*10000, id=1)
#source("http://bioconductor.org/biocLite.R")
#biocLite("GenomicRanges")
library(GenomicRanges)
dfR <- makeGRangesFromDataFrame(
df, start.field="times1", end.field="times2",
seqnames.field="id"
)
totalR <- makeGRangesFromDataFrame(
total, start.field="times1", end.field="times2",
seqnames.field="id"
)
result <- intersect(totalR, dfR)
result
#GRanges object with 5 ranges and 0 metadata columns:
# seqnames ranges strand
# <Rle> <IRanges> <Rle>
# [1] 1 [1811333, 1831333] *
# [2] 1 [1838000, 1878000] *
# [3] 1 [1923000, 2028333] *
# [4] 1 [2034333, 2074333] *
# [5] 1 [2178167, 2218167] *
sum(round(as.data.frame(result)$width/10000,3))
#[1] 24.533