Can anyone help me split overlapping or nested time intervals using R?
I have the following example:
library(dplyr)
df_foo = read.table(
textConnection("ID From To Str
SA1 0 100 FOL
SA1 10 20 FOLWK
SA1 15 18 FOLST
SA1 20 50 FOLST
SA1 25 30 FOLWK"), header = TRUE
)
In the output, no intervals should overlap, and it should look like this:
   ID From  To   Str
1 SA1    0  10   FOL
2 SA1   10  15 FOLWK
3 SA1   15  18 FOLST
4 SA1   18  20 FOLWK
5 SA1   20  25 FOLST
6 SA1   25  30 FOLWK
7 SA1   30  50 FOLST
8 SA1   50 100   FOL
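Any help would be greatly appreciated. Thanks.
Answer 0 (score: 1)
Not every step here may turn out to be necessary, and you may need to test this on a larger example: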
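library(data.table)
dt <- data.table(df_foo)
# Key on the interval columns; foverlaps() needs a keyed table to join against
setkeyv(dt, c("From", "To"))
# Self overlap join: one row per pair of overlapping intervals (i.* columns hold the other interval)
dt.all <- foverlaps(dt, dt, by.x = c("From", "To"))
# Where the i.* interval lies strictly inside this one, move From to the end of the inner interval
dt.all[To > i.To & i.From > From, `:=`(From = i.To)]
# Keep one row per distinct From
dt.all <- unique(dt.all[order(From)], by = "From")
# Clip each To at the next row's From so consecutive rows no longer overlap
dt.all[, from.next := shift(From, type = "lead")]
dt.all[!is.na(from.next), To := ifelse(To > from.next, from.next, To)]
# Number the runs of consecutive rows that share the same Str
dt.all[, str.grp := shift(Str, fill = TRUE) != Str]
dt.all[, str.grp.n := cumsum(str.grp)]
# Within a run, pull From back to the previous row's From when the row starts where that row ends
dt.all[, from.in.group := shift(From), by = .(Str, str.grp.n)]
dt.all[, to.previous := shift(To)]
dt.all[, from.previous := shift(From)]
dt.all[!is.na(from.in.group) & From == to.previous, `:=`(From = from.previous)]
# For each From keep the row with the largest To
res <- unique(dt.all[order(From, -To)], by = "From")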
Hope this gives you a good idea of how to approach this with data.table.
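For comparison, here is a minimal base R sketch of a different technique, not the data.table approach above. It cuts the range at every interval endpoint and labels each resulting segment with the narrowest input interval that covers it. That rule is an assumption about the intent: it matches the expected output in the question, but the question never states it explicitly. The helper name `split_intervals` is hypothetical, and the sketch assumes all rows belong to a single ID.

```r
# Hypothetical helper (not part of the answer above).
# Assumption: df holds the rows of ONE ID, with columns ID, From, To, Str.
split_intervals <- function(df) {
  ids    <- as.character(df$ID)
  labels <- as.character(df$Str)
  width  <- df$To - df$From

  # Every distinct endpoint is a cut point; neighbouring cut points
  # delimit the elementary segments.
  breaks   <- sort(unique(c(df$From, df$To)))
  seg_from <- head(breaks, -1)
  seg_to   <- tail(breaks, -1)

  # For each segment pick the narrowest interval that fully covers it
  # (assumed rule; ties go to the first such row). NA marks a gap.
  pick <- vapply(seq_along(seg_from), function(i) {
    covers <- which(df$From <= seg_from[i] & df$To >= seg_to[i])
    if (length(covers) == 0L) return(NA_integer_)
    covers[which.min(width[covers])]
  }, integer(1))

  out <- data.frame(ID = ids[pick], From = seg_from, To = seg_to,
                    Str = labels[pick], stringsAsFactors = FALSE)
  out <- out[!is.na(pick), , drop = FALSE]  # drop segments nothing covers

  n <- nrow(out)
  if (n < 2L) return(out)

  # Merge neighbouring segments that touch and carry the same label.
  new_run     <- c(TRUE, out$Str[-1] != out$Str[-n] | out$From[-1] != out$To[-n])
  last_in_run <- c(new_run[-1], TRUE)
  data.frame(ID = out$ID[new_run], From = out$From[new_run],
             To = out$To[last_in_run], Str = out$Str[new_run],
             stringsAsFactors = FALSE)
}

df_foo <- read.table(text = "ID From To Str
SA1 0 100 FOL
SA1 10 20 FOLWK
SA1 15 18 FOLST
SA1 20 50 FOLST
SA1 25 30 FOLWK", header = TRUE, stringsAsFactors = FALSE)

res <- split_intervals(df_foo)
print(res)
```

On the question's data this should give the eight non-overlapping rows listed in the question. For several IDs, one option would be to apply the helper per group, for example `do.call(rbind, lapply(split(df, df$ID), split_intervals))`.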