Splitting nested intervals

Time: 2019-03-10 19:33:27

Tags: r

Can anyone help me split overlapping or nested intervals using R?

I have the following example:

library(dplyr)

df_foo <- read.table(
  text = "ID    From  To   Str
SA1    0    100   FOL
SA1    10   20    FOLWK
SA1    15   18    FOLST
SA1    20   50    FOLST
SA1    25   30    FOLWK",
  header = TRUE, stringsAsFactors = FALSE
)

The output should contain no overlapping intervals and should look like this:

   ID From  To   Str
1 SA1    0  10   FOL
2 SA1   10  15 FOLWK
3 SA1   15  18 FOLST
4 SA1   18  20 FOLWK
5 SA1   20  25 FOLST
6 SA1   25  30 FOLWK
7 SA1   30  50 FOLST
8 SA1   50 100   FOL

Any help would be greatly appreciated. Thanks.

1 answer:

Answer 0: (score: 1)

Not every step here may be necessary, and you will probably want to test this on a larger example:

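library(data.table)
dt <- data.table(df_foo)
setkeyv(dt, c("From", "To"))
# Self-join: pair each interval with every interval it overlaps.
# The i.* columns hold the other interval of each pair.
dt.all <- foverlaps(dt, dt, by.x = c("From", "To"))

# Where another interval (i.From, i.To) lies strictly inside this one,
# move this interval's start to the inner interval's end.
dt.all[To > i.To & i.From > From, `:=`(From = i.To)]
# Sort by start and keep the first row for each distinct start.
dt.all <- unique(dt.all[order(From)], by = "From")
# Clip each end at the next row's start so consecutive rows do not overlap.
dt.all[, from.next := shift(From, type = "lead")]
dt.all[!is.na(from.next), To := ifelse(To > from.next, from.next, To)]
# Number consecutive runs of the same Str, then record the previous start within each run.
dt.all[, str.grp := shift(Str, fill = TRUE) != Str]
dt.all[, str.grp.n := cumsum(str.grp)]
dt.all[, from.in.group := shift(From), by = .(Str, str.grp.n)]

# If a row starts where the previous row ended and belongs to the same run,
# give it the previous row's start, then keep the longest row per start.
dt.all[, to.previous := shift(To)]
dt.all[, from.previous := shift(From)]
dt.all[!is.na(from.in.group) & From == to.previous, `:=`(From = from.previous)]
res <- unique(dt.all[order(From, -To)], by = "From")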

Hopefully this gives you a good idea of how to approach the problem with data.table.
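For comparison, here is a minimal base R sketch of a different approach. It is not the method from the answer above. It cuts each ID's range at every distinct endpoint and labels each piece with the shortest interval that covers it. The "innermost interval wins" rule is an assumption inferred from the expected output in the question, and `split_nested` is a hypothetical helper name.

```r
# Hypothetical helper, not part of the original answer.
# Assumption: where intervals nest, the shortest covering interval wins.
split_nested <- function(df) {
  pieces <- lapply(split(df, df$ID, drop = TRUE), function(d) {
    # Every distinct endpoint is a potential cut point.
    brk <- sort(unique(c(d$From, d$To)))
    from <- head(brk, -1)
    to <- tail(brk, -1)
    # Label each elementary segment with the shortest interval covering it.
    str <- vapply(seq_along(from), function(k) {
      covers <- which(d$From <= from[k] & d$To >= to[k])
      if (length(covers) == 0) return(NA_character_)
      inner <- covers[which.min(d$To[covers] - d$From[covers])]
      as.character(d$Str[inner])
    }, character(1))
    out <- data.frame(
      ID = rep(as.character(d$ID[1]), length(from)),
      From = from, To = to, Str = str,
      stringsAsFactors = FALSE
    )
    out[!is.na(out$Str), ]  # drop gaps that no interval covers
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

df_foo <- read.table(text = "ID From To Str
SA1 0 100 FOL
SA1 10 20 FOLWK
SA1 15 18 FOLST
SA1 20 50 FOLST
SA1 25 30 FOLWK", header = TRUE, stringsAsFactors = FALSE)

res <- split_nested(df_foo)
print(res)
```

On the sample data this yields the eight rows requested in the question. The sketch does not merge adjacent pieces that end up with the same Str, and it is quadratic in the number of intervals per ID, so the data.table approach may suit large inputs better.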