R: find overlaps among multiple time periods

Time: 2017-06-21 00:56:45

Tags: r

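I did my homework and saw many examples of finding interval overlaps between two datasets, but I have a single dataset with more than 3 observations. I learned about foverlaps, but it does not work for me because it wants two datasets, A and B, whereas in my case all my dates are in a single df. Can I do this with foverlaps? At scale I will need to find the overlaps for each custID, but for now I just want to get it working in a simple way.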

ds <- as.Date(c('2014-9-1', '2015-5-11', '2016-11-1','2015-1-1','2015-10-1'))  # start dd
de <- as.Date(c('2015-9-30', '2016-10-31', '2030-1-1','2015-5-30','2015-12-31')) # end dd
id <- c(1,2,3,1,2)
prodid <- c('20','30','20','20','20')
custid <- c(123,123,123,4444,4444)

df <- data.frame(custid, ds, de, id, prodid)
df
# find whether any overlap exists for the interval between ds and de:
ovl <- foverlaps(data.table(df), ????????, type='within')  # just a sample, not working
   custid         ds         de id prodid
1    123 2014-09-01 2015-09-30  1     20   \ overlap here
2    123 2015-05-11 2016-10-31  2     30   / overlap
3    123 2016-11-01 2030-01-01  3     20
4   4444 2015-01-01 2015-05-30  1     20
5   4444 2015-10-01 2015-12-31  2     20

1 answer:

Answer 0 (score: 1)

library(data.table)

ds <- as.Date(c('2014-9-1', '2015-5-11', '2016-11-1','2015-1-1','2015-10-1'))  # start dd
de <- as.Date(c('2015-9-30', '2016-10-31', '2030-1-1','2015-5-30','2015-12-31')) # end dd
id <- c(1,2,3,1,2)
prodid <- c('20','30','20','20','20')
custid <- c(123,123,123,4444,4444)

df <- data.frame(custid, ds,de,id,prodid)
df <- data.table(df)
setkey(df, ds, de)

# type = "any" reports partial overlaps; "within" only matches intervals fully contained in another
ovl <- foverlaps(df, df, type = "any")
ovl[custid == i.custid & id != i.id]

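What I did was set the key, which foverlaps requires in order to work. Then I filter the output: you are only interested in overlaps where custid == i.custid and where a row is not matched with itself, hence id != i.id.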
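Since the end goal is overlaps per custid, one variation worth sketching is to put custid in the key ahead of the interval columns. foverlaps treats the last two key columns as the interval and matches any preceding key columns exactly, so the custid comparison no longer has to be done in the filter. The names dt and ovl_by_cust are made up for this illustration, and type = "any" is assumed so that partial overlaps are reported.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table.
dt <- data.table(
  custid = c(123, 123, 123, 4444, 4444),
  ds = as.Date(c("2014-09-01", "2015-05-11", "2016-11-01", "2015-01-01", "2015-10-01")),
  de = as.Date(c("2015-09-30", "2016-10-31", "2030-01-01", "2015-05-30", "2015-12-31")),
  id = c(1, 2, 3, 1, 2),
  prodid = c("20", "30", "20", "20", "20")
)

# custid comes first in the key; the last two key columns (ds, de) are the interval.
setkey(dt, custid, ds, de)

# Self-join: intervals are only compared within the same custid.
ovl_by_cust <- foverlaps(dt, dt, type = "any")

# Drop the rows where an interval is matched with itself.
ovl_by_cust <- ovl_by_cust[id != i.id]
ovl_by_cust
```

On larger data this should also avoid generating cross-customer matches that would only be thrown away afterwards.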

> ovl[custid == i.custid & id != i.id]
   custid         ds         de id prodid i.custid       i.ds       i.de i.id i.prodid
1:    123 2015-05-11 2016-10-31  2     30      123 2014-09-01 2015-09-30    1       20
2:    123 2014-09-01 2015-09-30  1     20      123 2015-05-11 2016-10-31    2       30

This shows the overlap of interest in both combinations (row 1 against row 2, and row 2 against row 1).
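To report each overlapping pair only once, or to cross-check the result without data.table, a base R self-merge can be used instead. This is only a sketch, assuming id is unique within each custid; the names pairs and overlapping are made up for the example. It relies on the rule that two closed intervals overlap when each one starts no later than the other ends.

```r
df <- data.frame(
  custid = c(123, 123, 123, 4444, 4444),
  ds = as.Date(c("2014-09-01", "2015-05-11", "2016-11-01", "2015-01-01", "2015-10-01")),
  de = as.Date(c("2015-09-30", "2016-10-31", "2030-01-01", "2015-05-30", "2015-12-31")),
  id = c(1, 2, 3, 1, 2),
  prodid = c("20", "30", "20", "20", "20")
)

# All pairs of rows that share a custid; the .a/.b suffixes mark the two sides.
pairs <- merge(df, df, by = "custid", suffixes = c(".a", ".b"))

# id.a < id.b keeps each pair once and drops self-matches.
# Closed intervals overlap when each one starts on or before the other's end.
overlapping <- pairs[
  pairs$id.a < pairs$id.b &
    pairs$ds.a <= pairs$de.b &
    pairs$ds.b <= pairs$de.a,
]
overlapping
```

For the sample data this leaves a single row pairing id 1 with id 2 for custid 123. The merge builds every pair of rows per customer, so it is only practical when each customer has a modest number of intervals.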