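I've done my homework and seen many examples of overlapping intervals across two datasets, but I have a single dataset with more than 3 observations. I learned about foverlaps, but it doesn't work for me because it wants two datasets, A and B, while in my case all of my dates are in a single df. Can I do this with foverlaps? At scale I will need to find the overlaps for each custID, but for now I just want to get it working in a simple way.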
library(data.table)
ds <- as.Date(c('2014-9-1', '2015-5-11', '2016-11-1','2015-1-1','2015-10-1')) # start date
de <- as.Date(c('2015-9-30', '2016-10-31', '2030-1-1','2015-5-30','2015-12-31')) # end date
id <- c(1,2,3,1,2)
prodid <- c('20','30','20','20','20')
custid <- c(123,123,123,4444,4444)
df <- data.frame(custid, ds,de,id,prodid)
df
# find whether any overlap exists between the [ds, de] intervals:
ovl <- foverlaps(data.table(df), ????????, type='within') # just a sketch, not working
custid ds de id prodid
1 123 2014-09-01 2015-09-30 1 20 \ overlap here
2 123 2015-05-11 2016-10-31 2 30 / overlap
3 123 2016-11-01 2030-01-01 3 20
4 4444 2015-01-01 2015-05-30 1 20
5 4444 2015-10-01 2015-12-31 2 20
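The rows marked above overlap because two closed date ranges [ds1, de1] and [ds2, de2] overlap exactly when ds1 <= de2 and ds2 <= de1. As a small-scale reference, here is a base-R sketch (no data.table) that applies that test to every pair of rows belonging to the same customer. The helper name `overlap_pairs` and its output columns are hypothetical, and because it builds all n^2 row pairs it is only meant for small data.

```r
# Hypothetical helper: list every pair of rows of the same customer
# whose closed [ds, de] intervals overlap. Base R only.
overlap_pairs <- function(d) {
  # all ordered pairs of row indices; keep each unordered pair once (a < b)
  idx <- expand.grid(a = seq_len(nrow(d)), b = seq_len(nrow(d)))
  idx <- idx[idx$a < idx$b, ]
  same_cust <- d$custid[idx$a] == d$custid[idx$b]
  # closed intervals overlap when each one starts no later than the other ends
  overlaps <- d$ds[idx$a] <= d$de[idx$b] & d$ds[idx$b] <= d$de[idx$a]
  hits <- idx[same_cust & overlaps, ]
  data.frame(custid = d$custid[hits$a],
             id_a   = d$id[hits$a],
             id_b   = d$id[hits$b])
}

ds <- as.Date(c('2014-9-1', '2015-5-11', '2016-11-1', '2015-1-1', '2015-10-1'))
de <- as.Date(c('2015-9-30', '2016-10-31', '2030-1-1', '2015-5-30', '2015-12-31'))
df <- data.frame(custid = c(123, 123, 123, 4444, 4444), ds, de,
                 id = c(1, 2, 3, 1, 2), prodid = c('20', '30', '20', '20', '20'))
overlap_pairs(df)
```

For the sample data this should report a single pair: customer 123, ids 1 and 2. foverlaps is better suited to larger tables because it does not materialise every pair of rows.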
Answer:
library(data.table)
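ds <- as.Date(c('2014-9-1', '2015-5-11', '2016-11-1','2015-1-1','2015-10-1')) # start date
de <- as.Date(c('2015-9-30', '2016-10-31', '2030-1-1','2015-5-30','2015-12-31')) # end date
id <- c(1,2,3,1,2)
prodid <- c('20','30','20','20','20')
custid <- c(123,123,123,4444,4444)
df <- data.frame(custid, ds,de,id,prodid)
df <- data.table(df)
# foverlaps needs y keyed; the last two key columns are the interval start and end
setkey(df, ds, de)
# self-join: compare every interval with every other one.
# type = "any" keeps every overlap; "within" would only keep x intervals
# lying entirely inside a y interval, which misses rows 1 and 2 here
ovl <- foverlaps(df, df, type = "any")
# keep overlaps within the same customer, excluding each row's match with itself
ovl[custid == i.custid & id != i.id]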
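What I did was set the key, which foverlaps requires: the y table (the second argument) must be keyed, with the interval start and end as the last two key columns. Because x and y are the same table here, columns coming from x get an i. prefix in the result. Then I filtered the output, since you are only interested in overlaps where custid == i.custid AND a row is not matched with itself, so id != i.id.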
> ovl[custid == i.custid & id != i.id]
custid ds de id prodid i.custid i.ds i.de i.id i.prodid
1: 123 2015-05-11 2016-10-31 2 30 123 2014-09-01 2015-09-30 1 20
2: 123 2014-09-01 2015-09-30 1 20 123 2015-05-11 2016-10-31 2 30
This shows the overlap of interest in both orders: once with id 1 as the x row and once with id 2 as the x row.
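Since the goal is eventually to find overlaps per custID at scale, one variant worth trying is to put custid in the key ahead of the interval columns. foverlaps then matches custid exactly and only looks for overlapping ranges within each customer, so the custid == i.custid filter is no longer needed, and filtering on id < i.id instead of id != i.id keeps each pair once rather than in both orders. This is a sketch that assumes id is unique within a customer; the table name dt is only for illustration.

```r
library(data.table)

dt <- data.table(
  custid = c(123, 123, 123, 4444, 4444),
  ds     = as.Date(c('2014-9-1', '2015-5-11', '2016-11-1', '2015-1-1', '2015-10-1')),
  de     = as.Date(c('2015-9-30', '2016-10-31', '2030-1-1', '2015-5-30', '2015-12-31')),
  id     = c(1, 2, 3, 1, 2),
  prodid = c('20', '30', '20', '20', '20')
)

# custid is matched exactly; the last two key columns (ds, de)
# are treated as the interval start and end
setkey(dt, custid, ds, de)

# with no by.x / by.y given, foverlaps joins on the keys,
# so intervals are only compared within the same customer
ovl <- foverlaps(dt, dt, type = "any")

# drop self-matches and keep each overlapping pair once
# (assumes id is unique within a customer)
pairs <- ovl[id < i.id]
pairs
```

On the sample data this should leave one row, pairing id 1 (y columns) with id 2 (i. columns) for customer 123. If ids can repeat within a customer, a row-number column added before the join would be a safer pair identifier.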