我有一个这样的数据框
id start end
1 20/06/88 24/07/89
1 27/07/89 13/04/93
1 14/04/93 6/09/95
2 3/01/92 11/02/94
2 30/03/94 16/04/96
2 17/04/96 18/08/97
我想与其他数据框合并
id date
1 26/08/88
2 10/05/96
生成的合并数据框应如下图所示
id start end date
1 20/06/88 24/07/89 26/06/88
1 27/07/89 13/04/93 NA
1 14/04/93 6/09/95 NA
2 3/01/92 11/02/94 NA
2 30/03/94 16/04/96 NA
2 17/04/96 18/08/97 10/05/96
在实践中,我想基于id合并两个数据帧,并且日期必须位于第一个数据帧的起始和结束变量所跨越的区间内。
您对此有何建议?我尝试使用Fuzzyjoin程序包,但存在一些内存问题。
非常感谢大家
答案 0 :(得分:2)
可能是骗子,当我找到一个好的目标时将其删除。在此期间,我们可以使用fuzzyjoin
library(tidyverse)
library(fuzzyjoin)
df1 %>%
mutate_at(2:3, as.Date, "%d/%m/%y") %>%
fuzzy_left_join(
df2 %>% mutate(date = as.Date(date, "%d/%m/%y")),
by = c("id" = "id", "start" = "date", "end" = "date"),
match_fun = list(`==`, `<`, `>`))
# id.x start end id.y date
#1 1 1988-06-20 1989-07-24 1 1988-08-26
#2 1 1989-07-27 1993-04-13 NA <NA>
#3 1 1993-04-14 1995-09-06 NA <NA>
#4 2 1992-01-03 1994-02-11 NA <NA>
#5 2 1994-03-30 1996-04-16 NA <NA>
#6 2 1996-04-17 1997-08-18 2 1996-05-10
所有剩余的内容正在整理id
列。
df1 <- read.table(text = "
id start end
1 20/06/88 24/07/89
1 27/07/89 13/04/93
1 14/04/93 6/09/95
2 3/01/92 11/02/94
2 30/03/94 16/04/96
2 17/04/96 18/08/97", header = T)
df2 <- read.table(text = "
id date
1 26/08/88
2 10/05/96 ", header = T)
答案 1 :(得分:1)
您可以将sqldf
用于复杂的联接:
require(sqldf)
sqldf("SELECT df1.*,df2.date,df2.id as id2
FROM df1
LEFT JOIN df2
ON df1.id = df2.id AND
df1.start < df2.date AND
df1.end > df2.date")