How do I merge two tables by date range and ID?

Time: 2016-10-21 00:46:28

Tags: r date merge

I have a table of person characteristics, for example:

person <- data.frame(group.id = c("N","N","P"), person.id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)

and a group table of group characteristics, for example:

group <- data.frame(group.id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report.date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))

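I would like to merge them by group.id and by the applicable date range (report.date falling between strt and end), so that the result looks like this: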

group2 <- data.frame(group, person.id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))
  group.id report.date a b person.id       strt        end  c  d
1        N  2002-07-01 1 4         A 2002-07-01 2003-08-01  1  3
2        N  2002-08-01 2 5         A 2002-07-01 2003-08-01  1  3
3        N  2002-09-01 3 6         A 2002-07-01 2003-08-01  1  3
4        O  2002-07-01 1 4      <NA>       <NA>       <NA> NA NA
5        O  2002-08-01 2 5      <NA>       <NA>       <NA> NA NA
6        O  2002-09-01 3 6      <NA>       <NA>       <NA> NA NA
7        P  2002-07-01 1 4      <NA>       <NA>       <NA> NA NA
8        P  2002-08-01 2 5      <NA>       <NA>       <NA> NA NA
9        P  2002-09-01 3 6      <NA>       <NA>       <NA> NA NA

Does anyone have a suggestion for how to do this in R?

1 Answer:

Answer 0: (score: 1)

person <- data.frame(group_id = c("N","N","P"), person_id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)

group <- data.frame(group_id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report_date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))

group2 <- data.frame(group, person_id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))


library(sqldf)


sqldf("select a.*, b.* from 'group' a left join person b on a.group_id = b.group_id and (a.report_date >= b.strt and a.report_date <= b.end)")
  group_id report_date a b group_id person_id       strt        end  c  d
1        N  2002-07-01 1 4        N         A 2002-07-01 2003-08-01  1  3
2        N  2002-08-01 2 5        N         A 2002-07-01 2003-08-01  1  3
3        N  2002-09-01 3 6        N         A 2002-07-01 2003-08-01  1  3
4        O  2002-07-01 1 4     <NA>      <NA>       <NA>       <NA> NA NA
5        O  2002-08-01 2 5     <NA>      <NA>       <NA>       <NA> NA NA
6        O  2002-09-01 3 6     <NA>      <NA>       <NA>       <NA> NA NA
7        P  2002-07-01 1 4     <NA>      <NA>       <NA>       <NA> NA NA
8        P  2002-08-01 2 5     <NA>      <NA>       <NA>       <NA> NA NA
9        P  2002-09-01 3 6     <NA>      <NA>       <NA>       <NA> NA NA

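Note that group is a reserved word in SQL, so I had to put it in single quotes to use it as a table name. I also changed the . in the column names to _ to avoid problems, but you could keep the . and quote all the column names instead.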
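
For comparison, the same range join can be sketched in base R, which sidesteps the reserved-word and quoting issues because no SQL is involved. This is an alternative technique, not the sqldf method used above: it does an ordinary merge on group_id, filters the pairs by date range, then left-joins the survivors back onto the group table. The table name grp and the helper column row_id are illustrative assumptions that do not appear in the original post.

```r
# Same sample data as in the answer; the group table is named grp here (an assumption).
person <- data.frame(
  group_id  = c("N", "N", "P"),
  person_id = c("A", "B", "C"),
  strt = as.Date(c("2002-07-01", "2003-08-01", "2004-06-23")),
  end  = as.Date(c("2003-08-01", "2004-09-01", "2006-07-01")),
  c = 1:3,
  d = 3:5
)
grp <- data.frame(
  group_id    = rep(c("N", "O", "P"), each = 3),
  report_date = rep(as.Date(c("2002-07-01", "2002-08-01", "2002-09-01")), times = 3),
  a = rep(1:3, times = 3),
  b = rep(4:6, times = 3)
)

# Hypothetical helper key so unmatched report rows can be restored later.
grp$row_id <- seq_len(nrow(grp))

# Step 1: equi-join on group_id (each report row paired with each person in its group).
candidates <- merge(grp, person, by = "group_id")

# Step 2: keep only the pairs whose report_date lies inside [strt, end].
in_range <- candidates$report_date >= candidates$strt &
  candidates$report_date <= candidates$end
matched <- candidates[in_range, c("row_id", "person_id", "strt", "end", "c", "d")]

# Step 3: left-join the matches back so rows without a match keep NA values.
result <- merge(grp, matched, by = "row_id", all.x = TRUE)
result <- result[order(result$row_id), ]
result$row_id <- NULL

print(result)
```

Unlike the sqldf output, this result contains group_id only once. The intermediate merge in step 1 grows with the number of persons per group, so for large tables a dedicated non-equi join may be a better fit than this sketch.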