I have a table of person characteristics, for example:
person <- data.frame(group.id = c("N","N","P"), person.id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)
and a group table of group characteristics, for example:
group <- data.frame(group.id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report.date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))
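I want to merge them by group.id and by the applicable date range (report.date falling between strt and end), so that the result looks like this: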
group2 <- data.frame(group, person.id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))
  group.id report.date a b person.id       strt        end  c  d
1        N  2002-07-01 1 4         A 2002-07-01 2003-08-01  1  3
2        N  2002-08-01 2 5         A 2002-07-01 2003-08-01  1  3
3        N  2002-09-01 3 6         A 2002-07-01 2003-08-01  1  3
4        O  2002-07-01 1 4      <NA>       <NA>       <NA> NA NA
5        O  2002-08-01 2 5      <NA>       <NA>       <NA> NA NA
6        O  2002-09-01 3 6      <NA>       <NA>       <NA> NA NA
7        P  2002-07-01 1 4      <NA>       <NA>       <NA> NA NA
8        P  2002-08-01 2 5      <NA>       <NA>       <NA> NA NA
9        P  2002-09-01 3 6      <NA>       <NA>       <NA> NA NA
Does anyone have a suggestion for how to do this in R?
Answer 0 (score: 1)
person <- data.frame(group_id = c("N","N","P"), person_id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)
group <- data.frame(group_id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report_date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))
group2 <- data.frame(group, person_id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))
library(sqldf)
sqldf("select a.*, b.* from 'group' a left join person b on a.group_id = b.group_id and (a.report_date >= b.strt and a.report_date <= b.end)")
  group_id report_date a b group_id person_id       strt        end  c  d
1        N  2002-07-01 1 4        N         A 2002-07-01 2003-08-01  1  3
2        N  2002-08-01 2 5        N         A 2002-07-01 2003-08-01  1  3
3        N  2002-09-01 3 6        N         A 2002-07-01 2003-08-01  1  3
4        O  2002-07-01 1 4     <NA>      <NA>       <NA>       <NA> NA NA
5        O  2002-08-01 2 5     <NA>      <NA>       <NA>       <NA> NA NA
6        O  2002-09-01 3 6     <NA>      <NA>       <NA>       <NA> NA NA
7        P  2002-07-01 1 4     <NA>      <NA>       <NA>       <NA> NA NA
8        P  2002-08-01 2 5     <NA>      <NA>       <NA>       <NA> NA NA
9        P  2002-09-01 3 6     <NA>      <NA>       <NA>       <NA> NA NA
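Note that group is a reserved word in SQL, so I had to put it in single quotes to use it as a table name. I also changed the . in the column names to _ to avoid problems, but you could keep the . and quote all the column names instead.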
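For comparison, the same left join with a date-range condition can be sketched in base R without SQL. This is only a minimal sketch and not part of the original answer: the helper column row_id and the intermediate names cand, hit, and res are made up for illustration, and it assumes the underscore column names used above.

```r
# Sample data, same values as above (underscore column names)
person <- data.frame(
  group_id  = c("N", "N", "P"),
  person_id = c("A", "B", "C"),
  strt = as.Date(c("2002-07-01", "2003-08-01", "2004-06-23")),
  end  = as.Date(c("2003-08-01", "2004-09-01", "2006-07-01")),
  c = 1:3,
  d = 3:5
)
group <- data.frame(
  group_id    = rep(c("N", "O", "P"), each = 3),
  report_date = rep(as.Date(c("2002-07-01", "2002-08-01", "2002-09-01")), times = 3),
  a = rep(1:3, times = 3),
  b = rep(4:6, times = 3)
)

# Hypothetical helper key so each group row can be found again later
group$row_id <- seq_len(nrow(group))

# Step 1: pair every group row with every person in the same group
cand <- merge(group, person, by = "group_id")

# Step 2: keep only the pairs whose report_date lies inside [strt, end]
hit <- cand[cand$report_date >= cand$strt & cand$report_date <= cand$end, ]

# Step 3: left-join the surviving person columns back onto all group rows,
# so group rows without a match are kept and filled with NA
person_cols <- setdiff(names(person), "group_id")
res <- merge(group, hit[, c("row_id", person_cols)], by = "row_id", all.x = TRUE)

# Restore the original row order and drop the helper key
res <- res[order(res$row_id), ]
res$row_id <- NULL
rownames(res) <- NULL
print(res)
```

With the sample data this should match the three N rows to person A and leave the person columns of the O and P rows as NA. Unlike the sqldf result, group_id appears only once. Note that step 1 builds every within-group pairing before filtering, so on large tables it may use far more memory than a join that applies the date condition directly.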