I have been trying to do this for 2-3 days and still cannot find an answer. I have two data frames, X and Y (samples of both are given below).
X
Response.No Tab.No Survey.Date AC.Name Mandal.Name Village.Name
1 9530 1 2015-05-26 NA NA NA
2 6702 1 2015-05-30 NA NA NA
3 26744 1 2015-05-31 NA NA NA
4 8925 1 2015-06-03 NA NA NA
5 20242 1 2015-06-04 NA NA NA
6 21316 1 2015-06-04 NA NA NA
7 28056 1 2015-06-04 NA NA NA
8 12661 1 2015-06-05 NA NA NA
9 17187 1 2015-06-05 NA NA NA
10 28795 1 2015-06-05 NA NA NA
Y
AC.Name Mandal.Name Village.Name Tab.No Survey.Start.Date Survey.End.Date
1 Nandigama Chanderlapadu Punnavalli 1 2015-05-23 2015-05-27
2 Nandigama Chanderlapadu Kasarabada 1 2015-05-30 2015-06-07
3 Nandigama Chanderlapadu Kodavatikallu 1 2015-06-09 2015-06-28
4 Nandigama Chanderlapadu Thurlapadu 1 2015-06-29 2015-07-13
5 Nandigama Chanderlapadu Chanderlapadu 1 2015-07-14 2015-07-25
6 Nandigama Chanderlapadu Popuru 2 2015-05-23 2015-05-27
7 Nandigama Chanderlapadu Kandrapadu 2 2015-05-30 2015-06-08
8 Nandigama Chanderlapadu Vibhareethalapadu 3 2015-05-30 2015-06-04
9 Nandigama Chanderlapadu Eturu 3 2015-06-10 2015-06-23
10 Nandigama Chanderlapadu Bobbillapadu 3 2015-06-26 2015-07-03
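That is, I want to match X and Y on the Tab.No column, while also making sure that X$Survey.Date falls between Y$Survey.Start.Date and Y$Survey.End.Date. If a row does not satisfy both conditions, it must keep NA values. I have searched Google, Stack Overflow, and the RStudio help, but could not get the desired result, which is Z below.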
Z
Response.No Tab.No Survey.Date AC.Name Mandal.Name Village.Name
1 9530 1 2015-05-26 Nandigama Chanderlapadu Punnavalli
2 6702 1 2015-05-30 Nandigama Chanderlapadu Kasarabada
3 26744 1 2015-05-31 Nandigama Chanderlapadu Kasarabada
4 8925 1 2015-06-03 Nandigama Chanderlapadu Kasarabada
5 20242 1 2015-06-04 Nandigama Chanderlapadu Kasarabada
6 21316 1 2015-06-04 Nandigama Chanderlapadu Kasarabada
7 28056 1 2015-06-04 Nandigama Chanderlapadu Kasarabada
8 12661 1 2015-06-05 Nandigama Chanderlapadu Kasarabada
9 17187 1 2015-06-05 Nandigama Chanderlapadu Kasarabada
10 28795 1 2015-06-05 Nandigama Chanderlapadu Kasarabada
I have already checked:
1. How to merge two dataframes in R based on two conditions, matching column and within a range?
2. roll join with start/end window
3. Conditional merge/replacement in R
I have been trying to solve this with merge(), cbind() and match(), but to no avail. I was only able to merge on the key column, without the date condition.
Thanks for your help.
Answer 0 (score: 4)
Data:
library(data.table)
x <- data.table(Tab.No = c(1,1,2), Survey.Date = as.Date(c("2015-5-26","2015-6-15","2015-4-03")))
y <- data.table(AC.Name = c("abc","xyz","qwe"),
Mandal.Name = c("def","pqr","rty"),
Village.Name = c("def","pqr","rty"),
Tab.No = c(1,1,2),
Survey.Start.Date = as.Date(c("2015-5-30","2015-5-01","2015-5-05")),
Survey.End.Date = as.Date(c("2015-6-30","2015-5-29","2015-6-30")))
I first merge x with y on Tab.No, keep only the rows that satisfy the date condition, and then join that result back onto x so that unmatched rows are kept with NA:
Using data.table:
merge(x,merge(y,x,by = "Tab.No")[Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date, list(Tab.No,AC.Name,Mandal.Name,Village.Name,Survey.Date)], by = c("Tab.No","Survey.Date"), all.x = TRUE)
   Tab.No Survey.Date AC.Name Mandal.Name Village.Name
1:      1  2015-05-26     xyz         pqr          pqr
2:      1  2015-06-15     abc         def          def
3:      2  2015-04-03    <NA>        <NA>         <NA>
The same thing in two steps, which is a little clearer if you are not familiar with data.table:
z <- merge(y,x,by = "Tab.No")[Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date, list(Tab.No,AC.Name,Mandal.Name,Village.Name,Survey.Date)]
merge(x,z, by = c("Tab.No","Survey.Date"), all.x = TRUE)
Note that I left out the NA columns of the x frame; they are not needed at the start.
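As an alternative to the two merges, the same result can probably be obtained with a non-equi update join, which data.table supports through the `on =` argument (version 1.9.8 or later). The following is only a sketch under that assumption; it rebuilds the sample `x` and `y` from this answer and is not the method the answer itself uses.

```r
library(data.table)

# Sample data copied from this answer
x <- data.table(Tab.No = c(1, 1, 2),
                Survey.Date = as.Date(c("2015-05-26", "2015-06-15", "2015-04-03")))
y <- data.table(AC.Name = c("abc", "xyz", "qwe"),
                Mandal.Name = c("def", "pqr", "rty"),
                Village.Name = c("def", "pqr", "rty"),
                Tab.No = c(1, 1, 2),
                Survey.Start.Date = as.Date(c("2015-05-30", "2015-05-01", "2015-05-05")),
                Survey.End.Date = as.Date(c("2015-06-30", "2015-05-29", "2015-06-30")))

# Update join: for each row of y, find the rows of x with the same Tab.No
# whose Survey.Date lies within [Survey.Start.Date, Survey.End.Date], and
# copy the three name columns into x by reference.
# Rows of x that match no row of y keep NA in the new columns.
x[y,
  on = .(Tab.No, Survey.Date >= Survey.Start.Date, Survey.Date <= Survey.End.Date),
  `:=`(AC.Name = i.AC.Name,
       Mandal.Name = i.Mandal.Name,
       Village.Name = i.Village.Name)]

print(x)
```

With this sample data the first two rows of `x` pick up the `xyz` and `abc` records and the third row stays NA. If the date ranges in `y` overlap for the same Tab.No, the last matching row of `y` wins, so this sketch assumes non-overlapping ranges.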
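Answer 1 (score: 3)
Here is how to do it with dplyr.
library(dplyr)
# Join every X row to every Y row with the same Tab.No (the NA columns of X are dropped)
inner_join(X[,1:3],Y, by=c("Tab.No"))%>%
# Blank out the names wherever Survey.Date is outside the survey window
mutate(AC.Name = ifelse(Survey.Date>=Survey.Start.Date & Survey.Date<=Survey.End.Date, AC.Name ,NA),
Mandal.Name = ifelse(Survey.Date>=Survey.Start.Date & Survey.Date<=Survey.End.Date, Mandal.Name ,NA),
Village.Name = ifelse(Survey.Date>=Survey.Start.Date & Survey.Date<=Survey.End.Date, Village.Name ,NA))%>%
group_by(Tab.No)%>%
# Keep matched rows; an unmatched row survives only if it is the sole row for its Tab.No
filter(!is.na(AC.Name)|n()==1)%>%
select(Response.No,Tab.No,Survey.Date,AC.Name,Mandal.Name,Village.Name)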
Result:
Response.No Tab.No Survey.Date AC.Name Mandal.Name Village.Name
(int) (int) (date) (chr) (chr) (chr)
1 9530 1 2015-05-26 Nandigama Chanderlapadu Punnavalli
2 6702 1 2015-05-30 Nandigama Chanderlapadu Kasarabada
3 26744 1 2015-05-31 Nandigama Chanderlapadu Kasarabada
4 8925 1 2015-06-03 Nandigama Chanderlapadu Kasarabada
5 20242 1 2015-06-04 Nandigama Chanderlapadu Kasarabada
6 21316 1 2015-06-04 Nandigama Chanderlapadu Kasarabada
7 28056 1 2015-06-04 Nandigama Chanderlapadu Kasarabada
8 12661 1 2015-06-05 Nandigama Chanderlapadu Kasarabada
9 17187 1 2015-06-05 Nandigama Chanderlapadu Kasarabada
10 28795 1 2015-06-05 Nandigama Chanderlapadu Kasarabada
Data:
X<-read.table(text=" Response.No Tab.No Survey.Date AC.Name Mandal.Name Village.Name
9530 1 2015-05-26 NA NA NA
6702 1 2015-05-30 NA NA NA
26744 1 2015-05-31 NA NA NA
8925 1 2015-06-03 NA NA NA
20242 1 2015-06-04 NA NA NA
21316 1 2015-06-04 NA NA NA
28056 1 2015-06-04 NA NA NA
12661 1 2015-06-05 NA NA NA
17187 1 2015-06-05 NA NA NA
28795 1 2015-06-05 NA NA NA
", header=TRUE, stringsAsFactors=FALSE)
Y<-read.table(text="AC.Name Mandal.Name Village.Name Tab.No Survey.Start.Date Survey.End.Date
Nandigama Chanderlapadu Punnavalli 1 2015-05-23 2015-05-27
Nandigama Chanderlapadu Kasarabada 1 2015-05-30 2015-06-07
Nandigama Chanderlapadu Kodavatikallu 1 2015-06-09 2015-06-28
Nandigama Chanderlapadu Thurlapadu 1 2015-06-29 2015-07-13
Nandigama Chanderlapadu Chanderlapadu 1 2015-07-14 2015-07-25
Nandigama Chanderlapadu Popuru 2 2015-05-23 2015-05-27
Nandigama Chanderlapadu Kandrapadu 2 2015-05-30 2015-06-08
Nandigama Chanderlapadu Vibhareethalapadu 3 2015-05-30 2015-06-04
Nandigama Chanderlapadu Eturu 3 2015-06-10 2015-06-23
Nandigama Chanderlapadu Bobbillapadu 3 2015-06-26 2015-07-03
", header=TRUE, stringsAsFactors=FALSE)
X$Survey.Date <- as.Date(X$Survey.Date)
Y$Survey.Start.Date <- as.Date(Y$Survey.Start.Date)
Y$Survey.End.Date <- as.Date(Y$Survey.End.Date)
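For readers on a newer dplyr, the range condition can likely be expressed directly in the join. The sketch below assumes dplyr 1.1.0 or later, where `join_by()` and its `between()` helper are available. It uses a cut-down version of X and Y, and the third X row (Response.No 99999, dated 2015-05-28) is a hypothetical addition that falls in the gap between two survey windows, to show that an unmatched row is kept with NA.

```r
library(dplyr)  # join_by() needs dplyr >= 1.1.0

# Cut-down X; the third row is hypothetical and matches no survey window
X <- data.frame(Response.No = c(9530, 6702, 99999),
                Tab.No = c(1, 1, 1),
                Survey.Date = as.Date(c("2015-05-26", "2015-05-30", "2015-05-28")))

# Cut-down Y with three of the original survey windows
Y <- data.frame(AC.Name = "Nandigama",
                Mandal.Name = "Chanderlapadu",
                Village.Name = c("Punnavalli", "Kasarabada", "Popuru"),
                Tab.No = c(1, 1, 2),
                Survey.Start.Date = as.Date(c("2015-05-23", "2015-05-30", "2015-05-23")),
                Survey.End.Date = as.Date(c("2015-05-27", "2015-06-07", "2015-05-27")))

# Left join on equal Tab.No and Survey.Date within the inclusive window;
# X rows without a matching window are kept with NA in the Y columns.
Z <- X %>%
  left_join(Y, by = join_by(Tab.No,
                            between(Survey.Date, Survey.Start.Date, Survey.End.Date))) %>%
  select(Response.No, Tab.No, Survey.Date, AC.Name, Mandal.Name, Village.Name)

print(Z)
```

Because this is a left join, every X row appears in Z at least once, including rows whose Tab.No has several windows in Y but whose date falls in none of them. If the windows in Y overlap, an X row would be repeated once per matching window.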