How to conditionally merge two data frames in R (common column, condition)

Time: 2015-11-09 21:40:34

Tags: r merge match

I have been trying to do this for 2-3 days and still cannot find an answer. I have two data frames, x and y (samples of both are given below):

X
     Response.No Tab.No Survey.Date AC.Name Mandal.Name Village.Name
1         9530      1  2015-05-26      NA          NA           NA
2         6702      1  2015-05-30      NA          NA           NA
3        26744      1  2015-05-31      NA          NA           NA
4         8925      1  2015-06-03      NA          NA           NA
5        20242      1  2015-06-04      NA          NA           NA
6        21316      1  2015-06-04      NA          NA           NA
7        28056      1  2015-06-04      NA          NA           NA
8        12661      1  2015-06-05      NA          NA           NA
9        17187      1  2015-06-05      NA          NA           NA
10       28795      1  2015-06-05      NA          NA           NA

Y
     AC.Name   Mandal.Name      Village.Name Tab.No Survey.Start.Date Survey.End.Date
1  Nandigama Chanderlapadu        Punnavalli      1        2015-05-23      2015-05-27
2  Nandigama Chanderlapadu        Kasarabada      1        2015-05-30      2015-06-07
3  Nandigama Chanderlapadu     Kodavatikallu      1        2015-06-09      2015-06-28
4  Nandigama Chanderlapadu        Thurlapadu      1        2015-06-29      2015-07-13
5  Nandigama Chanderlapadu     Chanderlapadu      1        2015-07-14      2015-07-25
6  Nandigama Chanderlapadu            Popuru      2        2015-05-23      2015-05-27
7  Nandigama Chanderlapadu        Kandrapadu      2        2015-05-30      2015-06-08
8  Nandigama Chanderlapadu Vibhareethalapadu      3        2015-05-30      2015-06-04
9  Nandigama Chanderlapadu             Eturu      3        2015-06-10      2015-06-23
10 Nandigama Chanderlapadu      Bobbillapadu      3        2015-06-26      2015-07-03

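That is, I want to match x and y on the Tab.No column, while also making sure that x$Survey.Date falls between y$Survey.Start.Date and y$Survey.End.Date. If the two conditions are not both satisfied, the row must have NA values. I have searched Google, Stack Overflow and the RStudio help, but could not get the required result, which should look like Z: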

Z
   Response.No Tab.No Survey.Date   AC.Name   Mandal.Name Village.Name
1         9530      1  2015-05-26 Nandigama Chanderlapadu   Punnavalli
2         6702      1  2015-05-30 Nandigama Chanderlapadu   Kasarabada
3        26744      1  2015-05-31 Nandigama Chanderlapadu   Kasarabada
4         8925      1  2015-06-03 Nandigama Chanderlapadu   Kasarabada
5        20242      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
6        21316      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
7        28056      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
8        12661      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada
9        17187      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada
10       28795      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada

I have already looked at:
1. How to merge two dataframes in R based on two conditions, matching column and within a range?
2. roll join with start/end window
3. Conditional merge/replacement in R

I have been trying to solve this with merge(), cbind() and match(), but to no avail. I can only get the merge by serial number to work, not the date condition.

Thank you for your help.

2 Answers:

Answer 0: (score: 4)

Data:

library(data.table)

x <- data.table(Tab.No = c(1,1,2), Survey.Date = as.Date(c("2015-5-26","2015-6-15","2015-4-03")))
y <- data.table(AC.Name = c("abc","xyz","qwe"),
                Mandal.Name = c("def","pqr","rty"),
                Village.Name = c("def","pqr","rty"),
                Tab.No = c(1,1,2), 
                Survey.Start.Date = as.Date(c("2015-5-30","2015-5-01","2015-5-05")), 
                Survey.End.Date = as.Date(c("2015-6-30","2015-5-29","2015-6-30")))

I first merge x onto y, test the condition, and then join the result back onto x:

Using data.table:

merge(x,
      merge(y, x, by = "Tab.No")[Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date,
                                 list(Tab.No, AC.Name, Mandal.Name, Village.Name, Survey.Date)],
      by = c("Tab.No", "Survey.Date"), all.x = TRUE)
   Tab.No Survey.Date AC.Name Mandal.Name Village.Name
1:      1  2015-05-26     xyz         pqr          pqr
2:      1  2015-06-15     abc         def          def
3:      2  2015-04-03      NA          NA           NA

If you are not familiar with data.table, here is the same thing spelled out a little more clearly:
z <- merge(y,x,by = "Tab.No")[Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date, list(Tab.No,AC.Name,Mandal.Name,Village.Name,Survey.Date)]
merge(x,z, by = c("Tab.No","Survey.Date"), all.x = T)

Note that I left out the NA columns of the x frame; they are not needed at the start.
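
For reference, the two-step merge above can also be written as a single non-equi join, which newer data.table versions (1.9.8 or later, as far as I know) accept in `on=`. The following is only a sketch under that assumption, reusing the sample x and y from this answer; the result name `res` is mine, not part of the original answer.

```r
library(data.table)

# Same sample data as in the answer above
x <- data.table(Tab.No = c(1, 1, 2),
                Survey.Date = as.Date(c("2015-05-26", "2015-06-15", "2015-04-03")))
y <- data.table(AC.Name = c("abc", "xyz", "qwe"),
                Mandal.Name = c("def", "pqr", "rty"),
                Village.Name = c("def", "pqr", "rty"),
                Tab.No = c(1, 1, 2),
                Survey.Start.Date = as.Date(c("2015-05-30", "2015-05-01", "2015-05-05")),
                Survey.End.Date = as.Date(c("2015-06-30", "2015-05-29", "2015-06-30")))

# For every row of x, look up the row of y with the same Tab.No whose
# survey window contains Survey.Date. Rows of x without a match are kept,
# with NA in the columns taken from y (the default nomatch = NA).
res <- y[x,
         on = .(Tab.No, Survey.Start.Date <= Survey.Date, Survey.End.Date >= Survey.Date),
         .(Tab.No = i.Tab.No, Survey.Date = i.Survey.Date,
           AC.Name, Mandal.Name, Village.Name)]
res
```

The `i.` prefix takes Tab.No and Survey.Date from x explicitly, so the survey date is returned as-is rather than being replaced by a window boundary from y.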

Answer 1: (score: 3)

Here is how to do it with dplyr.

library(dplyr)

# Join on Tab.No, then blank out the names wherever Survey.Date lies outside the survey window
inner_join(X[, 1:3], Y, by = c("Tab.No")) %>%
  mutate(AC.Name = ifelse(Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date, AC.Name, NA),
         Mandal.Name = ifelse(Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date, Mandal.Name, NA),
         Village.Name = ifelse(Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date, Village.Name, NA)) %>%
  group_by(Tab.No) %>%
  # Keep the rows that fell inside a window, or the single row of a one-row group
  filter(!is.na(AC.Name) | n() == 1) %>%
  select(Response.No, Tab.No, Survey.Date, AC.Name, Mandal.Name, Village.Name)

Result

   Response.No Tab.No Survey.Date   AC.Name   Mandal.Name Village.Name
         (int)  (int)      (date)     (chr)         (chr)        (chr)
1         9530      1  2015-05-26 Nandigama Chanderlapadu   Punnavalli
2         6702      1  2015-05-30 Nandigama Chanderlapadu   Kasarabada
3        26744      1  2015-05-31 Nandigama Chanderlapadu   Kasarabada
4         8925      1  2015-06-03 Nandigama Chanderlapadu   Kasarabada
5        20242      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
6        21316      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
7        28056      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
8        12661      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada
9        17187      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada
10       28795      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada

Data

X<-read.table(text="     Response.No Tab.No Survey.Date AC.Name Mandal.Name Village.Name
9530      1  2015-05-26      NA          NA           NA
6702      1  2015-05-30      NA          NA           NA
26744      1  2015-05-31      NA          NA           NA
8925      1  2015-06-03      NA          NA           NA
20242      1  2015-06-04      NA          NA           NA
21316      1  2015-06-04      NA          NA           NA
28056      1  2015-06-04      NA          NA           NA
12661      1  2015-06-05      NA          NA           NA
17187      1  2015-06-05      NA          NA           NA
28795      1  2015-06-05      NA          NA           NA
", header=T,stringsAsFactors =F)

Y<-read.table(text="AC.Name   Mandal.Name      Village.Name Tab.No Survey.Start.Date Survey.End.Date
Nandigama Chanderlapadu        Punnavalli      1        2015-05-23      2015-05-27
Nandigama Chanderlapadu        Kasarabada      1        2015-05-30      2015-06-07
Nandigama Chanderlapadu     Kodavatikallu      1        2015-06-09      2015-06-28
Nandigama Chanderlapadu        Thurlapadu      1        2015-06-29      2015-07-13
Nandigama Chanderlapadu     Chanderlapadu      1        2015-07-14      2015-07-25
Nandigama Chanderlapadu            Popuru      2        2015-05-23      2015-05-27
Nandigama Chanderlapadu        Kandrapadu      2        2015-05-30      2015-06-08
Nandigama Chanderlapadu Vibhareethalapadu      3        2015-05-30      2015-06-04
Nandigama Chanderlapadu             Eturu      3        2015-06-10      2015-06-23
Nandigama Chanderlapadu      Bobbillapadu      3        2015-06-26      2015-07-03
", header=T,stringsAsFactors =F)

X$Survey.Date <-as.Date(X$Survey.Date)
Y$Survey.Start.Date <-as.Date(Y$Survey.Start.Date)
Y$Survey.End.Date <-as.Date(Y$Survey.End.Date)
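
As a side note to the dplyr approach: newer dplyr releases (1.1.0 and later, to my knowledge) let `left_join()` take range conditions through `join_by()`, so the date check can go into the join itself and rows without a matching window are kept with NA. Below is a minimal sketch under that assumption, using a cut-down copy of the data. The names `X_small`, `Y_small` and `Z_small` are mine, and the third row of `X_small` is a made-up response whose date falls in no window, added only to show the NA case.

```r
library(dplyr)

# Cut-down copies of X and Y; Response.No 99999 is hypothetical
X_small <- data.frame(
  Response.No = c(9530L, 6702L, 99999L),
  Tab.No      = c(1L, 1L, 1L),
  Survey.Date = as.Date(c("2015-05-26", "2015-05-30", "2015-05-28"))
)
Y_small <- data.frame(
  AC.Name           = rep("Nandigama", 2),
  Mandal.Name       = rep("Chanderlapadu", 2),
  Village.Name      = c("Punnavalli", "Kasarabada"),
  Tab.No            = c(1L, 1L),
  Survey.Start.Date = as.Date(c("2015-05-23", "2015-05-30")),
  Survey.End.Date   = as.Date(c("2015-05-27", "2015-06-07")),
  stringsAsFactors  = FALSE
)

# Match on Tab.No and require Survey.Date to lie inside [Start, End];
# left_join keeps every row of X_small, filling NA where nothing matches
Z_small <- left_join(
  X_small, Y_small,
  by = join_by(Tab.No, between(Survey.Date, Survey.Start.Date, Survey.End.Date))
) %>%
  select(Response.No, Tab.No, Survey.Date, AC.Name, Mandal.Name, Village.Name)
Z_small
```

Compared with the `inner_join()` plus `filter()` pipeline, this avoids building every Tab.No combination first, and a response whose date falls in no window stays in the result instead of being filtered out.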