How do I perform a left join in R?

Time: 2021-07-13 21:38:38

Tags: r dplyr left-join

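Below is sample data and one manipulation. The first data set is industry-specific employment. The second is overall employment and the unemployment rate. I am looking to do a left join (or at least I think that is what it should be) to get the desired result shown below. When I do this, I run into a one-to-many problem: the number of rows grows. In this example it goes from 14 to 18. In the larger data set it goes from 228 to 4348. The main question is whether this can be done with a single, correctly written join, or whether more is needed.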

 area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 month<-c(1,2,3,4,5,6,7,8,9,10,11,12,1,2)
 emp1 <-c(10,11,12,13,14,15,16,17,20,21,22,24,26,28)

 firstset<-data.frame(area1,periodyear,month,emp1)



 area2<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear1<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 period<-c(01,02,03,04,05,06,07,08,09,10,11,12,01,02)
 rate<-c(3.0,3.2,3.4,3.8,2.5,4.5,6.5,9.1,10.6,5.5,7.8,6.5,4.5,2.9)
 emp2<-c(1001,1002,1005,1105,1254,1025,1078,1106,1099,1188,1254,1250,1301,1188)

 secondset<-data.frame(area2,periodyear1,period,rate,emp2)

 library(dplyr)  # provides %>%, mutate() and left_join()

 secondset <- secondset%>%mutate(month = as.numeric(period))

 # joining on month alone: this is the call that returns 18 rows instead of 14
 joined <- left_join(firstset,secondset, by=c("month"))

Desired result (14 rows; the first 3 are shown below)

 area1     periodyear   month     emp1    rate    emp2
000000         2020        1        10      3.0    1001
000000         2020        2        11      3.2    1002
000000         2020        3        12      3.4    1005

1 Answer:

Answer 0: (score: 1)

We may also need to add "periodyear" to by, so that each key identifies a single row in secondset:

library(dplyr)
left_join(firstset,secondset, by=c("periodyear" = "periodyear1", 
      "area1" = "area2", "month"))

-output

   area1 periodyear month emp1 period rate emp2
1      0       2020     1   10      1  3.0 1001
2      0       2020     2   11      2  3.2 1002
3      0       2020     3   12      3  3.4 1005
...
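
To see why joining on month alone grows the result, it can help to count how often each key occurs in the right-hand table before joining. The following is only a sketch on the example data: it assumes the second area vector was meant to be named area2, and the names dup_month, dup_full, by_month and by_full are made up here for illustration.

```r
library(dplyr)

# Example data from the question, written compactly.
# Assumption: the second area vector is named area2.
firstset <- data.frame(
  area1      = rep(0, 14),
  periodyear = c(rep(2020, 12), 2021, 2021),
  month      = c(1:12, 1, 2),
  emp1       = c(10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 24, 26, 28)
)

secondset <- data.frame(
  area2       = rep(0, 14),
  periodyear1 = c(rep(2020, 12), 2021, 2021),
  period      = c(1:12, 1, 2),
  rate        = c(3.0, 3.2, 3.4, 3.8, 2.5, 4.5, 6.5, 9.1, 10.6, 5.5, 7.8, 6.5, 4.5, 2.9),
  emp2        = c(1001, 1002, 1005, 1105, 1254, 1025, 1078, 1106, 1099, 1188, 1254, 1250, 1301, 1188)
) %>%
  mutate(month = as.numeric(period))

# Key values that occur more than once in the right-hand table.
# Every such value multiplies the matching rows of the left-hand table.
dup_month <- secondset %>% count(month) %>% filter(n > 1)
dup_full  <- secondset %>% count(area2, periodyear1, month) %>% filter(n > 1)

# Join on month only versus join on the full key.
by_month <- left_join(firstset, secondset, by = "month")
by_full  <- left_join(firstset, secondset,
                      by = c("periodyear" = "periodyear1",
                             "area1" = "area2", "month"))

nrow(dup_month)  # months that repeat across years
nrow(dup_full)   # repeats left once area and year are part of the key
nrow(by_month)   # row count when joining on month only
nrow(by_full)    # row count when joining on the full key
```

If dup_full is empty, the left join cannot return more rows than firstset has. In the larger data set, a non-empty result from the same check would point to the key columns that are still missing from by.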