Conditional JOIN of two data frames in R

Time: 2017-10-15 13:59:14

Tags: sql r merge inner-join

Suppose there are two data frames, as follows (taken from this post):

df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

df1
#  CustomerId Product
#           1 Toaster
#           2 Toaster
#           3 Toaster
#           4   Radio
#           5   Radio
#           6   Radio

df2
#  CustomerId   State
#           2 Alabama
#           4 Alabama
#           6    Ohio

The question is how to run the following SQL query in R:

SELECT * FROM df1 JOIN df2 on df1.CustomerId <= df2.CustomerId

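I know that I can use merge(df1, df2, by = "CustomerId") for an inner join. But merge only matches rows whose keys are equal, so it cannot express the <= join condition.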

2 answers:

Answer 0: (score: 0)

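This is a convoluted way to do it, but it works. For each row of df1, it keeps the rows of df2 whose CustomerId is at least as large and binds the matches together: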

library(tidyverse)
df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

map2_df(
  df1$CustomerId, df1$Product,
  .f = ~ {
    temp <- df2 %>% filter(.x <= CustomerId)
    tibble(CustomerId.x = .x, Product = .y, 
           CustomerId.y = temp$CustomerId, State = temp$State)
  }
)
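
The same per-row filtering can also be sketched in base R as a cross join followed by a filter. This is only an illustration, not part of the original answer: it relies on merge() returning the Cartesian product when by = NULL, and the names cross and result are made up for this example. Because both inputs have a CustomerId column, merge() adds the default .x and .y suffixes.

```r
df1 <- data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

# by = NULL makes merge() build every pairing of a df1 row with a df2 row
# (6 * 3 = 18 rows); the shared column becomes CustomerId.x / CustomerId.y.
cross <- merge(df1, df2, by = NULL)

# Keep only the pairings that satisfy the join condition.
result <- cross[cross$CustomerId.x <= cross$CustomerId.y, ]

nrow(result)  # 12
```

The cross join materialises every pairing before filtering, so this is only reasonable for small data frames.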

Answer 1: (score: 0)

As I learned from a comment by dear Grothendieck, a simple solution is to use the sqldf package and run the query as plain SQL against the data frames:

library(sqldf)
sqldf("SELECT * FROM df1 JOIN df2 on df1.CustomerId <= df2.CustomerId")
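
For readers who would rather stay in dplyr, an inequality join can be written with join_by(). This is a hedged sketch rather than something from the thread: it assumes dplyr 1.1.0 or later, where join_by() was introduced, and the name joined is made up for this example.

```r
library(dplyr)

df1 <- data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Inside join_by(), the left-hand side of <= refers to a column of df1
# and the right-hand side to a column of df2.
joined <- inner_join(df1, df2, by = join_by(CustomerId <= CustomerId))

nrow(joined)  # 12
```

Unlike a cross join followed by a filter, this states the condition as part of the join itself, which mirrors the SQL query more closely.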