How do I run a SQL left join in R?

Asked: 2018-04-24 15:58:21

Tags: r

I'm relatively new to R and haven't been able to find the equivalent of a SQL left join online. Say my data looks like this:

School| Year| Grade              | #students | math_approached| math_metorexceeded
610534| 2016| Mathematics Grade 3| 57        | 5.3%           | 94.7%
610534| 2016| Mathematics Grade 4| 60        | 8.3%           | 91.7%
610534| 2016| Mathematics Grade 5| 59        | 6.8%           | 93.2%
610534| 2015| Mathematics Grade 3| 57        | 5.3%           | 94.7%
610534| 2015| Mathematics Grade 4| 60        | 8.3%           | 91.7%
610534| 2015| Mathematics Grade 5| 59        | 6.8%           | 93.2%
699999| 2015| Mathematics Grade 3| 51        | 5.3%           | 94.7%
699999| 2015| Mathematics Grade 4| 61        | 8.3%           | 91.7%
699999| 2015| Mathematics Grade 5| 53        | 6.8%           | 93.2%

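I'm trying to attach the previous year's math percentages (approached, and met or exceeded) to each school and grade. In SQL, this would look like: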

select a.*, b.math_approached, b.math_metorexceeded
from mydata as a
left join mydata as b
  on a.school = b.school
  and a.grade = b.grade
  and b.year = '2015'
  and a.year = '2016'

Back in R, I have a data frame df that contains all of the data. Its columns are:

df$school
df$year
df$grade
df$students
df$math..approached
df$math..met.or.exceeded

1 Answer:

Answer 0: (score: 2)

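One option that requires very little extra work is the sqldf package, which lets you run actual SQL queries against data frames in R. The code is straightforward: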

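library(sqldf)

# Self-join df: a holds the 2016 rows, b supplies the matching 2015 values.
# Assumes df has columns named math_approached and math_metorexceeded,
# as in the sample table; adjust the names if your data frame differs.
query <- "select a.*, b.math_approached, b.math_metorexceeded
    from df as a
    left join df as b
        on a.school = b.school
        and a.grade = b.grade
        and b.year = '2015'
        and a.year = '2016'"

# sqldf looks up df in the calling environment and returns a data frame
result <- sqldf(query)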

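The only change I had to make to your original SQL query was to replace the SQL table name mydata with the name of the R data frame, df, which holds the same information.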
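
If you would rather not go through SQL, base R's merge() with all.x = TRUE behaves like a left join. The following is only a minimal sketch, not a drop-in replacement for the query above. It rebuilds the sample table with simplified column names (an assumption; your real df uses names such as math..approached), and it filters to the 2016 rows before joining, so unlike the SQL it returns only those rows. The _prev column names and the current/previous variables are made up for illustration.

```r
# Sample data mirroring the table in the question.
# Column names are simplified here; this is an assumption for illustration.
df <- data.frame(
  school = c(rep(610534, 6), rep(699999, 3)),
  year = c(rep(2016, 3), rep(2015, 6)),
  grade = rep(paste("Mathematics Grade", 3:5), 3),
  students = c(57, 60, 59, 57, 60, 59, 51, 61, 53),
  math_approached = rep(c("5.3%", "8.3%", "6.8%"), 3),
  math_metorexceeded = rep(c("94.7%", "91.7%", "93.2%"), 3),
  stringsAsFactors = FALSE
)

# Left side: the 2016 rows.
current <- df[df$year == 2016, ]

# Right side: the 2015 rows, keeping only the join keys and the two
# percentage columns we want to bring across.
previous <- df[df$year == 2015,
               c("school", "grade", "math_approached", "math_metorexceeded")]

# Rename the 2015 value columns so they do not clash with the 2016 ones.
names(previous)[3:4] <- c("math_approached_prev", "math_metorexceeded_prev")

# all.x = TRUE keeps every row of `current`, filling NA where no 2015 row
# matches; that is what makes this a left join.
result <- merge(current, previous, by = c("school", "grade"), all.x = TRUE)

print(result)
```

A school with 2016 data but no 2015 data would still appear in result, with NA in the two _prev columns. dplyr's left_join() is another common way to do the same thing if you already use that package.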