I'm relatively new to R and haven't been able to find the equivalent of a SQL left join online. Let's say my data looks like this:
School| Year| Grade | #students | math_approached| math_metorexceeded
610534| 2016| Mathematics Grade 3| 57 | 5.3% | 94.7%
610534| 2016| Mathematics Grade 4| 60 | 8.3% | 91.7%
610534| 2016| Mathematics Grade 5| 59 | 6.8% | 93.2%
610534| 2015| Mathematics Grade 3| 57 | 5.3% | 94.7%
610534| 2015| Mathematics Grade 4| 60 | 8.3% | 91.7%
610534| 2015| Mathematics Grade 5| 59 | 6.8% | 93.2%
699999| 2015| Mathematics Grade 3| 51 | 5.3% | 94.7%
699999| 2015| Mathematics Grade 4| 61 | 8.3% | 91.7%
699999| 2015| Mathematics Grade 5| 53 | 6.8% | 93.2%
I'm trying to find, for each school and grade, the math % approached value from the previous year. In SQL, this would look like:
select a.*, b.math_approached, b.math_metorexceeded
from mydata as a
left join mydata as b
on a.school = b.school
and a.grade = b.grade
and b.year = '2015'
and a.year = '2016'
Back in R, I have a data frame df that contains all of this data. It has
df$school
df$year
df$grade
df$students
df$math..approached
df$math..met.or.exceeded
as its columns.
Answer 0: (score: 2)
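One option, and the one that requires the least additional work, is the sqldf package, which lets you run actual SQL queries against data frames in R. The code is simple: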
library(sqldf)
query <- "select a.*, b.math_approached, b.math_metorexceeded
from df as a
left join df as b
on a.school = b.school
and a.grade = b.grade
and b.year = '2015'
and a.year = '2016'"
result <- sqldf(query)
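The only change I had to make to the original SQL query was to replace the SQL table name mydata with df, the name of the R data frame that holds the same information. Note that the query refers to math_approached and math_metorexceeded, so it assumes df has columns with exactly those names; if yours are math..approached and math..met.or.exceeded, as listed in the question, rename them first.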
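If you would rather not go through SQL, base R's merge() can do a left join directly via all.x = TRUE. The following is only a sketch under some assumptions: the sample data frame is rebuilt by hand from the table in the question, the columns are assumed to be named math_approached and math_metorexceeded (not math..approached), and the helper data frame prev and the _prev column names are made up for illustration. Shifting year by one in prev makes each 2015 row line up with the matching 2016 row, so the join works for any pair of consecutive years instead of hard-coding 2015 and 2016.

```r
# Hypothetical sample data mirroring the table in the question.
df <- data.frame(
  school = c(rep(610534, 6), rep(699999, 3)),
  year = c(rep(2016, 3), rep(2015, 6)),
  grade = rep(paste("Mathematics Grade", 3:5), times = 3),
  students = c(57, 60, 59, 57, 60, 59, 51, 61, 53),
  math_approached = rep(c("5.3%", "8.3%", "6.8%"), times = 3),
  math_metorexceeded = rep(c("94.7%", "91.7%", "93.2%"), times = 3),
  stringsAsFactors = FALSE
)

# Copy of the score columns, relabelled as "previous year" values.
# Adding 1 to year means a 2015 row will match the 2016 row for the
# same school and grade.
prev <- data.frame(
  school = df$school,
  year = df$year + 1,
  grade = df$grade,
  math_approached_prev = df$math_approached,
  math_metorexceeded_prev = df$math_metorexceeded,
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of df, like a SQL LEFT JOIN;
# rows with no match in prev get NA in the *_prev columns.
result <- merge(df, prev, by = c("school", "year", "grade"), all.x = TRUE)
print(result)
```

With this sample data, the three 2016 rows for school 610534 pick up their 2015 values, and every other row gets NA in the two _prev columns, which matches what the left join in the SQL query returns. Note that merge() sorts the result by the key columns by default, so the row order can differ from df.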