Merging data frames in R in a two-dimensional way

Time: 2016-01-23 19:14:37

Tags: r multidimensional-array merge dataframe

Data frame 1: house prices

year    month   MSA1    MSA2    MSA3
2000    1       12  6   7
2000    2       1   3   4
2001    3       9   5   7

Data frame 2: mortgage information

ID  MSA YEAR    MONTH   
1   MSA1    2000    2   
2   MSA3    2001    3   
3   MSA2    2001    3   
4   MSA1    2000    1   
5   MSA3    2000    3   

Expected result:

ID  MSA YEAR    MONTH   HOUSE_PRICE
1   MSA1    2000    2   1
2   MSA3    2001    3   7
3   MSA2    2001    3   5

Does anyone know how to do this efficiently? Data frame 2 is very large, while data frame 1 is of a reasonable size. Thanks!

2 answers:

Answer 0: (score: 1)

Assuming both are data.tables, dt1 and dt2, this can be done without having to convert to long format, as follows:

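require(data.table)
# Join dt2 to dt1 on year/month. For each row of dt1 (by = .EACHI), look up
# the dt1 price column whose name is stored in dt2's MSA column.
# nomatch = 0L drops year/month combinations that have no match.
dt2[dt1, .(ID, MSA, House_price = get(MSA)), by=.EACHI, 
           nomatch=0L, on=c(YEAR="year", MONTH="month")]
#    YEAR MONTH ID  MSA House_price
# 1: 2000     1  4 MSA1          12
# 2: 2000     2  1 MSA1           1
# 3: 2001     3  2 MSA3           7
# 4: 2001     3  3 MSA2           5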

# Sample data for dt1 and dt2 (define these before running the join)
dt1 = fread('year    month   MSA1    MSA2    MSA3
2000    1       12  6   7
2000    2       1   3   4
2001    3       9   5   7
')

dt2 = fread('ID  MSA YEAR    MONTH
1   MSA1    2000    2
2   MSA3    2001    3
3   MSA2    2001    3
4   MSA1    2000    1
5   MSA3    2000    3
')
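
The same "look up by row and column" idea can also be sketched in base R with matrix indexing, which avoids reshaping and needs no packages. This is only an illustration under assumptions: the data frames are rebuilt here as df1 and df2 from the question's sample data, and the helper names row_idx, col_idx and price_mat are made up for the example.

```r
# Sample data from the question (rebuilt here so the sketch is self-contained)
df1 <- data.frame(year = c(2000, 2000, 2001), month = c(1, 2, 3),
                  MSA1 = c(12, 1, 9), MSA2 = c(6, 3, 5), MSA3 = c(7, 4, 7))
df2 <- data.frame(ID = 1:5,
                  MSA = c("MSA1", "MSA3", "MSA2", "MSA1", "MSA3"),
                  YEAR = c(2000, 2001, 2001, 2000, 2000),
                  MONTH = c(2, 3, 3, 1, 3),
                  stringsAsFactors = FALSE)

# Row index: which row of df1 holds each mortgage's year/month (NA if none)
row_idx <- match(paste(df2$YEAR, df2$MONTH), paste(df1$year, df1$month))

# Column index: which price column matches each mortgage's MSA
price_mat <- as.matrix(df1[, c("MSA1", "MSA2", "MSA3")])
col_idx <- match(df2$MSA, colnames(price_mat))

# Index the matrix with (row, column) pairs; a pair containing NA yields NA
df2$HOUSE_PRICE <- price_mat[cbind(row_idx, col_idx)]

# Drop mortgages that have no price for their year/month
result <- df2[!is.na(df2$HOUSE_PRICE), ]
print(result)
```

Because only two integer vectors are built for the large table, this may suit the case where data frame 2 is big and data frame 1 is small, though that has not been benchmarked here.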

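Answer 1: (score: 0)

This looks like turning a data frame from wide to long form, followed by merging two data frames. Here is a dplyr solution using gather (from tidyr) and right_join. The names are changed only to make the join easier.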

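library(dplyr)
library(tidyr)
# Upper-case df1's column names so they match df2's YEAR and MONTH
names(df1) <- toupper(names(df1))
# Reshape df1 to long form (one row per YEAR/MONTH/MSA), then join,
# keeping every row of df2
gather(df1, MSA, HOUSE_PRICE, -YEAR, -MONTH) %>%
  right_join(df2, by = c("YEAR", "MONTH", "MSA"))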

Output

  YEAR MONTH  MSA HOUSE_PRICE ID
1 2000     2 MSA1           1  1
2 2001     3 MSA3           7  2
3 2001     3 MSA2           5  3
4 2000     1 MSA1          12  4
5 2000     3 MSA3          NA  5
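
Row 5 is NA because right_join keeps every row of df2, including mortgages with no matching price. For comparison, here is a rough base R sketch of the same wide-to-long-then-merge idea without dplyr or tidyr. It assumes the sample data from the question; the names msa_cols, long and res are invented for the illustration.

```r
# Sample data from the question (rebuilt here so the sketch is self-contained)
df1 <- data.frame(year = c(2000, 2000, 2001), month = c(1, 2, 3),
                  MSA1 = c(12, 1, 9), MSA2 = c(6, 3, 5), MSA3 = c(7, 4, 7))
df2 <- data.frame(ID = 1:5,
                  MSA = c("MSA1", "MSA3", "MSA2", "MSA1", "MSA3"),
                  YEAR = c(2000, 2001, 2001, 2000, 2000),
                  MONTH = c(2, 3, 3, 1, 3),
                  stringsAsFactors = FALSE)

# Wide to long: stack the MSA columns into one HOUSE_PRICE column
msa_cols <- setdiff(names(df1), c("year", "month"))
long <- data.frame(
  YEAR = rep(df1$year, times = length(msa_cols)),
  MONTH = rep(df1$month, times = length(msa_cols)),
  MSA = rep(msa_cols, each = nrow(df1)),
  HOUSE_PRICE = unlist(df1[msa_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps unmatched mortgages (as right_join does above);
# leave it out to get an inner join that drops them
res <- merge(df2, long, by = c("YEAR", "MONTH", "MSA"), all.x = TRUE)
res <- res[order(res$ID), c("ID", "MSA", "YEAR", "MONTH", "HOUSE_PRICE")]
print(res)
```

Dropping all.x = TRUE removes ID 5, which is closer to the expected result in the question.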