Data frame 1: house prices
year month MSA1 MSA2 MSA3
2000 1 12 6 7
2000 2 1 3 4
2001 3 9 5 7
Data frame 2: mortgage information
ID MSA YEAR MONTH
1 MSA1 2000 2
2 MSA3 2001 3
3 MSA2 2001 3
4 MSA1 2000 1
5 MSA3 2000 3
Expected result:
ID MSA YEAR MONTH HOUSE_PRICE
1 MSA1 2000 2 1
2 MSA3 2001 3 7
3 MSA2 2001 3 5
Does anyone know how to do this efficiently? Data frame 2 is large; data frame 1 is moderately sized. Thanks!
Answer 0 (score: 1)
Assuming both are data.tables, dt1 and dt2, this can be done without converting to long format, as follows:
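library(data.table)

# Example data from the question
dt1 = fread('year month MSA1 MSA2 MSA3
2000 1 12 6 7
2000 2 1 3 4
2001 3 9 5 7
')
dt2 = fread('ID MSA YEAR MONTH
1 MSA1 2000 2
2 MSA3 2001 3
3 MSA2 2001 3
4 MSA1 2000 1
5 MSA3 2000 3
')

# Join each row of dt2 to dt1 on year/month. With by = .EACHI, j runs once
# per row of dt2, so MSA is a single name and get(MSA) returns that MSA's
# price from dt1 for the matched date. nomatch = 0L drops ID 5 (no such date).
dt1[dt2, .(ID, MSA, House_price = get(MSA)), by = .EACHI,
    nomatch = 0L, on = c(year = "YEAR", month = "MONTH")]
# House_price: ID 1 -> 1, ID 2 -> 7, ID 3 -> 5, ID 4 -> 12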
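For comparison, here is a minimal base-R sketch of the same "no long format" lookup, assuming plain data frames named df1 and df2 with the question's columns. match() finds the df1 row for each mortgage's (YEAR, MONTH) and the price column for its MSA, and a two-column index matrix then pulls one element per mortgage. The paste() key and the names prices, row_idx, col_idx and result are illustrative choices, not part of either answer.

```r
# Example data from the question
df1 <- read.table(text = "year month MSA1 MSA2 MSA3
2000 1 12 6 7
2000 2 1 3 4
2001 3 9 5 7", header = TRUE)

df2 <- read.table(text = "ID MSA YEAR MONTH
1 MSA1 2000 2
2 MSA3 2001 3
3 MSA2 2001 3
4 MSA1 2000 1
5 MSA3 2000 3", header = TRUE, stringsAsFactors = FALSE)

# Matrix holding only the per-MSA price columns of df1
prices <- as.matrix(df1[, setdiff(names(df1), c("year", "month"))])

# Row of df1 for each mortgage's date (NA when the date is absent),
# using a "year month" string as an illustrative composite key
row_idx <- match(paste(df2$YEAR, df2$MONTH), paste(df1$year, df1$month))
# Column of prices for each mortgage's MSA name
col_idx <- match(df2$MSA, colnames(prices))

# One (row, column) pair per mortgage; index rows containing NA yield NA
df2$HOUSE_PRICE <- prices[cbind(row_idx, col_idx)]

# Keep only mortgages with a matching price, like nomatch = 0L
result <- df2[!is.na(df2$HOUSE_PRICE), ]
print(result)
```

ID 5 has a date that is not in df1, so its index row contains NA and its price is NA; dropping the final filter keeps every mortgage instead, which is what right_join does in the next answer.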
Answer 1 (score: 0)
This looks like turning a data frame from wide to long form, followed by merging two data frames. Here is a dplyr solution using gather and right_join. The column names are changed only to make the join easier.
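library(dplyr)
library(tidyr)
# Upper-case df1's column names so YEAR and MONTH match df2
names(df1) <- toupper(names(df1))
# Reshape df1 to one row per (YEAR, MONTH, MSA), then attach prices to every row of df2
gather(df1, MSA, HOUSE_PRICE, -YEAR, -MONTH) %>%
  right_join(df2, by = c("YEAR", "MONTH", "MSA"))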
Output:
YEAR MONTH MSA HOUSE_PRICE ID
1 2000 2 MSA1 1 1
2 2001 3 MSA3 7 2
3 2001 3 MSA2 5 3
4 2000 1 MSA1 12 4
5 2000 3 MSA3 NA 5
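gather() still works but is superseded in current tidyr by pivot_longer(). Below is a sketch of the same wide-to-long-then-join approach using it, assuming tidyr >= 1.0 and dplyr >= 1.0 and building df1/df2 inline from the question's data. A named by vector matches df2's upper-case columns to df1's lower-case ones, so no renaming is needed, and left_join() with df2 first keeps every mortgage in df2's row order. The names long_prices and result are illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df1 <- data.frame(year = c(2000, 2000, 2001), month = c(1, 2, 3),
                  MSA1 = c(12, 1, 9), MSA2 = c(6, 3, 5), MSA3 = c(7, 4, 7))
df2 <- data.frame(ID = 1:5,
                  MSA = c("MSA1", "MSA3", "MSA2", "MSA1", "MSA3"),
                  YEAR = c(2000, 2001, 2001, 2000, 2000),
                  MONTH = c(2, 3, 3, 1, 3),
                  stringsAsFactors = FALSE)

# Wide to long: one row per (year, month, MSA) with its price
long_prices <- df1 %>%
  pivot_longer(starts_with("MSA"), names_to = "MSA", values_to = "HOUSE_PRICE")

# Attach the price to every mortgage; unmatched dates get NA
result <- df2 %>%
  left_join(long_prices, by = c("YEAR" = "year", "MONTH" = "month", "MSA" = "MSA"))
print(result)
```

The long table has nrow(df1) times the number of MSA columns rows, so the reshape cost is driven by df1, which the question describes as the smaller table.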