I am trying to merge two data.tables. One is annual and bilateral, for example:
library(data.table)
bilateral <- data.table(country=c("AT","AT","DE","DE"),
                        counterparty=c("DE","FR","AT","FR"),
                        time=c("2001Q1"),
                        bilateral_value=rnorm(4))
bilateral[,countrypair:=paste(country,counterparty,sep="_")]
The other is aggregated and quarterly, for example:
quarterly <- data.table(country=c(rep("DE",4),rep("AT",4)),
                        time=c(rep(c("2001Q1","2001Q2","2001Q3","2001Q4"),2)),
                        aggregate_value=rnorm(8))
Merging by country and time of course works:
Data <- merge(bilateral, quarterly, by=c("country","time"), all=TRUE)
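However, I would like to fill in the remaining quarters (rows that are NA for the country pair but carry the aggregate_value of the corresponding country). In other words, I want to duplicate each country's quarterly data for every bilateral pair of that country. I think this should be possible directly in merge.data.table, but I can't figure out how.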
My goal is:
goal <- data.table(country=c(rep("DE",8),rep("AT",8)),
                   counterparty=c("AT",NA,NA,NA,"FR",NA,NA,NA,"DE",NA,NA,NA,
                                  "FR",NA,NA,NA),
                   time=c(rep(c("2001Q1","2001Q2","2001Q3","2001Q4"),4)),
                   bilateral_value=c(Data[1,bilateral_value],NA,NA,NA,
                                     Data[2,bilateral_value],NA,NA,NA,
                                     Data[6,bilateral_value],NA,NA,NA,
                                     Data[7,bilateral_value],NA,NA,NA),
                   countrypair=c("AT_DE",NA,NA,NA,"AT_FR",NA,NA,NA,"DE_AT",NA,NA,NA,
                                 "DE_FR",NA,NA,NA),
                   aggregate_value=c(rep(Data[2:5,aggregate_value],2),
                                     rep(Data[7:10,aggregate_value],2)))
Answer:
OK. I think this returns the same output as goal. It starts from your code and then uses a cross join (CJ) to expand the data to the required level:
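# set key for cross join
setkey(Data, country, counterparty, time)
# expand to every country x counterparty x time combination,
# then drop self-pairs and rows with a missing counterparty
temp <- Data[CJ(unique(country), unique(counterparty), unique(time))
             ][country != counterparty & !is.na(counterparty)]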
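To make the join step concrete, here is a minimal sketch on a hypothetical toy table (`obs`, `value` and its numbers are invented for illustration, not taken from the question). Joining a keyed data.table on a CJ() grid returns one row per grid combination, and combinations that are absent from the table come back with NA, which is how `Data[CJ(...)]` creates the missing quarter rows.

```r
library(data.table)

# Hypothetical toy table: only two of the four country/time combinations exist
obs <- data.table(country = c("AT", "DE"),
                  time    = c("2001Q1", "2001Q2"),
                  value   = c(1.5, 2.5))
setkey(obs, country, time)

# CJ() builds all 2 x 2 combinations (sorted); the keyed join matches its
# columns positionally against the key columns (country, time)
full <- obs[CJ(c("AT", "DE"), c("2001Q1", "2001Q2"))]
print(full)
# value is 1.5, NA, NA, 2.5 for AT/2001Q1, AT/2001Q2, DE/2001Q1, DE/2001Q2
```

Rows with no match are kept because the default `nomatch` is NA; the answer then filters the unwanted combinations afterwards.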
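As @Frank pointed out, the CJ(unique(country), ...) call can be written more concisely (and probably more efficiently) with the unique argument of CJ:
temp <- Data[CJ(country, counterparty, time, unique=TRUE)
             ][country != counterparty & !is.na(counterparty)]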
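A small check on hypothetical vectors (not the question's Data) that `unique = TRUE` does the deduplication for you: CJ() then removes duplicate values from each input before crossing them, so the result matches wrapping each vector in unique() by hand, whereas a plain CJ() keeps the duplicates and multiplies the row count. The columns are named explicitly in this sketch so that the two results are directly comparable.

```r
library(data.table)

# Hypothetical vectors with repeated values
x <- c("DE", "AT", "DE", "AT")
y <- c("2001Q2", "2001Q1", "2001Q1")

# Without deduplication: 4 x 3 = 12 rows, including repeated combinations
with_dups <- CJ(country = x, time = y)

# Deduplicate by hand before the cross join ...
by_hand <- CJ(country = unique(x), time = unique(y))
# ... or let CJ() deduplicate each vector itself
built_in <- CJ(country = x, time = y, unique = TRUE)

print(built_in)  # AT/DE crossed with 2001Q1/2001Q2: 4 rows
print(isTRUE(all.equal(by_hand, built_in)))
```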
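Finally, fill in the aggregate value by joining quarterly onto temp (temp[quarterly] keeps every row of quarterly and attaches the matching rows of temp):
# drop the partially filled aggregate_value column
temp[, aggregate_value := NULL]
# join on the full aggregate_value column from quarterly
temp[quarterly, on=c("country", "time")]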
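For comparison, here is a sketch of a different technique that stays within merge.data.table, as the question hoped: list the distinct country pairs, duplicate each country's quarterly rows for every pair with a many-to-many merge on country, then attach bilateral_value with a left merge. It reuses the question's toy data; `set.seed()` and the intermediate names `pairs`, `expanded` and `result` are additions for illustration. `allow.cartesian = TRUE` is needed here because the many-to-many merge yields more rows (16) than the two inputs combined (12). As in the answer's output, counterparty is filled in every row, while bilateral_value and countrypair stay NA outside 2001Q1.

```r
library(data.table)

set.seed(42)  # assumption: seed added only to make the toy data reproducible
bilateral <- data.table(country = c("AT", "AT", "DE", "DE"),
                        counterparty = c("DE", "FR", "AT", "FR"),
                        time = "2001Q1",
                        bilateral_value = rnorm(4))
bilateral[, countrypair := paste(country, counterparty, sep = "_")]
quarterly <- data.table(country = rep(c("DE", "AT"), each = 4),
                        time = rep(c("2001Q1", "2001Q2", "2001Q3", "2001Q4"), 2),
                        aggregate_value = rnorm(8))

# 1. Distinct country pairs present in the bilateral data
pairs <- unique(bilateral[, .(country, counterparty)])

# 2. Repeat each country's quarterly rows once per pair (many-to-many on country)
expanded <- merge(pairs, quarterly, by = "country", allow.cartesian = TRUE)

# 3. Attach bilateral values where they exist; all other quarters stay NA
result <- merge(expanded, bilateral,
                by = c("country", "counterparty", "time"), all.x = TRUE)
print(result)
```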