R: merge a quarterly aggregate data.table onto a bilateral annual data.table, while expanding for each i

Time: 2016-06-01 12:32:06

Tags: r data.table

I am trying to merge two data.tables. One is annual and bilateral, for example:

library(data.table)
bilateral <- data.table(country=c("AT","AT","DE","DE"),
             counterparty=c("DE","FR","AT","FR"),
             time=c("2001Q1"),
             bilateral_value=rnorm(4))
bilateral[,countrypair:=paste(country,counterparty,sep="_")]

The other is aggregate and quarterly, for example:

quarterly <- data.table(country=c(rep("DE",4),rep("AT",4)),
                    time=c(rep(c("2001Q1","2001Q2","2001Q3","2001Q4"),2)),
                    aggregate_value=rnorm(8))

Merging by country and time works, of course:

Data <- merge(bilateral, quarterly, by=c("country","time"), all=TRUE)

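However, I would like to fill in the remaining quarters (NA for the bilateral columns, but with aggregate_value filled in for the corresponding country). In other words, I want to replicate the values of the quarterly dataset for every bilateral pair, matched on country. I think this should be possible directly in merge.data.table, but I cannot figure out how.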

My goal is:

# merge() sorts Data by country, so the AT rows come first (rows 1-5), then DE (rows 6-10)
goal <- data.table(country=c(rep("AT",8),rep("DE",8)),
                   counterparty=c("DE",NA,NA,NA,"FR",NA,NA,NA,"AT",NA,NA,NA,
                                  "FR",NA,NA,NA),
                   time=c(rep(c("2001Q1","2001Q2","2001Q3","2001Q4"),4)),
                   bilateral_value=c(Data[1,bilateral_value],NA,NA,NA,
                                     Data[2,bilateral_value],NA,NA,NA,
                                     Data[6,bilateral_value],NA,NA,NA,
                                     Data[7,bilateral_value],NA,NA,NA),
                   countrypair=c("AT_DE",NA,NA,NA,"AT_FR",NA,NA,NA,"DE_AT",NA,NA,NA,
                                 "DE_FR",NA,NA,NA),
                   aggregate_value=c(rep(Data[2:5,aggregate_value],2),
                                     rep(Data[7:10,aggregate_value],2)))

1 answer:

Answer 0: (score: 2)

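OK. I think this returns the same output as your goal. It starts from your code and then uses a cross join (CJ) to expand Data to the desired level: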

# set key for cross join
setkey(Data, country, counterparty, time)
temp <- Data[CJ(unique(country), 
         unique(counterparty), unique(time))][country != counterparty & !is.na(counterparty)]

As @Frank points out, this can be shortened (and possibly made more efficient) by using the unique argument of CJ:

temp <- Data[CJ(country, counterparty, time, unique=TRUE)
             ][country != counterparty & !is.na(counterparty)]
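
To make the expansion step easier to follow, here is a minimal self-contained sketch that splits it into separate pieces on the question's example data. The seed and the intermediate names grid and expanded are assumptions added for illustration only; they are not part of the original answer.

```r
library(data.table)
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Example data as in the question
bilateral <- data.table(country = c("AT", "AT", "DE", "DE"),
                        counterparty = c("DE", "FR", "AT", "FR"),
                        time = "2001Q1",
                        bilateral_value = rnorm(4))
bilateral[, countrypair := paste(country, counterparty, sep = "_")]
quarterly <- data.table(country = rep(c("DE", "AT"), each = 4),
                        time = rep(paste0("2001Q", 1:4), 2),
                        aggregate_value = rnorm(8))
Data <- merge(bilateral, quarterly, by = c("country", "time"), all = TRUE)
setkey(Data, country, counterparty, time)

# Every combination of the distinct key values found in Data
# (grid is a hypothetical name)
grid <- Data[, CJ(country, counterparty, time, unique = TRUE)]

# Keyed join: one output row per grid row, with NA where Data has no match
expanded <- Data[grid]

# Keep only real pairs: drops NA counterparties and self-pairs such as AT/AT
temp <- expanded[country != counterparty & !is.na(counterparty)]
nrow(temp)  # 4 country pairs x 4 quarters
```

After this step, bilateral_value is only filled for the quarter that was observed, and aggregate_value is still missing for the quarters that had no bilateral row in Data, which is why one more join is needed.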

Finally, join with quarterly on country and time to fill in the aggregate_value column:

# remove partially filled agg_value column
temp[, aggregate_value := NULL]
# join on full aggregate value column
temp[quarterly, on=c("country", "time")]
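
Since the question asks whether this can be done with merge alone, here is a possible alternative sketch, not taken from the answer above: first repeat the quarterly rows once per country pair, then attach the bilateral values. The seed and the names pairs, expanded and alt are assumptions introduced for illustration.

```r
library(data.table)
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Example data as in the question
bilateral <- data.table(country = c("AT", "AT", "DE", "DE"),
                        counterparty = c("DE", "FR", "AT", "FR"),
                        time = "2001Q1",
                        bilateral_value = rnorm(4))
bilateral[, countrypair := paste(country, counterparty, sep = "_")]
quarterly <- data.table(country = rep(c("DE", "AT"), each = 4),
                        time = rep(paste0("2001Q", 1:4), 2),
                        aggregate_value = rnorm(8))

# Distinct country/counterparty pairs observed in the bilateral table
pairs <- unique(bilateral[, .(country, counterparty, countrypair)])

# Repeat every quarterly row once per pair of that country;
# allow.cartesian = TRUE permits the result to grow beyond the input sizes
expanded <- merge(quarterly, pairs, by = "country", allow.cartesian = TRUE)

# Attach bilateral_value where a pair/quarter observation exists, NA otherwise
alt <- merge(expanded,
             bilateral[, .(country, counterparty, time, bilateral_value)],
             by = c("country", "counterparty", "time"), all.x = TRUE)
```

Note that this variant keeps counterparty and countrypair filled in for every quarter instead of setting them to NA, so it differs from the goal table in those two columns; only bilateral_value is NA outside the observed quarter.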