如何为每个组加入两个带滚动连接的数据表?
library(data.table)
alldates = as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01'))
gdp = data.table(date = alldates[c(1,3,5,1,3,5)], country = c('A','A','A','B','B','B'), value = c(1,10,100, 2, 20, 200))
gdp
date country value
1: 2000-01-01 A 1
2: 2010-01-01 A 10
3: 2020-01-01 A 100
4: 2000-01-01 B 2
5: 2010-01-01 B 20
6: 2020-01-01 B 200
price = data.table(date = alldates, price = c(101, 102, 103, 104, 105))
price
date price
1: 2000-01-01 101
2: 2005-01-01 102 # gdp table is missing mid decade data
3: 2010-01-01 103
4: 2015-01-01 104
5: 2020-01-01 105
我想要的结果
date country value price
1: 2000-01-01 A 1 101
2: 2000-01-01 B 2 101
3: 2005-01-01 A 1 102 # fill in value using previous gdp for each country
4: 2005-01-01 B 2 102
5: 2010-01-01 A 10 103
6: 2010-01-01 B 20 103
7: 2015-01-01 A 10 104
8: 2015-01-01 B 20 104
9: 2020-01-01 A 100 105
10: 2020-01-01 B 200 105
NB
gdp[price, on = 'date', roll = TRUE]
不起作用答案 0 :(得分:4)
清理数据后......
# fill in missing levels
gdpf = gdp[CJ(date = price$date, country = country, unique = TRUE), on=.(date, country)]
# fill in values for missing levels
gdpf[order(country), value := first(value), by=.(country, cumsum(!is.na(value)))]
然后,更新加入可以获取价格:
gdpf[price, on=.(date), price := i.price ]
date country value price
1: 2000-01-01 A 1 101
2: 2000-01-01 B 2 101
3: 2005-01-01 A 1 102
4: 2005-01-01 B 2 102
5: 2010-01-01 A 10 103
6: 2010-01-01 B 20 103
7: 2015-01-01 A 10 104
8: 2015-01-01 B 20 104
9: 2020-01-01 A 100 105
10: 2020-01-01 B 200 105
另一种使用zoo包填充缺失级别值的方法是value := na.locf(value), by=country
。