data.table rolling join for each group

Date: 2017-04-19 13:05:47

Tags: r join data.table

How can I join two data tables with a rolling join, applied separately for each group?

library(data.table)
alldates = as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01'))
gdp = data.table(date = alldates[c(1,3,5,1,3,5)], country = c('A','A','A','B','B','B'), value = c(1,10,100, 2, 20, 200))
gdp
         date country value
1: 2000-01-01       A     1
2: 2010-01-01       A    10
3: 2020-01-01       A   100
4: 2000-01-01       B     2
5: 2010-01-01       B    20
6: 2020-01-01       B   200

price = data.table(date = alldates, price = c(101, 102, 103, 104, 105))
price
         date price
1: 2000-01-01   101
2: 2005-01-01   102 # gdp table is missing mid decade data
3: 2010-01-01   103
4: 2015-01-01   104
5: 2020-01-01   105

Desired result:

          date country value price
 1: 2000-01-01       A     1   101
 2: 2000-01-01       B     2   101
 3: 2005-01-01       A     1   102 # fill in value using previous gdp for each country
 4: 2005-01-01       B     2   102
 5: 2010-01-01       A    10   103
 6: 2010-01-01       B    20   103
 7: 2015-01-01       A    10   104
 8: 2015-01-01       B    20   104
 9: 2020-01-01       A   100   105
10: 2020-01-01       B   200   105

NB

  1. Row order does not matter
  2. It does not need to be a one-liner
  3. gdp[price, on = 'date', roll = TRUE] does not work

1 answer:

Answer 0 (score: 4)

After cleaning up the data...

# fill in missing levels
gdpf = gdp[CJ(date = price$date, country = country, unique = TRUE), on=.(date, country)]

# fill in values for missing levels
gdpf[order(country), value := first(value), by=.(country, cumsum(!is.na(value)))]
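The grouping by cumsum(!is.na(value)) is the part that carries values forward, so a minimal sketch on a toy column may help. The table `toy` and the columns `run_id` and `filled` are made up for illustration and are not part of the answer's code.

```r
library(data.table)

# Toy column: observed values at positions 1, 3 and 5, gaps in between.
toy <- data.table(value = c(1, NA, 10, NA, 100))

# Every non-NA value starts a new run, so the running count of non-NA
# values labels each observation together with the NAs that follow it.
toy[, run_id := cumsum(!is.na(value))]

# Within a run the first element is the observed value; := recycles it
# across the whole run, which fills the gaps.
toy[, filled := first(value), by = run_id]

print(toy)
```

Here run_id comes out as 1, 1, 2, 2, 3 and filled as 1, 1, 10, 10, 100. Adding country to the by, as the answer does, keeps a value from leaking across countries. In recent versions of data.table, nafill(value, type = "locf") grouped by country should give the same fill for numeric columns.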

Then an update join can bring in the price:

gdpf[price, on=.(date), price := i.price ]

          date country value price
 1: 2000-01-01       A     1   101
 2: 2000-01-01       B     2   101
 3: 2005-01-01       A     1   102
 4: 2005-01-01       B     2   102
 5: 2010-01-01       A    10   103
 6: 2010-01-01       B    20   103
 7: 2015-01-01       A    10   104
 8: 2015-01-01       B    20   104
 9: 2020-01-01       A   100   105
10: 2020-01-01       B   200   105

Another way to fill in the values for missing levels is with the zoo package: value := na.locf(value), by=country
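For comparison, here is a sketch of a variant that stays closer to the question's wording by using an actual rolling join. It is not the answer's method; the names `grid` and `res` are made up for illustration. Rolling on date alone has no notion of country, whereas listing country first in `on` makes it an exact-match key and leaves only date (the last join column) to be rolled.

```r
library(data.table)

alldates <- as.Date(c('2000-01-01', '2005-01-01', '2010-01-01', '2015-01-01', '2020-01-01'))
gdp <- data.table(date = alldates[c(1, 3, 5, 1, 3, 5)],
                  country = c('A', 'A', 'A', 'B', 'B', 'B'),
                  value = c(1, 10, 100, 2, 20, 200))
price <- data.table(date = alldates, price = c(101, 102, 103, 104, 105))

# One lookup row per (country, date) pair that should appear in the result.
grid <- CJ(country = unique(gdp$country), date = price$date)

# Exact match on country, roll on date: each lookup row takes the most
# recent gdp value recorded for its own country.
res <- gdp[grid, on = .(country, date), roll = TRUE]

# Update join to attach the price for each date.
res[price, on = .(date), price := i.price]

setorder(res, date, country)
print(res)
```

This skips the separate fill step, at the cost of building the country-by-date grid up front.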