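I'm trying to run a rolling regression in data.table. There are a number of questions that address what I want to do, but they are generally more than 3 years old and offer inelegant answers. (See: here, for example.)
I'm wondering whether there has been an update to the data.table package that makes this more intuitive or faster?
Here is what I am trying to do. My code looks like this: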
library(data.table)

DT <- data.table(
  Date = seq(as.Date("2000/1/1"), by = "day", length.out = 1000),
  x1 = rnorm(1000),
  x2 = rnorm(1000),
  x3 = rnorm(1000),
  y = rnorm(1000),
  country = rep(c("a", "b", "c", "d"), each = 25))
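I would like to regress y on x1, x2 and x3, by country, over a rolling window of 180 days, and store the coefficients by date.
Ideally the syntax would look something like this: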
DT[, .(coef.x1 := coef(y~x1+x2+x3)[2],
       coef.x2 := coef(y~x1+x2+x3)[3],
       coef.x3 := coef(y~x1+x2+x3)[4]),
   by=c("country", ROLLING WINDOW)]
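...but even more elegant, avoiding the repetition if possible! :)
For some reason I haven't yet gotten the rollapply syntax to work for me.
Thanks!
EDIT:
Thanks @michaelchirico.
Your suggestion comes close to what I am aiming for, and perhaps the code can be modified to get there, but once again I'm stuck.
Here is a more careful articulation of what I need. Some code: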
DT <- data.table(
  Date = rep(seq(as.Date("2000/1/1"), by = "day", length.out = 10), times = 3), # same dates per country
  x1 = rep(rnorm(10), times = 3), # x1's repeat - same per country
  x2 = rep(rnorm(10), times = 3), # x2's repeat - same per country
  x3 = rep(rnorm(10), times = 3), # x3's repeat - same per country
  y = rnorm(30), # y's do not repeat and are unique per country per day
  country = rep(c("a", "b", "c"), each = 10))
# to calculate the coefficients by individual country:
a <- subset(DT, country == "a")
b <- subset(DT, country == "b")
window <- 5 # declare window
coefs.a <- coef(lm(y ~ x1 + x2 + x3, data = a[1:window])) # initialize with the first window
coefs.b <- coef(lm(y ~ x1 + x2 + x3, data = b[1:window])) # initialize with the first window
## calculate coefficients per window
# each later window spans rows (i+1):(i+window), so every fit uses `window` rows
for (i in 1:(nrow(a) - window)) {
  coefs.a <- rbind(coefs.a, coef(lm(y ~ x1 + x2 + x3, data = a[(i + 1):(i + window)])))
  coefs.b <- rbind(coefs.b, coef(lm(y ~ x1 + x2 + x3, data = b[(i + 1):(i + window)])))
}
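This dataset differs from the previous one in that the dates and x1, x2, x3 all repeat across countries. Only the y values are unique to each country.
In my real dataset I have 120 countries. I could calculate this for each country separately, but it is terribly slow, and then I have to rejoin all of the coefficients into a single dataset to analyze the results.
Is there a way, similar to what you proposed, to end up with a single data.table containing all the observations?
Thanks again!!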
Answer 0 (score: 0)
It's still not clear exactly what you're after, and I can't really speak to speed, but here's a shot which should be close (only minor adjustments need be made depending on the details):
TT <- DT[ , uniqueN(Date), by = country][ , max(V1)]
window <- 5
#pre-declare a matrix of windows; each column represents
#one of the possible windows of days
windows <- matrix(1:TT, nrow = TT + 1, ncol = max(TT - window + 1, 1))[1:window, ]
DT[ , {
#not all possible windows necessarily apply to each
# country; subset to find only the relevant windows
windowsj <- windows[ , 1:(uniqueN(Date) - window + 1)]
#lapply returns a list (which can be readily assigned with :=)
lapply(1:ncol(windowsj),
function(ii){
#subset to relevant rows
.SD[windowsj[ , ii],
#regress, extract
lm(y ~ x1 + x2 + x3)$coefficients]})},
by = country]
Comparing the result of this to your coefs.a and coefs.b:
country V1 V2 V3 V4 V5 V6
1: a -0.8764867 0.46169717 2.6712128 2.66304537 1.18928600 0.53553900
2: a -1.0135961 0.03985467 0.6015446 0.61316724 0.24177034 0.86369780
3: a -0.1807617 -0.25767309 -2.9492897 -3.05092528 -0.04310375 0.62317993
4: a -0.6664342 -0.30732907 -0.3362091 -0.25776715 1.04419854 1.02294125
5: b 0.9548685 0.77461810 -0.5100818 -0.57726788 -0.73285223 -1.64196684
6: b 0.7179429 0.46107110 0.1732915 0.23262455 0.23258149 3.63679221
7: b 0.1639778 -0.22249382 1.4539881 0.58725270 0.54879762 -0.27115275
8: b 0.6192641 0.12706750 0.2671673 0.79569434 0.69031761 2.27769679
9: c 0.2722200 0.07279085 -0.7709578 -0.74590575 -0.15773196 0.03178821
10: c 0.8890314 0.74213624 0.4440650 0.34939003 0.50531166 0.16550026
11: c 0.1589915 0.20531447 0.9931054 1.25495206 -0.01543296 -0.09887655
12: c 0.7198967 0.70536869 0.4508445 0.02028332 -0.54705588 -0.64246579
> coefs.a
(Intercept) x1 x2 x3
coefs.a -0.8764867 -1.01359605 -0.18076171 -0.6664342
0.4616972 0.03985467 -0.25767309 -0.3073291
2.6712128 0.60154458 -2.94928969 -0.3362091
2.6630454 0.61316724 -3.05092528 -0.2577671
1.1892860 0.24177034 -0.04310375 1.0441985
0.5355390 0.86369780 0.62317993 1.0229412
(i.e. it's the same, just transposed)
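The question asks for the coefficients stored by date in one table, whereas the result above is wide and transposed. A minimal sketch of one way to get a long layout instead, assuming the 10-day example data from the edit and a 5-row window. `roll_coefs` is a hypothetical helper name (not part of data.table), and the seed is added only so the sketch is reproducible:

```r
library(data.table)

set.seed(1)  # assumption: seed added only for reproducibility
DT <- data.table(
  Date    = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 10), times = 3),
  x1      = rep(rnorm(10), times = 3),
  x2      = rep(rnorm(10), times = 3),
  x3      = rep(rnorm(10), times = 3),
  y       = rnorm(30),
  country = rep(c("a", "b", "c"), each = 10)
)
window <- 5

# hypothetical helper: one row of coefficients per window, labelled with
# the date of the last observation in that window
roll_coefs <- function(d, window) {
  n_win <- max(nrow(d) - window + 1L, 0L)  # no windows if the group is too short
  rbindlist(lapply(seq_len(n_win), function(k) {
    rows <- k:(k + window - 1L)            # exactly `window` consecutive rows
    fit  <- lm(y ~ x1 + x2 + x3, data = d[rows])
    c(list(Date = d$Date[k + window - 1L]), as.list(coef(fit)))
  }))
}

# sort first so that "consecutive rows" means "consecutive dates" per country
res <- DT[order(country, Date), roll_coefs(.SD, window), by = country]
```

Here `res` has one row per country and window, with columns country, Date, (Intercept), x1, x2 and x3, so it can be analyzed or joined back to DT on country and Date without any per-country bookkeeping.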
Answer 1 (score: 0)
frollapply accepts only numeric vector input and output, so we have to write our own with sapply() over the row indices.
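window <- 180
DT[,
   {
     # one regression per window start k; each window covers exactly
     # `window` rows, k through k + window - 1
     data.table(t(sapply(seq_len(.N - window + 1),
                         function(k) lm(y ~ x1 + x2 + x3,
                                        data = .SD[k:(k + window - 1)])$coefficients)))
   },
   by = country]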
## country (Intercept) x1 x2 x3
## 1: a 0.10163170 0.09561343 -0.11123725 -0.06489867
## 2: a 0.11029460 0.08927926 -0.10657563 -0.06035072
## 3: a 0.11328084 0.08856627 -0.10521865 -0.06278259
## 4: a 0.12348242 0.07503412 -0.10483616 -0.06638923
## 5: a 0.13285512 0.09268086 -0.11239769 -0.04068656
## ---
## 280: d 0.08249204 0.06252626 -0.06965884 -0.09680134
## 281: d 0.07864977 0.05395658 -0.06137728 -0.10774067
## 282: d 0.07937867 0.06996970 -0.07991358 -0.11377039
## 283: d 0.07654691 0.06546692 -0.06824516 -0.10902969
## 284: d 0.06123857 0.08590249 -0.05117317 -0.11728684
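Since the question mentions that fitting each country separately is slow, here is a hedged sketch of the same row-index approach that skips the formula interface: it builds the design matrix once per country and calls base R's bare-bones `.lm.fit()` on each window. The example data (250 consecutive days for each of 4 countries), the seed, and the helper name `fast_roll` are assumptions for illustration. It has not been benchmarked here, and `.lm.fit()` does no NA handling, so it only suits complete data.

```r
library(data.table)

set.seed(42)  # assumption: seed added only for reproducibility
DT <- data.table(
  Date    = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 250), times = 4),
  x1      = rnorm(1000),
  x2      = rnorm(1000),
  x3      = rnorm(1000),
  y       = rnorm(1000),
  country = rep(c("a", "b", "c", "d"), each = 250)
)
window <- 180

# hypothetical helper: rolling OLS coefficients via .lm.fit on matrix slices
fast_roll <- function(d, window) {
  X <- cbind(1, d$x1, d$x2, d$x3)          # intercept column plus regressors
  y <- d$y
  n_win <- max(nrow(d) - window + 1L, 0L)  # number of complete windows
  out <- t(vapply(seq_len(n_win), function(k) {
    rows <- k:(k + window - 1L)            # exactly `window` consecutive rows
    .lm.fit(X[rows, , drop = FALSE], y[rows])$coefficients
  }, numeric(4)))
  colnames(out) <- c("(Intercept)", "x1", "x2", "x3")
  # label each window with the date of its last observation
  data.table(Date = d$Date[seq_len(n_win) + window - 1L], out)
}

res2 <- DT[order(country, Date), fast_roll(.SD, window), by = country]
```

With 250 rows per country and a 180-row window this gives 71 windows per country, 284 rows in total, each carrying the window's end date alongside the four coefficients.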