"Rolling" regression in R

Time: 2019-05-07 11:40:21

Tags: r regression rolling-computation

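Say I want to run regressions by group, using the most recent 5 years of data as the input to each regression. Then, for the next year, I want to "shift" the regression input by one year (i.e. 4 observations).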
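The window arithmetic described above can be sketched in base R. This is only an illustration, assuming 40 quarterly rows per group as in the sample data; the names `n_row`, `width`, `step` and `windows` are made up for the example.

```r
# Assumed sizes: 10 years x 4 quarters = 40 rows per ID.
n_row <- 40   # observations per group
width <- 20   # 5 years of quarterly data per regression
step  <- 4    # shift the window by one year (4 quarters)

# First and last row index of every window.
starts  <- seq(1, n_row - width + 1, by = step)
ends    <- starts + width - 1
windows <- data.frame(start = starts, end = ends)
print(windows)
```

With these sizes there are six windows per group, the first covering rows 1 to 20 and the last covering rows 21 to 40.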

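From these regressions I want to extract the R2 and the fitted values/residuals, which I then need in subsequent regressions that follow a similar concept.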

I have some working code that uses loops, but it is neither elegant nor efficient for large datasets. I assume there must be a nicer solution.

# libraries #
library(dplyr)
library(broom)

# reproducible data #
set.seed(1)  # fix the seed so the rnorm() draws below are repeatable
df <- tibble(ID = as.factor(rep(c(1, 2), each = 40)),
             YEAR = rep(rep(c(2001:2010), each = 4), 2),
             QTR = rep(c(1:4), 20),
             DV = rnorm(80),
             IV = DV * rnorm(80))

# output vector #
output = tibble(ID = NA,
                YEAR = NA,
                R2 = NA)

# loop #
k = 1

for (i in levels(df$ID)){

  n_row = df %>% 
    arrange(ID) %>% 
    filter(ID == i) %>% 
    nrow()

  for (j in seq(1, (n_row - 19), by = 4)){

    output[k, 1] = i
    output[k, 2] = df %>% 
      filter(ID == i) %>%  
      slice((j + 19)) %>% 
      select(YEAR) %>% 
      unlist()

    output[k, 3] = df %>% 
      filter(ID == i) %>%  
      slice(j:(j + 19)) %>% 
      do(model = lm(DV ~ IV, data = .)) %>% 
      glance(model) %>% 
      ungroup() %>% 
      select(r.squared) %>% 
      ungroup()

    k = k + 1
  }
}

1 answer:

Answer 0 (score: 1)

Define a function that, given a subset of the rows of df (without the ID column), returns the year and the R squared, then use rollapply with it:

library(dplyr)
library(zoo)

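# With by.column = FALSE each window arrives as a matrix-like object,
# so convert it to a data frame before lm() looks up columns by name.
R2 <- function(x) {
  x <- as.data.frame(x)
  c(YEAR = tail(x$YEAR, 1), R2 = summary(lm(DV ~ IV, x))$r.squared)
}

# For each ID, drop the ID column (.[-1]) and fit on windows of 20 rows
# (5 years of quarters), advancing 4 rows (1 year) at a time.
df %>%
  group_by(ID) %>%
  do(data.frame(rollapply(.[-1], 20, by = 4, R2, by.column = FALSE))) %>%
  ungroup()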
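The question also asks for fitted values and residuals, which the function above does not return. One possible way to keep them, sketched here in base R rather than with rollapply, is to fit each window explicitly and stack the per-row results. The helpers `fit_window` and `roll_group` and the columns `WINDOW_END`, `FITTED` and `RESID` are hypothetical names introduced for this sketch, and the fixed seed is an assumption added so the example is repeatable.

```r
# Sample data shaped like the question's (seed is an assumption).
set.seed(1)
df <- data.frame(ID   = factor(rep(c(1, 2), each = 40)),
                 YEAR = rep(rep(2001:2010, each = 4), 2),
                 QTR  = rep(1:4, 20),
                 DV   = rnorm(80))
df$IV <- df$DV * rnorm(80)

# Hypothetical helper: fit one window, return R2 plus per-row
# fitted values and residuals.
fit_window <- function(w) {
  m <- lm(DV ~ IV, data = w)
  data.frame(ID = w$ID, YEAR = w$YEAR, QTR = w$QTR,
             WINDOW_END = max(w$YEAR),
             R2     = summary(m)$r.squared,
             FITTED = unname(fitted(m)),
             RESID  = unname(resid(m)))
}

# Hypothetical helper: slide a window of `width` rows over one group,
# moving `step` rows at a time.
roll_group <- function(g, width = 20, step = 4) {
  starts <- seq(1, nrow(g) - width + 1, by = step)
  do.call(rbind, lapply(starts, function(s) fit_window(g[s:(s + width - 1), ])))
}

# Apply per ID and stack the results.
result <- do.call(rbind, lapply(split(df, df$ID), roll_group))
rownames(result) <- NULL
head(result)
```

Each row of `result` belongs to one observation inside one window, so the same quarter appears once for every window that contains it; filter on `WINDOW_END` to pick the regression whose residuals are needed in a later step. This assumes the rows within each ID are already sorted by YEAR and QTR.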