我正在尝试使用另一个时间序列(X)作为预测器来预播时间序列数据(Y)。 X和Y是协整的。 Y是2012年1月至2016年10月的月度数据,X是2012年1月至2017年2月的。
所以,我按照此视频中的显示运行了VECM:https://www.youtube.com/watch?v=x9DcUA9puY0
为了获得预测值,我按照vec2var
命令将其转换为VAR,遵循以下主题中的信息:https://stats.stackexchange.com/questions/223888/how-to-forecast-from-vecm-in-r
但是我不能用已知的X来预测Y,如何使用线性回归模型使用predict
函数进行预测。另外,我无法获得建模的Y(Y帽)值。
这是我的代码:
# Cointegrated_series is a ZOO object, which contains two time series X and Y
library("zoo")
library("xts")
library("urca")
library("vars")
# Obtain lag length
Lagl <- VARselect(Cointegrated_series)$selection[[1]]
#Conduct Eigen test
cointest <- ca.jo(Cointegrated_series,K=Lagl,type = "eigen", ecdet = "const",
spec = "transitory")
#Fit VECM
vecm <- cajorls(cointest)
#Transform VECM to VAR
var <- vec2var(cointest)
我尝试以不同的方式使用predict
功能:predict(var)
,predict(var, newdata = 50)
,predict(var, newdata = 1000)
- 结果相同。
尝试在tsDyn
方法中使用newdata
包和predict
参数,如下所述:https://stats.stackexchange.com/questions/261849/prediction-from-vecm-in-r-using-external-forecasts-of-regressors?rq=1
不工作。我的新数据是ZOO对象,其中X系列的值从2016年11月到2017年2月,Y系列是NA。因此,该方法返回预测中的NA:
# Cointegrated_series is a ZOO object, which contains
#two time series X and Y from Jan 2012 to Oct 2016. Both X and Y are values.
# newDat is a ZOO object, which contains two time series
#X and Y from Nov 2016 to Feb 2017. X are values, Y are NAs.
library(tsDyn)
vecm <-VECM(Cointegrated_series, lag=2)
predict(vecm,newdata = newDat, n.ahead=5)
这是一个结果:
Y X
59 NA NA
60 NA NA
61 NA NA
62 NA NA
63 NA NA
例如,这是我在调用predict
whithout newdata
参数后得到的结果:
predict(vecm, n.ahead=5)
Y X
59 65.05233 64.78006
60 70.54545 73.87368
61 75.65266 72.06513
62 74.76065 62.97242
63 70.03992 55.81045
所以,我的主要问题是:
除此之外,我也找不到这些问题的答案:
如何在R中调用VECM的Akaike标准(AIC)?
vars和urca包是否为VECM提供F和t统计信息?
UPD 10.04.2017 我稍微编辑了这个问题。注意到,我的问题适用于“乱边”问题,将其称为“预测”是不正确的 - 它是“临近预报”。
UPD 11.04.2017
感谢您的回答!
以下是完整代码:
library("lubridate")
library("zoo")
library("xts")
library("urca")
library("vars")
library("forecast")
Dat <- dget(file = "https://getfile.dokpub.com/yandex/get/https://yadi.sk/d/VJpQ75Rz3GsDKN")
NewDat <- dget(file = "https://getfile.dokpub.com/yandex/get/https://yadi.sk/d/T7qxxPUq3GsDLc")
Lagl <- VARselect(Dat)$selection[[1]]
#vars package
cointest_e <- ca.jo(Dat,K=Lagl,type = "eigen", ecdet = "const",
spec = "transitory")
vecm <- cajorls(cointest_e)
var <- vec2var(cointest_e)
Predict1 <- predict(var)
Predict2 <- predict(var, newdata = NewDat)
Predict1$fcst$Y
Predict2$fcst$Y
Predict1$fcst$Y == Predict2$fcst$Y
Predict1$fcst$X == Predict2$fcst$X
#As we see, Predict1 and Predict2 are similar, so the information in NewDat
#didn't came into account.
library("tsDyn")
vecm2 <-VECM(Dat, lag=3)
predict(vecm2)
predict(vecm2, newdata=NewDat)
如果dget
会返回错误,请在此处下载我的数据:
https://yadi.sk/d/VJpQ75Rz3GsDKN - 对于Dat
https://yadi.sk/d/T7qxxPUq3GsDLc - 适用于NewDat
关于临近预报
说 Nowcasting 我的意思是当前月或上个月对当前可用数据的不可用数据的预测。以下是一些参考:
Gianonne,Reichlin,Small:Nowcasting:宏观经济数据的实时信息内容(2008)
Now-Casting和实时数据流(2013)
Marcellino,Schumacher:利用Ragged-Edge数据进行临近预测和预测的因子MIDAS:德国GDP的模型比较(2010)
答案 0 :(得分:2)
我觉得你的问题更多的是关于如何对协整变量进行临近预测,然后让我们看看如何在R中实现它。
一般来说,根据格兰杰的表示定理,协整变量可以用多种形式表示:
长期关系:y和x的同期价值
VECM表示:(diff)y和x由(diff)滞后解释,以及前一周期的纠错项。
所以我不确定你如何在VECM表示中进行临近预报,因为它只包含过去的值?我可以看到两种可能性:
根据长期关系进行临近预报。所以你只需要运行标准的OLS,并从那里进行预测。
基于结构VECM进行临近预测,您可以在其中添加您知道的变量(X)的同期值。在R中,您将执行此程序包urca
,但您需要检查predict
函数是否允许您添加已知的X值。
关于长期关系方法,有趣的是,你可以根据VECM(没有已知的X)和已知X的LT获得X和Y的预测。这为你提供了一个有想法的方法模型的准确性(比较已知和预测的X),您可以用它来为Y创建预测平均方案?
答案 1 :(得分:0)
首先,非常感谢@Matifou的出色软件包。 我的回复很晚,但是我也想找出相同的问题,但没有找到解决方案。因此,我实现了以下功能,希望对某些人有用:
#' @title Special predict method for VECM models
#' @description Predict method for VECM models given some known endogenus
#' variables are known but one. It is just valid for one cointegration equation by the moment
#' @param object, an object of class ‘VECM’
#' @param new_data, a dataframe containing the forecast of all the endogenus variables
#' but one, if there are exogenus variables, its forecast must be provided.
#' @param predicted_var, a string with the desired endogenus variable to be predicted.
#' @return A list with the predicted variable, predicted values and a dataframe with the
#' detailed values used for the construction of the forecast.
#' @examples
#' data(zeroyld)
#' # Fit a VECM with Johansen MLE estimator:
#' vecm.jo<-VECM(zeroyld, lag=2, estim="ML")
#' predict.vecm(vecm.jo, new_data = data.frame("long.run" = c(7:10)), predicted_var = "short.run")
predict.vecm <- function(object, new_data, predicted_var){
if (inherits(object, "VECM")) {
# Just valid for VECM models
# Get endogenus and exogenus variables
summary_vecm <- summary(object)
model_vars <- colnames(object$model)
endovars <- sub("Equation ", "", rownames(summary_vecm$bigcoefficients))
if (!(predicted_var %in% endovars)) {
stop("You must provide a valid endogenus variable.")
}
exovars <- NULL
if (object$exogen) {
ind_endovars <-
unlist(sapply(endovars, function(x) grep(x, model_vars), simplify = FALSE))
exovars <- model_vars[-ind_endovars]
exovars <- exovars[exovars != "ECT"]
}
# First step: join new_data and (lags + 1) last values from the calibration data
new_data <- data.frame(new_data)
if (!all(colnames(new_data) %in% c(endovars, exovars))) {
stop("new_data must have valid endogenus or exogenus column names.")
}
# Endovars but the one desired to be predicted
endovars2 <- endovars[endovars != predicted_var]
# if (!all(colnames(new_data) %in% c(endovars2, exovars))) {
# stop("new_data must have valid endogenus (all but the one desired to predict) or exogenus column names.")
# }
new_data <- new_data[, c(endovars2, exovars), drop = FALSE]
new_data <- cbind(NA, new_data)
colnames(new_data) <- c(predicted_var, endovars2, exovars)
# Previous values to obtain lag values and first differences (lags + 1)
dt_tail <- data.frame(tail(object$model[, c(endovars, exovars), drop = FALSE], object$lag + 1))
new_data <- rbind(dt_tail, new_data)
# Second step: get long rung relationship forecast (ECT term)
ect_vars <- rownames(object$model.specific$beta)
if ("const" %in% ect_vars) {
new_data$const <- 1
}
ect_coeff <- object$model.specific$beta[, 1]
new_data$ECT_0 <-
apply(sweep(new_data[, ect_vars], MARGIN = 2, ect_coeff, `*`), MARGIN = 1, sum)
# Get ECT-1 (Lag 1)
new_data$ECT <- as.numeric(quantmod::Lag(new_data$ECT_0, 1))
# Third step: get differences of the endogenus and exogenus variables provided in new_data
diff_data <- apply(new_data[, c(endovars, exovars)], MARGIN = 2, diff)
colnames(diff_data) <- paste0("DIFF_", c(endovars, exovars))
diff_data <- rbind(NA, diff_data)
new_data <- cbind(new_data, diff_data)
# Fourth step: get x lags of the endogenus and exogenus variables
for (k in 1:object$lag) {
iter <- myLag(new_data[, paste0("DIFF_", endovars)], k)
colnames(iter) <- paste0("DIFF_", endovars, " -", k)
new_data <- cbind(new_data, iter)
}
# Fifth step: recursive calculatioon
vecm_vars <- colnames(summary_vecm$bigcoefficients)
if ("Intercept" %in% vecm_vars) {
new_data$Intercept <- 1
}
vecm_vars[!(vecm_vars %in% c("ECT", "Intercept"))] <-
paste0("DIFF_", vecm_vars[!(vecm_vars %in% c("ECT", "Intercept"))])
equation <- paste("Equation", predicted_var)
equation_coeff <- summary_vecm$coefficients[equation, ]
predicted_var2 <- paste0("DIFF_", predicted_var)
for (k in (object$lag + 2):nrow(new_data)) {
# Estimate y_diff
new_data[k, predicted_var2] <-
sum(sweep(new_data[k, vecm_vars], MARGIN = 2, equation_coeff, `*`))
# Estimate y_diff lags
for (j in 1:object$lag) {
new_data[, paste0(predicted_var2, " -", j)] <-
as.numeric(quantmod::Lag(new_data[, predicted_var2], j))
}
# Estimate y
new_data[k, predicted_var] <-
new_data[(k - 1), predicted_var] + new_data[k, predicted_var2]
# Estimate ECT
new_data[k, "ECT_0"] <- sum(sweep(new_data[k, ect_vars], MARGIN = 2, ect_coeff, `*`))
if (k < nrow(new_data)) {
new_data[k + 1, "ECT"] <- new_data[k, "ECT_0"]
}
}
predicted_values <- new_data[(object$lag + 2):nrow(new_data), predicted_var]
} else {
stop("You must provide a valid VECM model.")
}
return(
list(
predicted_variable = predicted_var,
predicted_values = predicted_values,
data = new_data
)
)
}
# Lag function applied to dataframes
myLag <- function(data, lag) data.frame(unclass(data[c(rep(NA, lag), 1:(nrow(data)-`lag)),]))`
@Andrey Goloborodko,在您的示例中,您应申请:
NewDat <- NewDat[-1,] #Just new data is necessary to be provided
predict.vecm(vecm2, new_data=NewDat, predicted_var = "Y")
# $predicted_variable
# [1] "Y"
#
# $predicted_values
# [1] 65.05233 61.29563 59.45109
#
# $data
# Y X ECT_0 ECT DIFF_Y DIFF_X DIFF_Y -1 DIFF_X -1 DIFF_Y -2
# jul. 2016 92.40506 100 -29.0616718 NA NA NA NA NA NA
# ago. 2016 94.03255 78 -0.7115037 -29.0616718 1.627486 -22 NA NA NA
# sep. 2016 78.84268 53 14.4653067 -0.7115037 -15.189873 -25 1.627486 -22 NA
# oct. 2016 67.99277 52 4.8300645 14.4653067 -10.849910 -1 -15.189873 -25 1.627486
# nov. 2016 65.05233 51 3.1042967 4.8300645 -2.940435 -1 -10.849910 -1 -15.189873
# dic. 2016 61.29563 50 0.5622618 3.1042967 -3.756702 -1 -2.940435 -1 -10.849910
# ene. 2017 59.45109 55 -7.3556104 0.5622618 -1.844535 5 -3.756702 -1 -2.940435
# DIFF_X -2 DIFF_Y -3 DIFF_X -3 Intercept
# jul. 2016 NA NA NA 1
# ago. 2016 NA NA NA 1
# sep. 2016 NA NA NA 1
# oct. 2016 -22 NA NA 1
# nov. 2016 -25 1.627486 -22 1
# dic. 2016 -1 -15.189873 -25 1
# ene. 2017 -1 -10.849910 -1 1