Random Forest - caret - Time Series

Time: 2015-07-18 19:03:42

Tags: r random-forest forecasting predict r-caret

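I have a time series (Apple stock price, closing prices) that I turned into a data frame so I can fit a random forest with caret. I use lags of 1, 2, and 6 days as predictors. I want to predict the next 2 days, i.e. a two-step-ahead forecast. caret's predict function does not accept an h argument the way the forecast function does. I have seen some people try passing an n.ahead argument, but that does not work for me. Any suggestions? Please see the code: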

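library(caret)

df <- data.frame(APPL)
# dplyr::lag() shifts the values; stats::lag() would leave a plain vector unshifted
df$f1 <- dplyr::lag(df$APPL, 1)
df$f2 <- dplyr::lag(df$APPL, 2)
df$f3 <- dplyr::lag(df$APPL, 6)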

# change column names

colnames(df)<-c("price", "price_1", "price_2", "price_6")

# remove rows (days) with NA.
df<-df[complete.cases(df),]

fitControl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 1,
  classProbs = FALSE,
  verboseIter = TRUE,
  preProcOptions=list(thresh = 0.95, na.remove = TRUE, verbose = TRUE))

set.seed(1234)

rf_grid <- expand.grid(mtry = 1:3)

fit <- train(price~.,
                 data=df,
                 method="rf",
                 preProcess=c("center","scale"),
                 tuneGrid = rf_grid,
                 trControl=fitControl,
                 ntree = 200,
                 metric="RMSE")


nextday <- predict(fit,`WHAT GOES HERE?`)

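If I just call predict(fit), the whole data set is used as newdata, which I think is wrong. The other thing I thought of is a loop: predict one step ahead, since I have the data for 1, 2, and 6 days ago; then, for the second step, fill the 1-day-ago "cell" with the prediction I just made.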

2 Answers:

Answer 0 (score: 2)

Currently, you cannot pass additional options through to the underlying predict method, although there is a proposed change that may enable this.

In your case, you should give the predict function a data frame that has the appropriate predictors for the next few observations.
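One way to build such a data frame for a two-step forecast is to predict recursively: forecast one day, append that forecast to the series, and use it as the 1-day lag for the following day. The sketch below is only an illustration under stated assumptions: forecast_recursive is a hypothetical helper (not part of caret), the prices are synthetic, and lm() stands in for the caret model so the example runs without extra packages. It also assumes the predictors are named price_1, price_2, and price_6 as in the question.

```r
# Recursive multi-step forecast for a model trained on lagged prices.
# `fit` is any model whose predict() accepts `newdata`.
# `history` is the numeric vector of observed prices, oldest first.
forecast_recursive <- function(fit, history, h = 2, lags = c(1, 2, 6)) {
  stopifnot(length(history) >= max(lags))
  preds <- numeric(h)
  for (step in seq_len(h)) {
    n <- length(history)
    # One-row data frame; column names must match the training predictors.
    newdata <- as.data.frame(t(history[n + 1 - lags]))
    names(newdata) <- paste0("price_", lags)
    preds[step] <- as.numeric(predict(fit, newdata = newdata))
    # Treat the forecast as if it were observed so the next step can use it.
    history <- c(history, preds[step])
  }
  preds
}

# Stand-in demo: synthetic random walk and lm(), not the Apple series.
set.seed(1)
prices <- 100 + cumsum(rnorm(60))
lags <- c(1, 2, 6)
idx <- (max(lags) + 1):length(prices)
train_df <- data.frame(
  price   = prices[idx],
  price_1 = prices[idx - 1],
  price_2 = prices[idx - 2],
  price_6 = prices[idx - 6]
)
demo_fit <- lm(price ~ ., data = train_df)
next_two <- forecast_recursive(demo_fit, prices, h = 2, lags = lags)
```

Because predict() on a caret train object also takes a newdata argument, passing the question's fit and the original price vector in place of demo_fit and prices should work the same way. Errors compound with each step, since later forecasts are built on earlier ones.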

Answer 1 (score: -1)

#1: colnames(df) <- c("price", "price_1", "price_2", "price_6"). Note the closing quote after price_6.
#2: predict {stats} is a generic function for predictions from the results of various model-fitting functions.

Usage: predict(model object, dataframe)
There are three cases for the data frame:
Case 1: training data, on which the model was fitted (in-sample prediction).
Case 2: test data (out-of-sample prediction).
Case 3: forecasted data, i.e. forecasted values of the independent variables; the model then gives the forecasted values of the dependent variable.

In cases 2 and 3, the column names should be the same as the column names of the training data.
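The column-name requirement can be checked before calling predict. The following is a minimal sketch, assuming the response column is called price as in the question; check_newdata is a hypothetical helper written for this illustration, not a function from caret or stats.

```r
# Hypothetical helper: fail early if `newdata` lacks predictors used in training,
# and return the predictors in the same order as the training data.
check_newdata <- function(train_df, newdata, response = "price") {
  needed <- setdiff(names(train_df), response)
  missing_cols <- setdiff(needed, names(newdata))
  if (length(missing_cols) > 0) {
    stop("newdata is missing columns: ", paste(missing_cols, collapse = ", "))
  }
  newdata[, needed, drop = FALSE]
}

# Toy one-row frames that only illustrate the column names.
train_cols <- data.frame(price = 1, price_1 = 1, price_2 = 1, price_6 = 1)
ok  <- data.frame(price_6 = 3, price_1 = 1, price_2 = 2)
bad <- data.frame(p1 = 1, price_2 = 2, price_6 = 3)

checked <- check_newdata(train_cols, ok)
bad_msg <- tryCatch(check_newdata(train_cols, bad),
                    error = function(e) conditionMessage(e))
```

Here checked has its columns reordered to match the training data, while bad_msg holds the error message that names the missing predictor.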