How to improve a Prophet model

Asked: 2018-06-06 13:22:32

Tags: r time-series facebook-prophet

I am running the following R code:

library(readxl)
library(prophet)
df <- read_excel("20151001-20180531 Data.xlsx",
                 col_types = c("date", "numeric", "numeric",
                               "numeric", "numeric"))
h <- 365  # forecast horizon in days; the last h rows are held out for evaluation
train <- head(df, -h)  # training window: every row except the last h
prodf <- data.frame(ds = train$Transaction_Date,
                    y = log(train$`Volume Demand`),
                    gasessions = log(train$Sessions),
                    avgprice = log(train$`Avg RRP`),
                    discdepth = train$`Avg Discount`)

# allholidays=data.frame(holiday="allholidays",
#                     ds=as.Date(c("2015-10-31","2015-11-26","2015-11-27","2015-11-30","2015-12-18","2015-12-24","2015-12-25","2015-12-26","2015-12-31",
#                          "2016-01-01","2016-02-14","2016-03-25","2016-10-31","2016-11-24","2016-11-25","2016-11-28","2016-12-18","2016-12-24","2016-12-25","2016-12-26","2016-12-31",
#                          "2017-01-01","2017-02-14","2017-04-14","2017-10-31","2017-11-23","2017-11-24","2017-11-27","2017-12-16","2017-12-24","2017-12-25","2017-12-26","2017-12-31",
#                          "2018-01-01","2018-02-14","2018-04-1")),
#                     lower_window = 0,
#                     upper_window = 1, stringsAsFactors = F)
calendar <- read_excel("Calendar.xlsx", 
                 col_types = c("text", "date", "numeric", "numeric"))
m=prophet(holidays = calendar, holidays.prior.scale = 10,changepoint.prior.scale = 0.05)#,mcmc.samples = 300)
m <- add_seasonality(m, name='weekly', period=7, fourier.order=5, prior.scale=10)
m <- add_seasonality(m, name='yearly', period=365, fourier.order=5, prior.scale=10)
m <- add_regressor(m, 'gasessions')
m <- add_regressor(m, 'avgprice')
m <- add_regressor(m, 'discdepth')
m <- fit.prophet(m, prodf)
future <- make_future_dataframe(m, periods = h)
future$gasessions=log(df$Sessions)
future$avgprice=log(df$`Avg RRP`)
future$discdepth=df$`Avg Discount`
forecast <- predict(m, future)
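# Back-transform the forecast from the log scale and compare it with the held-out last h days
actual <- tail(df$`Volume Demand`, h)
predicted <- tail(exp(forecast$yhat), h)
MAPE <- mean(abs(predicted - actual) / actual) * 100            # mean absolute percentage error
accuracy <- (sum(predicted) - sum(actual)) / sum(actual) * 100  # signed % error of the total over the horizon
TotForecast <- sum(predicted)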

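I am trying to forecast sales volume for the following year. Unfortunately, I cannot share the data, for obvious reasons. The MAPE is 19% and the accuracy figure (the percentage error of the total forecast over the horizon, as computed in the code) is 27%. These are my best results for a 365-day horizon. With different settings I have reached 0.5% on the accuracy figure (a 99.5% correct total forecast) for a 28-day horizon, but with those settings the 365-day result is worse than the one above, possibly because of overfitting. Does anyone have suggestions for improving this Prophet model? Also, if there is a way to transform the data so that it can be shared, please let me know.

Kind regards,
George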
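On the question of making the data shareable, one common approach is to multiply each confidential column by a private positive constant. The sketch below is only an illustration: the helper name `rescale_columns`, the toy data, and the constants are all hypothetical, and whether rescaling hides enough is for the data owner to judge. Because the model takes logs of the volume, sessions, and price columns, scaling by a constant only shifts those series by log(constant), so day-to-day log differences are unchanged.

```r
# Hypothetical helper (not from the question): multiply each listed column
# by a private positive constant so the real magnitudes are hidden.
rescale_columns <- function(df, factors) {
  stopifnot(all(names(factors) %in% names(df)), all(factors > 0))
  out <- df
  for (col in names(factors)) {
    out[[col]] <- out[[col]] * factors[[col]]
  }
  out
}

# Toy data standing in for the real spreadsheet (all values are made up).
toy <- data.frame(
  Transaction_Date = as.Date("2018-01-01") + 0:4,
  Volume = c(120, 135, 90, 160, 150),
  Sessions = c(1000, 1100, 800, 1300, 1250)
)

# Keep these constants private and share only `shared`.
secret <- c(Volume = 0.37, Sessions = 2.9)
shared <- rescale_columns(toy, secret)

# Check that the day-to-day log differences survive the rescaling.
all.equal(diff(log(shared$Volume)), diff(log(toy$Volume)))
```

Note that relative movements (for example, the ratio between two days) remain visible after this transformation, so it hides levels rather than shape.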

1 Answer:

Answer 0 (score: 0)

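You may be right that the model is overfitting, but you can control this through fourier.order and the seasonality prior scale (seasonality.prior.scale in prophet(), or prior.scale in add_seasonality()).

fourier.order: increasing it lets the seasonal component fit faster-changing seasonal patterns.

prior.scale: controls how strongly the seasonality is regularized; smaller values regularize more. Regularization is important for avoiding overfitting.

So if the model is overfitting, I suggest lowering both fourier.order and the seasonality prior scale.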
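As a rough illustration of this suggestion, the sketch below rebuilds a model like the one in the question with a lower Fourier order and a tighter seasonality prior. The helper name `fit_regularized`, the seasonality names, and the defaults of 3 and 1 are assumptions chosen for illustration, not tuned values. It assumes `library(prophet)` has already been attached, as in the question.

```r
# Sketch only: `fit_regularized()` and its defaults are illustrative assumptions.
# Assumes library(prophet) is attached before the function is called.
fit_regularized <- function(train, holidays = NULL,
                            fourier_order = 3, prior_scale = 1,
                            regressors = character(0)) {
  # Disable the built-in seasonalities so only the explicit ones below apply.
  m <- prophet(holidays = holidays,
               yearly.seasonality = FALSE,
               weekly.seasonality = FALSE,
               daily.seasonality = FALSE)
  # Lower fourier.order gives a smoother seasonal curve;
  # lower prior.scale shrinks the seasonal effect more strongly.
  m <- add_seasonality(m, name = "weekly_smooth", period = 7,
                       fourier.order = fourier_order, prior.scale = prior_scale)
  # 365.25 is the period of Prophet's built-in yearly seasonality
  # (the question uses 365).
  m <- add_seasonality(m, name = "yearly_smooth", period = 365.25,
                       fourier.order = fourier_order, prior.scale = prior_scale)
  # Register any extra regressors present as columns in `train`.
  for (r in regressors) {
    m <- add_regressor(m, r)
  }
  fit.prophet(m, train)
}
```

With the question's objects, a call might look like `fit_regularized(prodf, holidays = calendar, regressors = c("gasessions", "avgprice", "discdepth"))`. The MAPE on the held-out 365 days can then be compared across a few values of `fourier_order` and `prior_scale` to see whether the stronger regularization helps.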