Predicting dates using dates

Posted: 2017-06-12 16:46:36

Tags: r machine-learning statistics

I want to create a simple model that predicts future dates. The only input I want to use is a list of past dates. Here is what I've done so far:

library(ggplot2)
# Keep City and order date (SOCreatedOn assumed to be of class Date), sorted, without duplicates
sales_modified = data.frame(City = sales$City, SOCreatedOn = sales$SOCreatedOn)
sales_modified = unique(sales_modified[order(sales_modified$City, sales_modified$SOCreatedOn), ])
# Sequential index used as the x-axis and as the predictor in lm()
sales_modified$rowNum = seq_len(nrow(sales_modified))
ggplot(data = sales_modified[1:119, ], aes(x = rowNum, y = SOCreatedOn)) +
  geom_point(aes(color = City)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  facet_wrap(~City) +
  geom_smooth(method = "lm")

date1 = lm(SOCreatedOn ~ rowNum, data = sales_modified[1:119,])

This keeps the data in a data frame called sales_modified with the columns City and SOCreatedOn (the order date). I added the rowNum column so that I could plot the dates neatly and use the row index as the predictor in the model. I tried lm() to fit a linear model. How can I predict the next future dates without providing any additional data?

EDIT: I am experimenting with an ARIMA model, but it just predicts consecutive dates (4/27, 4/28, 4/29), which I know won't be the case.

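library(forecast)

# Dates as numbers (days since 1970-01-01) so ts() gets a plain numeric series
timeseries = ts(as.numeric(sales_modified$SOCreatedOn[1:119]))
plot.ts(timeseries)

auto.arima(timeseries)
timeseriesarima = arima(timeseries, order = c(2, 1, 0))
# Use the generic forecast(); forecast.Arima() is not exported by current forecast versions
timeseriesforecast = forecast(timeseriesarima, h = 5)
as.Date(timeseriesforecast$mean[1:5], origin = "1970-01-01")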

2 answers:

Answer 0 (score: 0)

Time series data violate the assumption that observations are independent of each other (adjacent data points are usually correlated), and a simple linear regression will not account for this correctly. Consider an ARIMA model instead (see ?arima).
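A minimal sketch of how one might check this independence concern with base R's lm(), acf() and Box.test(). The order_dates values are hypothetical, made up for illustration; in practice you would substitute sales_modified$SOCreatedOn for one city. Because each date is the previous date plus a gap, the residuals from a straight-line fit tend to be autocorrelated. The Ljung-Box lag of 5 is an arbitrary choice.

```r
# Hypothetical order dates: each date is the previous one plus a gap in days
order_dates <- as.Date("2017-01-01") + cumsum(c(0, 7, 4, 10, 7, 6, 9, 5, 9, 6,
                                                6, 9, 8, 5, 7, 10, 6, 4, 8, 7))
rowNum <- seq_along(order_dates)

# Same idea as date1 in the question: date (as days) regressed on the row index
fit <- lm(as.numeric(order_dates) ~ rowNum)

# Residual autocorrelation plot; spikes outside the dashed bands suggest dependence
acf(resid(fit))

# Ljung-Box test: a small p-value is evidence that the residuals are not independent
lb <- Box.test(resid(fit), lag = 5, type = "Ljung-Box")
print(lb)
```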

Answer 1 (score: 0)

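While I do share @Rob's concern that time series violate independence, you are probably the one best placed to judge this (does one data point give you information about the next one?).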

That said, if you are satisfied that the model is appropriate, predicting new responses from an lm model is very easy. Look at ?predict.lm, then try

newdate <- predict(date1, newdata = data.frame(rowNum = 120:140))
as.Date(newdate, origin = "1970-01-01")  # predictions come back as days since 1970-01-01
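A self-contained sketch of the predict.lm approach, assuming the dates are converted to numbers (days since 1970-01-01) before fitting and converted back with as.Date() afterwards. The dates, the data frame names train and future, and the horizon of five rows are all hypothetical.

```r
# Hypothetical dates standing in for one city's SOCreatedOn values
order_dates <- as.Date("2017-01-01") + cumsum(c(0, 7, 4, 10, 7, 6, 9, 5, 9, 6))
train <- data.frame(rowNum = seq_along(order_dates),
                    days = as.numeric(order_dates))  # days since 1970-01-01

# Linear trend of date against order index; the slope is the average days per order
date_fit <- lm(days ~ rowNum, data = train)

# Predict the next 5 row indices and convert the numeric days back to Date
future <- data.frame(rowNum = 11:15)
pred_days <- predict(date_fit, newdata = future)
pred_dates <- as.Date(round(pred_days), origin = "1970-01-01")
print(pred_dates)
```

Rounding before as.Date() is a design choice so the predictions land on whole calendar days.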

If, as @Rob suggested, you find an ARIMA model more appropriate, look at ?predict.Arima. Usage is similar: predict(yourARIMAmodel, n.ahead = 5).
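A minimal sketch of predict() on an ARIMA fit using only base R's stats::arima, under one stated assumption: instead of modelling the dates directly, it models the gaps between consecutive dates with a mean term. stats::arima ignores include.mean for differenced models, so an ARIMA(2,1,0) fitted to the raw dates has no drift, which may be why the forecasts in the question bunch up right after the last date. An AR(1) on the gaps with a mean is essentially an ARIMA(1,1,0) with drift. The dates and the names gaps, gap_fit and next_dates are hypothetical.

```r
# Hypothetical order dates for one city (illustrative data only)
order_dates <- as.Date("2017-01-03") + cumsum(c(0, 7, 4, 10, 7, 6, 9, 5, 9, 6, 6, 9))

# Model the gaps (in days) between consecutive orders instead of the raw dates
gaps <- as.numeric(diff(order_dates))

# AR(1) on the gaps with a mean term; the mean plays the role of the average spacing
gap_fit <- arima(gaps, order = c(1, 0, 0), include.mean = TRUE)

# Forecast the next 5 gaps and add them cumulatively to the last observed date
next_gaps <- as.numeric(predict(gap_fit, n.ahead = 5)$pred)
next_dates <- max(order_dates) + round(cumsum(next_gaps))
print(next_dates)
```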