I want to build a simple model to predict future dates. The only input I want to use is a list of past dates, from which I'd like to predict the next ones as well as possible. Here is what I've done so far:
library(ggplot2)
# Keep only the city and the order-creation date, sorted by city and then by date
sales_modified = data.frame(City=sales$City, SOCreatedOn=sales$SOCreatedOn)
sales_modified = sales_modified[order(sales_modified$City, sales_modified$SOCreatedOn),]
sales_modified = unique(sales_modified)
# Sequential index used as the x-axis and as the regressor
sales_modified$rowNum = seq_len(nrow(sales_modified))
ggplot(data = sales_modified[1:119,], aes(x=rowNum, y=SOCreatedOn)) + geom_point(aes(color=City)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + facet_wrap(~City) + geom_smooth(method="lm")
date1 = lm(SOCreatedOn ~ rowNum, data = sales_modified[1:119,])
That keeps the dates in a data frame called sales_modified with two columns, City and the date (SOCreatedOn).
I created the rowNum column so I could plot the data neatly and use it as the predictor in the model.
I tried using lm() to fit a linear model. How can I predict the next dates without providing any additional data?
EDIT: I'm experimenting with an ARIMA model, but it just predicts consecutive dates (4/27, 4/28, 4/29), which I know won't be the case.
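library(forecast)
# Treat the dates as day counts (days since 1970-01-01)
timeseries = ts(as.numeric(sales_modified$SOCreatedOn[1:119]))
plot.ts(timeseries)
auto.arima(timeseries)
timeseriesarima = arima(timeseries, order = c(2,1,0))
# In current versions of forecast, call the forecast() generic rather than forecast.Arima()
timeseriesforecast = forecast(timeseriesarima, h = 5)
as.Date(as.numeric(timeseriesforecast$mean), origin = "1970-01-01")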
Answer 0 (score: 0)
Time series data violate the assumption that adjacent data points are independent of each other, and a simple linear regression does not account for this. Consider an ARIMA model instead (see ?arima).
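One way to check whether this concern applies is to test the residuals of the straight-line fit for autocorrelation. The sketch below uses made-up dates (the data frame `d` and its values are hypothetical, not the question's `sales` data) and base R's `Box.test()` (Ljung-Box) and `acf()`:

```r
# Hypothetical example data (not the asker's): 60 order dates with irregular gaps
set.seed(1)
dates <- as.Date("2016-01-01") + cumsum(sample(1:10, 60, replace = TRUE))
d <- data.frame(rowNum = seq_along(dates), day = as.numeric(dates))

# The straight-line model from the question, fitted on day counts
fit <- lm(day ~ rowNum, data = d)

# Ljung-Box test on the residuals: a small p-value indicates autocorrelated
# residuals, i.e. the independence assumption behind lm() is doubtful
bt <- Box.test(residuals(fit), lag = 5, type = "Ljung-Box")
print(bt)

# Sample autocorrelations of the residuals (use plot = TRUE to draw them)
print(acf(residuals(fit), plot = FALSE))
```

Because each date is the previous date plus a gap, the residuals from a straight line tend to wander in long runs above and below the fitted line, which is the kind of dependence described above.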
Answer 1 (score: 0)
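While I share @Rob's concern about time series violating independence, you are probably the one best placed to judge this (that is, whether one data point gives you information about the next).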
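That said, if you are satisfied that the model is appropriate, predicting new responses from an lm model is very easy. Look at ?predict.lm, then try the following (as.Date() turns the numeric predictions back into dates):
newdate <- as.Date(predict(date1, newdata = data.frame(rowNum = 120:140)), origin = "1970-01-01")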
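A self-contained sketch of the same idea, using hypothetical dates rather than the question's data. Storing the dates as plain numbers (days since 1970-01-01) keeps `lm()` and `predict()` working on numerics, and `as.Date()` converts the predictions back:

```r
# Hypothetical example data (not the asker's): 20 order dates with irregular gaps
set.seed(42)
dates <- as.Date("2016-03-01") + cumsum(sample(1:7, 20, replace = TRUE))
d <- data.frame(rowNum = seq_along(dates), day = as.numeric(dates))

# Linear trend of the date (as a day count) against the row index
fit <- lm(day ~ rowNum, data = d)

# Predict the next five rows and convert the day counts back to dates
future <- data.frame(rowNum = 21:25)
pred <- predict(fit, newdata = future)
newdate <- as.Date(round(pred), origin = "1970-01-01")
print(newdate)
```

Note that a straight-line trend gives roughly evenly spaced future dates (about the average past gap), so it will not reproduce irregular gaps between orders.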
If you find that an ARIMA model is more appropriate, as @Rob suggested, look at ?predict.Arima. The usage is similar: predict(yourARIMAmodel, n.ahead = 5).
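A minimal sketch of `predict()` on an ARIMA fit, again using hypothetical dates rather than the question's data. One likely reason the question's forecasts bunch together (4/27, 4/28, 4/29) is that `stats::arima()` ignores `include.mean` for differenced models. An ARIMA(2,1,0) therefore has no drift, and its forecast increments shrink toward zero. Passing the time index as `xreg` is one common way to add a drift term. This is an illustrative assumption, not something either answer proposes:

```r
# Hypothetical example data (not the asker's): 60 order dates with irregular gaps
set.seed(7)
dates <- as.Date("2016-01-01") + cumsum(sample(1:10, 60, replace = TRUE))
y <- ts(as.numeric(dates))  # dates as day counts since 1970-01-01
n <- length(y)

# ARIMA(2,1,0) as in the question, plus the time index as a regressor.
# After differencing, this regressor acts as a drift term (the average gap).
fit <- arima(y, order = c(2, 1, 0), xreg = seq_len(n))

# predict.Arima() needs the future values of the regressor in newxreg;
# it returns a list with $pred (point forecasts) and $se (standard errors)
fc <- predict(fit, n.ahead = 5, newxreg = (n + 1):(n + 5))
future <- as.Date(round(as.numeric(fc$pred)), origin = "1970-01-01")
print(future)
```

The forecasts are still point estimates of the next dates. `fc$se` shows how quickly the uncertainty grows with each step ahead.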