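I want to create a chart with ggplot2. It should show the time series of a variable over the past five years up to now. I have also generated three different point forecasts for the coming year using three different forecasting methods. The chart I have in mind shows the past values in one colour (black) and the three point-forecast series in three different colours.
Here are my data and my approach: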
ae<-ts(c(4.670958, 4.606170, 4.610158, 4.697749, 4.685828, 4.581902, 4.676560, 4.662495, 4.737951, 4.697749, 4.643429, 4.740575, 4.714921, 4.597138, 4.709530, 4.727388, 4.723842, 4.655863, 4.732684, 4.724729, 4.762174, 4.727388, 4.682131, 4.695011, 4.783316, 4.572647, 4.734443, 4.759607, 4.715817, 4.720283, 4.719391, 4.714025, 4.843399, 4.758749, 4.682131, 4.841033, 4.783316, 4.603168, 4.735321, 4.751865, 4.761319, 4.719391, 4.689511, 4.742320, 4.834693, 4.763028, 4.704110, 4.821893, 4.703204, 4.660605, 4.752728, 4.734443, 4.789989, 4.830711, 4.758749, 4.771532, 4.935193, 4.728272, 4.809742, 4.838660, 4.763028), start=c(2012,7), frequency=12)
af<-ts(c(4.735572, 4.786397, 4.794226, 4.847278, 4.828640, 4.831721, 4.828364, 4.917734, 4.843730, 4.817140, 4.907995, 4.846953), start=c(2017,8), frequency=12)
bf<-ts(c(4.731111, 4.802771, 4.789276, 4.855957, 4.787150, 4.839004, 4.815918, 4.910693, 4.831316, 4.804971, 4.894336, 4.837539), start=c(2017,8), frequency=12)
cf<-ts(c(4.734454, 4.786685, 4.796952, 4.849983, 4.831067, 4.833924, 4.831631, 4.924311, 4.847889, 4.820325, 4.914030, 4.851841), start=c(2017,8), frequency=12)
month2<-seq(as.Date('2012-7-1'),to=as.Date('2018-7-1'),by='month')
# concatenate the observed values with each forecast (73 values each)
al<-c(window(ae, c(2012,7), c(2017,7)),af)
bl<-c(window(ae, c(2012,7), c(2017,7)),bf)
cl<-c(window(ae, c(2012,7), c(2017,7)),cf)
df2<-data.frame(month2,al,bl,cl)
meltdf <- reshape2::melt(df2,id="month2")
ggplot(meltdf,aes(x=month2,y=value,colour=variable,group=variable)) + geom_line()
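I don't like this approach because the past values end up in the same colour as the last point forecast. That makes complete sense, since it is exactly what the code says. Is there a way to do this better? This is what it looks like now:
Many thanks, Julia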
Answer 0 (score: 0)
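If I were you, I would treat the observed data and the forecast data as two separate bodies of data, each covering a different time range. You then end up with two data.frames, one for the observations and one for the forecasts. It looks like this: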
library(ggplot2)
# OP's data
ae <- ts(c(4.670958, 4.606170, 4.610158, 4.697749, 4.685828, 4.581902,
4.676560, 4.662495, 4.737951, 4.697749, 4.643429, 4.740575,
4.714921, 4.597138, 4.709530, 4.727388, 4.723842, 4.655863,
4.732684, 4.724729, 4.762174, 4.727388, 4.682131, 4.695011,
4.783316, 4.572647, 4.734443, 4.759607, 4.715817, 4.720283,
4.719391, 4.714025, 4.843399, 4.758749, 4.682131, 4.841033,
4.783316, 4.603168, 4.735321, 4.751865, 4.761319, 4.719391,
4.689511, 4.742320, 4.834693, 4.763028, 4.704110, 4.821893,
4.703204, 4.660605, 4.752728, 4.734443, 4.789989, 4.830711,
4.758749, 4.771532, 4.935193, 4.728272, 4.809742, 4.838660,
4.763028),
start=c(2012,7), frequency=12)
af <- ts(c(4.735572, 4.786397, 4.794226, 4.847278, 4.828640, 4.831721,
4.828364, 4.917734, 4.843730, 4.817140, 4.907995, 4.846953),
start=c(2017,8), frequency=12)
bf <- ts(c(4.731111, 4.802771, 4.789276, 4.855957, 4.787150, 4.839004,
4.815918, 4.910693, 4.831316, 4.804971, 4.894336, 4.837539),
start=c(2017,8), frequency=12)
cf <- ts(c(4.734454, 4.786685, 4.796952, 4.849983, 4.831067, 4.833924,
4.831631, 4.924311, 4.847889, 4.820325, 4.914030, 4.851841),
start=c(2017,8), frequency=12)
# Create separate month series, one for observed, one for forecasts
month1 <- seq(as.Date('2012-7-1'), to=as.Date('2017-7-1'), by='month')
month2 <- seq(as.Date('2017-8-1'), to=as.Date('2018-7-1'), by='month')
# data.frame for observed data
df1 <- data.frame(month=month1, dat=as.vector(ae))
# data.frame for forecast data with a label column to distinguish which
# forecast it is
df2 <- data.frame(month=month2, dat=as.vector(af), lab='a')
df3 <- data.frame(month=month2, dat=as.vector(bf), lab='b')
df4 <- data.frame(month=month2, dat=as.vector(cf), lab='c')
df_forecast <- rbind(df2, df3, df4)
# The plot then plots the data with two line geometries
ggplot(df1, aes(x=month, y=dat)) +
  geom_line() +                                # this plots the observed data
  geom_line(data=df_forecast, aes(col=lab))    # this plots the forecasts
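The resulting plot looks like this:
As you noted in your question, the data set being plotted needs something in it that indicates which data points correspond to what. There are certainly other ways to achieve this, but this approach gives you flexibility in how the final product is visualized.
(Note: there is a gap between the observed data and the forecasts; this is because the observed data end before the forecasts begin. If you want the lines to connect, you can simply copy the last observed data point as the first data point of each forecast. I leave that as an exercise for you.)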
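For anyone who wants to see the gap closed, here is one possible sketch of the exercise mentioned in the note. It is only one way to do it, not necessarily what the answer's author had in mind. The names `last_obs`, `bridge` and `df_joined` are my own and do not appear in the original code; the data are copied from the question.

```r
library(ggplot2)

# Observed series and the three forecasts, copied from the question
ae <- ts(c(4.670958, 4.606170, 4.610158, 4.697749, 4.685828, 4.581902,
           4.676560, 4.662495, 4.737951, 4.697749, 4.643429, 4.740575,
           4.714921, 4.597138, 4.709530, 4.727388, 4.723842, 4.655863,
           4.732684, 4.724729, 4.762174, 4.727388, 4.682131, 4.695011,
           4.783316, 4.572647, 4.734443, 4.759607, 4.715817, 4.720283,
           4.719391, 4.714025, 4.843399, 4.758749, 4.682131, 4.841033,
           4.783316, 4.603168, 4.735321, 4.751865, 4.761319, 4.719391,
           4.689511, 4.742320, 4.834693, 4.763028, 4.704110, 4.821893,
           4.703204, 4.660605, 4.752728, 4.734443, 4.789989, 4.830711,
           4.758749, 4.771532, 4.935193, 4.728272, 4.809742, 4.838660,
           4.763028),
         start = c(2012, 7), frequency = 12)
af <- ts(c(4.735572, 4.786397, 4.794226, 4.847278, 4.828640, 4.831721,
           4.828364, 4.917734, 4.843730, 4.817140, 4.907995, 4.846953),
         start = c(2017, 8), frequency = 12)
bf <- ts(c(4.731111, 4.802771, 4.789276, 4.855957, 4.787150, 4.839004,
           4.815918, 4.910693, 4.831316, 4.804971, 4.894336, 4.837539),
         start = c(2017, 8), frequency = 12)
cf <- ts(c(4.734454, 4.786685, 4.796952, 4.849983, 4.831067, 4.833924,
           4.831631, 4.924311, 4.847889, 4.820325, 4.914030, 4.851841),
         start = c(2017, 8), frequency = 12)

month1 <- seq(as.Date("2012-07-01"), as.Date("2017-07-01"), by = "month")
month2 <- seq(as.Date("2017-08-01"), as.Date("2018-07-01"), by = "month")

# Observed data and the stacked forecasts, as in the answer
df1 <- data.frame(month = month1, dat = as.vector(ae))
df_forecast <- rbind(
  data.frame(month = month2, dat = as.vector(af), lab = "a"),
  data.frame(month = month2, dat = as.vector(bf), lab = "b"),
  data.frame(month = month2, dat = as.vector(cf), lab = "c")
)

# Copy the last observed point once per forecast label, so that every
# coloured line starts exactly where the black line ends
last_obs <- df1[nrow(df1), ]
labs <- unique(df_forecast$lab)
bridge <- data.frame(month = rep(last_obs$month, length(labs)),
                     dat   = rep(last_obs$dat, length(labs)),
                     lab   = labs)
df_joined <- rbind(bridge, df_forecast)

p <- ggplot(df1, aes(x = month, y = dat)) +
  geom_line() +                                      # observed data, black
  geom_line(data = df_joined, aes(colour = lab))     # forecasts, one colour each
print(p)
```

The observed series itself is left untouched, so the black line still stops at July 2017; only the forecast data frame gets the extra shared starting point.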