R: How can I show forecast and actual data in one plot?

Date: 2019-05-17 06:25:28

Tags: r ggplot2 forecast

I have some time series data running from 2000 Q1 to 2010 Q4. Using the data from 2000 Q1 to 2008 Q2, I forecast the next 10 quarters with HoltWinters.

CPI.HI.fit <- HoltWinters(CPI.HI.pre, gamma=FALSE)
CPI.HI.cfr <- forecast(CPI.HI.fit, 10)

Here is the data:

  1. CPI.HI.pre (the prior time series, of class ts)
  2. CPI.HI.pos (the posterior time series, of class ts)
  3. CPI.HI.cfr (the crisis forecast, of class forecast)
> CPI.HI.pre
#          Qtr1     Qtr2     Qtr3     Qtr4
# 2000 83.12262 83.72945 84.10338 84.58881
# 2001 85.03111 85.92120 85.86388 85.74424
# 2002 86.01310 86.89452 87.05565 87.31702
# 2003 87.93231 88.23959 88.43708 88.56572
# 2004 89.02891 90.05139 90.17285 90.68677
# 2005 90.82155 91.74464 92.18774 92.57043
# 2006 92.91782 94.15888 94.58178 94.13807
# 2007 94.58282 95.99794 96.12194 97.08308
# 2008 97.72470 99.54615                  
> CPI.HI.pos
#           Qtr1      Qtr2      Qtr3      Qtr4
# 2008                     100.39960  99.11151
# 2009  98.79588  99.36900  99.75832  99.90321
# 2010 100.17990 100.96250 100.99250 101.40690
> CPI.HI.cfr
#         Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
# 2008 Q3       99.86646  99.26724 100.4657  98.95002 100.7829
# 2008 Q4      100.69200  99.93567 101.4483  99.53529 101.8487
# 2009 Q1      101.51754 100.57777 102.4573 100.08028 102.9548
# 2009 Q2      102.34308 101.19808 103.4881 100.59195 104.0942
# 2009 Q3      103.16862 101.79962 104.5376 101.07492 105.2623
# 2009 Q4      103.99416 102.38447 105.6038 101.53236 106.4560
# 2010 Q1      104.81970 102.95412 106.6853 101.96654 107.6729
# 2010 Q2      105.64524 103.50968 107.7808 102.37918 108.9113
# 2010 Q3      106.47077 104.05204 108.8895 102.77163 110.1699
# 2010 Q4      107.29631 104.58191 110.0107 103.14499 111.4476

I was able to get the previous data and the forecast in a single plot using

> autoplot(CPI.HI.cfr)

[Figure: the previous data and the forecast]

and the actual data for the forecast period (in a separate plot) using

> autoplot(CPI.HI.pos)

[Figure: the posterior data]

I would like both of them in the same plot.

I know this is best done with ggplot(), but after trying a few approaches, for example

ggplot(aes(x=x, y=y), data=CPI.HI.pre) + 
  geom_line(CPI.HI.pos)

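things started to confuse me!

1 answer:

Answer 0 (score: 2)

I found your question not easily reproducible; next time you might consider posting a snippet of the data with dput(). I say this because I had to process the copy-pasted data as follows to get something resembling your input: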

zz <- "          Qtr1     Qtr2     Qtr3     Qtr4
 2000 83.12262 83.72945 84.10338 84.58881
 2001 85.03111 85.92120 85.86388 85.74424
 2002 86.01310 86.89452 87.05565 87.31702
 2003 87.93231 88.23959 88.43708 88.56572
 2004 89.02891 90.05139 90.17285 90.68677
 2005 90.82155 91.74464 92.18774 92.57043
 2006 92.91782 94.15888 94.58178 94.13807
 2007 94.58282 95.99794 96.12194 97.08308
2008 97.72470 99.54615 NA NA"

yy <- "           Qtr1      Qtr2      Qtr3      Qtr4
 2008  NA        NA         100.39960  99.11151
 2009  98.79588  99.36900  99.75832  99.90321
 2010 100.17990 100.96250 100.99250 101.40690"

qq <- "Year Qtr        PointForecast     Lo80    Hi80     Lo95    Hi95
 2008 Q3       99.86646  99.26724 100.4657  98.95002 100.7829
 2008 Q4      100.69200  99.93567 101.4483  99.53529 101.8487
 2009 Q1      101.51754 100.57777 102.4573 100.08028 102.9548
 2009 Q2      102.34308 101.19808 103.4881 100.59195 104.0942
 2009 Q3      103.16862 101.79962 104.5376 101.07492 105.2623
 2009 Q4      103.99416 102.38447 105.6038 101.53236 106.4560
 2010 Q1      104.81970 102.95412 106.6853 101.96654 107.6729
 2010 Q2      105.64524 103.50968 107.7808 102.37918 108.9113
 2010 Q3      106.47077 104.05204 108.8895 102.77163 110.1699
 2010 Q4      107.29631 104.58191 110.0107 103.14499 111.4476"

CPI.HI.pre <- read.table(text = zz, header = T)
CPI.HI.pre$year <- rownames(CPI.HI.pre)

CPI.HI.pos <- read.table(text = yy, header = T)
CPI.HI.pos$year <- rownames(CPI.HI.pos)

CPI.HI.cfr <- read.table(text = qq, header = T)
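
As an aside on the dput() suggestion: the sketch below shows roughly what sharing one of the series that way could look like. The name pos_ts is hypothetical (chosen so it does not overwrite the CPI.HI.pos data frame built above), and the values are simply retyped from the question.

```r
# Assumption: the post-crisis series is rebuilt here from the printed values;
# in a real session the original ts object would be passed to dput() directly.
pos_ts <- ts(c(100.39960, 99.11151, 98.79588, 99.36900, 99.75832,
               99.90321, 100.17990, 100.96250, 100.99250, 101.40690),
             start = c(2008, 3), frequency = 4)

# dput() prints an R expression that recreates the object, including its
# time series attributes; this text is what would be pasted into a question.
dput(pos_ts)

# Round trip: capture the dput() text and evaluate it to get the object back.
txt <- paste(capture.output(dput(pos_ts)), collapse = "\n")
restored <- eval(parse(text = txt))
```

Anyone pasting that expression gets a ts object with the same start and frequency, so no text parsing is needed.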

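In the parsing code above, I copied the row names of CPI.HI.pre and CPI.HI.pos into an actual variable. I also added the column names Year and Qtr to CPI.HI.cfr and filled all the gaps with NA. Next, I converted the data from wide format to long format: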

df1 <- reshape2::melt(CPI.HI.pre, id.vars = "year")
df2 <- reshape2::melt(CPI.HI.pos, id.vars = "year")

# data of origin saved as an extra column
df <- rbind(cbind(df1, data = "CPI.HI.pre"),
            cbind(df2, data = "CPI.HI.pos"))
df <- df[!is.na(df$value),]
# CPI.HI.cfr is already in long format, but wanted to have a shorter variable
fc <- CPI.HI.cfr

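Then I converted the year-quarter pairs into a numeric value that ggplot can interpret easily. I'm sure someone has a better idea, for example using the lubridate package for date conversion, but I'm not well versed in it.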

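# Map each year/quarter pair to a numeric x value: year + quarter / 4.
# The levels passed to factor() fix the quarter order explicitly.
df$x <- as.numeric(df$year) + as.numeric(factor(df$variable, levels = paste0("Qtr", 1:4))) / 4
fc$x <- as.numeric(fc$Year) + as.numeric(factor(fc$Qtr, levels = paste0("Q", 1:4))) / 4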
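
If the original ts objects are still available, the numeric time axis could probably be taken from time() instead of being rebuilt from year and quarter labels. The following is a minimal sketch under that assumption; ts_to_df, pre_ts, pos_ts and df_ts are hypothetical names, and the two short series only stand in for the real objects.

```r
# Assumption: short stand-ins for the question's ts objects, so that the
# sketch runs on its own.
pre_ts <- ts(c(97.72470, 99.54615), start = c(2008, 1), frequency = 4)
pos_ts <- ts(c(100.39960, 99.11151, 98.79588), start = c(2008, 3), frequency = 4)

# Hypothetical helper: one row per observation, with the time index from
# time() as a plain numeric x column and a label for the data source.
ts_to_df <- function(x, label) {
  data.frame(x = as.numeric(time(x)), value = as.numeric(x), data = label)
}

df_ts <- rbind(ts_to_df(pre_ts, "CPI.HI.pre"),
               ts_to_df(pos_ts, "CPI.HI.pos"))
print(df_ts)
```

Note that time() places Q1 at the year itself (2008.00), whereas the formula above places Q1 at year + 0.25. Either convention works for plotting as long as the actual and forecast data use the same one.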

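Finally, we can plot the data. We use two semi-transparent geom_ribbons for the 80% and 95% prediction intervals, and two lines for the forecast and the actual values.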

library(ggplot2)

ggplot(df) +
  geom_ribbon(data = fc, aes(x, ymin = Lo95, ymax = Hi95), fill = "blue", alpha = 0.25) +
  geom_ribbon(data = fc, aes(x, ymin = Lo80, ymax = Hi80), fill = "blue", alpha = 0.25) +
  geom_line(data = fc, aes(x, PointForecast), colour = "blue") +
  geom_line(aes(x, value))

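The resulting plot shows the actual data as a black line, the point forecast as a blue line, and the two intervals as overlapping blue bands.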
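
A shorter route might be to stay with the ts objects and add the actual series as an extra layer on top of autoplot(). This is only a sketch, assuming the forecast and ggplot2 packages are installed; the names pre_ts, pos_ts, fit, fc_ts and p are hypothetical, and the series are retyped from the values printed in the question.

```r
library(forecast)
library(ggplot2)

# Assumption: the two series are rebuilt from the question's printed values.
pre_ts <- ts(c(83.12262, 83.72945, 84.10338, 84.58881,
               85.03111, 85.92120, 85.86388, 85.74424,
               86.01310, 86.89452, 87.05565, 87.31702,
               87.93231, 88.23959, 88.43708, 88.56572,
               89.02891, 90.05139, 90.17285, 90.68677,
               90.82155, 91.74464, 92.18774, 92.57043,
               92.91782, 94.15888, 94.58178, 94.13807,
               94.58282, 95.99794, 96.12194, 97.08308,
               97.72470, 99.54615),
             start = c(2000, 1), frequency = 4)
pos_ts <- ts(c(100.39960, 99.11151, 98.79588, 99.36900, 99.75832,
               99.90321, 100.17990, 100.96250, 100.99250, 101.40690),
             start = c(2008, 3), frequency = 4)

# Same model as in the question: Holt-Winters without a seasonal component,
# forecasting 10 quarters ahead.
fit <- HoltWinters(pre_ts, gamma = FALSE)
fc_ts <- forecast(fit, h = 10)

# autoplot() draws the history, the point forecast and the intervals;
# autolayer() adds the actual post-2008 series on the same axes.
p <- autoplot(fc_ts) +
  autolayer(pos_ts, series = "Actual") +
  labs(x = "Year", y = "CPI")
print(p)
```

Because both layers take their x values from the ts time index, no manual year-quarter conversion is needed with this approach.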