Apply the fullrange option to multiple LOESS smooth lines when subsetting the data

Time: 2019-02-20 00:45:24

Tags: r ggplot2

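How can I draw separate smooth lines for the data in different time periods, while still drawing each of them across the entire range of the plot?

In the working example below, even with the fullrange argument set to TRUE, each smooth line stays within its own subset, and I get warnings about missing values (which does make sense, since we are setting a new local data range for each geom_smooth() call).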

# convert time series to data.frame, conserving date info
sb <- data.frame(Seatbelts, date = time(Seatbelts))
# convert from ts to date
library(lubridate)
sb$date <- as_date(date_decimal(as.numeric(sb$date)))
# store seatbelt law date
law <- ymd(19830131)
# plot
library(ggplot2)
ggplot(sb) + aes(x = date, y = front) +
  geom_line() +
  geom_vline(xintercept = law, colour = "red") +
  geom_smooth(data = sb[sb$date < law,],
              fullrange = TRUE) +
  geom_smooth(data = sb[sb$date > law,],
              fullrange = TRUE)

current result: smooth lines don't span the whole range

The warnings I get:

Warning messages:
1: Removed 10 rows containing missing values (geom_smooth). 
2: Removed 71 rows containing missing values (geom_smooth). 

(Currently using ggplot2 3.1.0 and R 3.5.2)

Edit: I thought the problem was the preliminary subsetting of the data, so I also tried this cleaner version, to no avail:

# add before/after
sb$relative <- ifelse(sb$date < law, "before", "after")
# plot v.2
ggplot(sb) + aes(x = date, y = front) +
  geom_line() +
  geom_vline(xintercept = law, colour = "red") +
  geom_smooth(aes(colour = relative),
              fullrange = TRUE)

1 answer:

Answer 0: (score: 2)

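The behaviour you are seeing comes from how the LOESS fit is performed; by default, loess does not extrapolate beyond the range of the data it was fitted on, so predictions outside that range are NA: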

dates <- seq(as.Date("1960-01-01"), law, by = "1 day")
head(setNames(predict(
    loess(front ~ as.numeric(date), data = sb[sb$date < law, ]),
    data.frame(date = as.numeric(dates))), dates))
1960-01-01 1960-01-02 1960-01-03 1960-01-04 1960-01-05 1960-01-06
        NA         NA         NA         NA         NA         NA

This behaviour is explained in ?predict.loess:

  

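"When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate - so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors."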

To extrapolate to points outside the range of the data used for the LOESS model, we can call loess with control = loess.control(surface = "direct").
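A minimal sketch of the difference, using the built-in cars dataset rather than the seat belt data (the object names fit.default, fit.direct and new.speeds are only illustrative). The speeds in cars run from 4 to 25, so two of the three new points lie outside the fitted range:

```r
# Fit the same LOESS model twice: once with the default surface
# ("interpolate") and once with surface = "direct".
fit.default <- loess(dist ~ speed, data = cars)
fit.direct <- loess(dist ~ speed, data = cars,
                    control = loess.control(surface = "direct"))

# cars$speed ranges from 4 to 25: 2 and 30 are outside, 10 is inside.
new.speeds <- data.frame(speed = c(2, 10, 30))

p.default <- predict(fit.default, new.speeds)  # NA outside the data range
p.direct <- predict(fit.direct, new.speeds)    # extrapolated values

print(p.default)
print(p.direct)
```

Only the "direct" fit returns numbers for the out-of-range speeds; the default fit returns NA for them, which is what produces the missing value warnings in the plot.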

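Unfortunately, this means we need to perform the two LOESS fits manually, predict values for the two ranges of interest, and plot everything ourselves.

Here is what I would do: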

  1. Define a convenience function extrapolate.loess that predicts values for dates, together with lower/upper confidence bounds (based on the level alpha)

    # Base R only: predict() with se = TRUE returns the standard errors
    extrapolate.loess <- function(data, dates, alpha = 0.95) {
        fit <- loess(
            front ~ as.numeric(date), data = data,
            control = loess.control(surface = "direct"))
        pred <- predict(
            fit, data.frame(date = as.numeric(dates)), se = TRUE)
        # Two-sided normal quantile, about 1.96 for alpha = 0.95
        z <- qnorm(1 - (1 - alpha) / 2)
        data.frame(
            date = dates,
            front = pred$fit,
            front.l = pred$fit - z * pred$se.fit,
            front.h = pred$fit + z * pred$se.fit)
    }
    
  2. We now store the LOESS estimates, including the CIs, for both ranges in data.frames

    dates.left <- seq(as.Date("1960-01-01"), law, by = "1 day")
    df.left <- extrapolate.loess(sb[sb$date < law, ], dates.left)
    
    dates.right <- seq(law, as.Date("1990-01-01"), by = "1 day")
    df.right <- extrapolate.loess(sb[sb$date > law, ], dates.right)
    
  3. Now we can plot

    ggplot(sb) + aes(x = date, y = front) +
        geom_line() +
        geom_vline(xintercept = law, colour = "red") +
        geom_line(data = df.left, colour = "blue", size = 1) +
        geom_ribbon(data = df.left, aes(x = date, ymin = front.l, ymax = front.h), alpha = 0.2) +
        geom_line(data = df.right, colour = "blue", size = 1) +
        geom_ribbon(data = df.right, aes(x = date, ymin = front.l, ymax = front.h), alpha = 0.2) +
        coord_cartesian(ylim = c(400, 1300))
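As a side note on the confidence band: the steps above use a normal quantile, whereas predict() on a loess fit with se = TRUE also returns the degrees of freedom, so a t-based band is possible. The following is only a sketch under a few assumptions of mine: it uses base R only, keeps time as a decimal year instead of a Date, approximates the law date as 1983 + 30/365, and the names before, grid and band are illustrative. It extends the pre-law fit across the whole observed range:

```r
# Seatbelts ships with R (package datasets); keep time as a decimal year.
sb <- data.frame(Seatbelts, year = as.numeric(time(Seatbelts)))

# Assumption: 1983-01-31 expressed as a decimal year (30 days into 1983).
law.year <- 1983 + 30 / 365

# Fit only on the observations before the law, allowing extrapolation.
before <- sb[sb$year < law.year, ]
fit <- loess(front ~ year, data = before,
             control = loess.control(surface = "direct"))

# Predict over the whole observed range, not just the pre-law part.
grid <- data.frame(year = seq(min(sb$year), max(sb$year), length.out = 200))
pred <- predict(fit, grid, se = TRUE)

# 95% band from the t distribution with the fit's degrees of freedom.
crit <- qt(0.975, pred$df)
band <- data.frame(
    year = grid$year,
    front = pred$fit,
    lower = pred$fit - crit * pred$se.fit,
    upper = pred$fit + crit * pred$se.fit)

print(head(band))
```

The resulting band data frame could be drawn with geom_line() and geom_ribbon() in the same way as df.left and df.right.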
    


I won't (and can't) comment on how useful or meaningful these LOESS extrapolations are.