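How can I draw separate smoothing lines for data from different time periods on one plot, while extending each line over the full range of the plot?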
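In the working example below, even though the fullrange argument is set to TRUE, each smoothing line stays restricted to the local data range of its own geom_smooth() call, and I get missing-value warnings (which does make sense, since we are asking for values outside that range).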
# convert time series to data.frame, conserving date info
sb <- data.frame(Seatbelts, date = time(Seatbelts))
# convert from ts to date
library(lubridate)
sb$date <- as_date(date_decimal(as.numeric(sb$date)))
# store seatbelt law date
law <- ymd(19830131)
# plot
library(ggplot2)
ggplot(sb) + aes(x = date, y = front) +
  geom_line() +
  geom_vline(xintercept = law, colour = "red") +
  geom_smooth(data = sb[sb$date < law, ],
              fullrange = TRUE) +
  geom_smooth(data = sb[sb$date > law, ],
              fullrange = TRUE)
Warning messages:
1: Removed 10 rows containing missing values (geom_smooth).
2: Removed 71 rows containing missing values (geom_smooth).
(Currently using ggplot2 3.1.0 and R 3.5.2.)
Edit: I thought the problem was subsetting the data beforehand, so I also tried this cleaner version, to no avail:
# add before/after
sb$relative <- ifelse(sb$date < law, "before", "after")
# plot v.2
ggplot(sb) + aes(x = date, y = front) +
  geom_line() +
  geom_vline(xintercept = law, colour = "red") +
  geom_smooth(aes(colour = relative),
              fullrange = TRUE)
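Answer
The explanation for the behaviour you're seeing has to do with how the LOESS fit is performed: by default, predictions at points outside the range of the data used for the fit are NA. For example: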
dates <- seq(as.Date("1960-01-01"), law, by = "1 day")
head(setNames(predict(
loess(front ~ as.numeric(date), data = sb[sb$date < law, ]),
data.frame(date = as.numeric(dates))), dates))
1960-01-01 1960-01-02 1960-01-03 1960-01-04 1960-01-05 1960-01-06
NA NA NA NA NA NA
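?predict.loess explains this behaviour:
When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate, so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors.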
To extrapolate to points outside the range of the data used for the LOESS fit, we can call loess with control = loess.control(surface = "direct").
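A minimal base-R sketch of the difference between the two surface settings. It assumes decimal years from time(Seatbelts) as the x-variable instead of Date objects, so it needs neither lubridate nor ggplot2; the names sb, law_t, pre and new_t are illustrative, with law_t standing in for law:

```r
# Decimal-year version of the Seatbelts data (illustrative; avoids lubridate)
sb <- data.frame(t = as.numeric(time(Seatbelts)),
                 front = as.numeric(Seatbelts[, "front"]))
law_t <- 1983 + 1/24                       # assumed cut-off between the Jan and Feb 1983 observations
pre <- sb[sb$t < law_t, ]                  # pre-law data only (1969 to Jan 1983)

new_t <- data.frame(t = c(1960, 1975))     # 1960 lies outside the fitted range, 1975 inside

fit_interp <- loess(front ~ t, data = pre) # default: surface = "interpolate"
fit_direct <- loess(front ~ t, data = pre,
                    control = loess.control(surface = "direct"))

p_interp <- predict(fit_interp, newdata = new_t)  # NA at 1960: no extrapolation
p_direct <- predict(fit_direct, newdata = new_t)  # finite at both points
p_interp
p_direct
```

With the default surface the 1960 prediction is NA, which is consistent with the rows geom_smooth() reports as removed; with surface = "direct" it is an extrapolated, finite value.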
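Unfortunately, this means we have to perform the two LOESS fits manually, predict values for the two date ranges of interest, and plot everything ourselves.
Here is what I would do: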
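Define a convenience function extrapolate.loess that predicts values at the given dates, together with lower/upper confidence bounds at confidence level alpha: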
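library(tidyverse)
library(broom)
# Fit a LOESS that can extrapolate (surface = "direct") and predict at `dates`,
# with a normal-approximation confidence band at level `alpha`
extrapolate.loess <- function(data, dates, alpha = 0.95) {
  z <- qnorm(1 - (1 - alpha) / 2)  # positive quantile, ~1.96 for alpha = 0.95
  loess(
    front ~ as.numeric(date), data = data,
    control = loess.control(surface = "direct")) %>%
    # se_fit = TRUE asks recent broom versions to return the .se.fit column
    augment(newdata = data.frame(date = as.numeric(dates)), se_fit = TRUE) %>%
    transmute(
      date = dates,
      front = .fitted,
      front.l = front - z * .se.fit,   # lower bound
      front.h = front + z * .se.fit)   # upper bound
}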
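If the tidyverse/broom dependency is a concern, the same idea can be sketched with the stats package alone, since predict() on a loess fit returns fit and se.fit when se = TRUE. The function name extrapolate_loess_base, the decimal-year column t and the cut-off law_t are illustrative assumptions, not part of the answer above; the band uses the same normal approximation:

```r
# Hypothetical stats-only variant of extrapolate.loess.
# Assumes `data` has a numeric x-column `t` (decimal years) and a response `front`.
extrapolate_loess_base <- function(data, t_new, level = 0.95) {
  fit <- loess(front ~ t, data = data,
               control = loess.control(surface = "direct"))  # allow extrapolation
  pred <- predict(fit, newdata = data.frame(t = t_new), se = TRUE)
  z <- qnorm(1 - (1 - level) / 2)   # ~1.96 for level = 0.95
  data.frame(t = t_new,
             front = pred$fit,
             front.l = pred$fit - z * pred$se.fit,   # lower bound
             front.h = pred$fit + z * pred$se.fit)   # upper bound
}

sb <- data.frame(t = as.numeric(time(Seatbelts)),
                 front = as.numeric(Seatbelts[, "front"]))
law_t <- 1983 + 1/24   # assumed cut-off between the Jan and Feb 1983 observations
left <- extrapolate_loess_base(sb[sb$t < law_t, ], seq(1960, law_t, by = 1/12))
head(left)
```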
Next, store the LOESS estimates and confidence bounds for the two date ranges in two data.frames:
dates.left <- seq(as.Date("1960-01-01"), law, by = "1 day")
df.left <- extrapolate.loess(sb[sb$date < law, ], dates.left)
dates.right <- seq(law, as.Date("1990-01-01"), by = "1 day")
df.right <- extrapolate.loess(sb[sb$date > law, ], dates.right)
Now we can plot:
ggplot(sb) + aes(x = date, y = front) +
  geom_line() +
  geom_vline(xintercept = law, colour = "red") +
  geom_line(data = df.left, colour = "blue", size = 1) +
  geom_ribbon(data = df.left, aes(x = date, ymin = front.l, ymax = front.h), alpha = 0.2) +
  geom_line(data = df.right, colour = "blue", size = 1) +
  geom_ribbon(data = df.right, aes(x = date, ymin = front.l, ymax = front.h), alpha = 0.2) +
  coord_cartesian(ylim = c(400, 1300))
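In the plot above, each curve covers only its own side of the law date. If you want both curves over the whole range, as the question asks, one hedged variant is to predict both fits on a single shared grid and stack the results with a period column, so that one geom_line()/geom_ribbon() pair with aes(colour = period) could draw both. The grid (1969 to 1985 in monthly steps), the decimal-year x-values and all object names are assumptions for illustration:

```r
# Hypothetical sketch: one extrapolating LOESS per period, predicted on a common grid
sb <- data.frame(t = as.numeric(time(Seatbelts)),
                 front = as.numeric(Seatbelts[, "front"]))
law_t <- 1983 + 1/24                        # assumed cut-off between Jan and Feb 1983
sb$period <- ifelse(sb$t < law_t, "before", "after")

grid_t <- seq(1969, 1985, by = 1/12)        # shared x-grid covering the whole plot (assumed)
z <- qnorm(0.975)                           # 95% normal-approximation band

preds <- lapply(split(sb, sb$period), function(d) {
  fit <- loess(front ~ t, data = d,
               control = loess.control(surface = "direct"))
  p <- predict(fit, newdata = data.frame(t = grid_t), se = TRUE)
  data.frame(period = d$period[1], t = grid_t, front = p$fit,
             front.l = p$fit - z * p$se.fit,
             front.h = p$fit + z * p$se.fit)
})
smooths <- do.call(rbind, preds)            # long format: one row per period and grid point
head(smooths)
```

Before plotting against the Date axis, t would need converting back to dates, for example with as_date(date_decimal(t)) as in the question.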
I won't/can't comment on how useful or meaningful these LOESS extrapolations are.