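I would like ggplot() to label the observations whose absolute residual is larger than 1.5 times the residual standard error of the regression. The data are these (from Frank 1984):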
d <- data.frame(x=c(43,32,32,30,26,25,23,22,22,21,20,20,19,19,19,18,18,17,17,16,16,16,15,13,12,12,10,10,9,7,6,3), y=c(63.0,54.3,51.0,39.0,52.0,55.0,41.2,47.7,44.5,43.0,46.8,42.4,56.5,55.0,53.0,55.0,45.0,50.7,37.5,61.0,48.1,30.0,51.5,40.6,51.3,50.3,62.4,39.3,43.2,40.4,37.7,27.7))
The model is simple:
m <- lm(y~x,data=d)
And the ggplot() call is:
ggplot(d, aes(x=x, y=y)) + geom_point() + geom_text(label=ifelse(abs(resid(m))>(1.5*sigma(m)),rownames(d),""),
nudge_x = 1, nudge_y = 0, check_overlap = T, color="blue")
In the resulting plot, the label for the observation in the upper-left corner (obs. #27) is missing. Compare with:
abs(resid(m))>(1.5*sigma(m))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
which correctly shows that observation 27 meets the condition. Why is it not labelled?
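For reference, the flagged rows can also be listed together with their coordinates, which makes it easier to match each expected label to a point in the plot. This is only a small base R sketch using the same data and model as above; the name `flagged` is chosen here for illustration.

```r
# Data and model exactly as defined in the question
d <- data.frame(
  x = c(43, 32, 32, 30, 26, 25, 23, 22, 22, 21, 20, 20, 19, 19, 19, 18,
        18, 17, 17, 16, 16, 16, 15, 13, 12, 12, 10, 10, 9, 7, 6, 3),
  y = c(63.0, 54.3, 51.0, 39.0, 52.0, 55.0, 41.2, 47.7, 44.5, 43.0, 46.8,
        42.4, 56.5, 55.0, 53.0, 55.0, 45.0, 50.7, 37.5, 61.0, 48.1, 30.0,
        51.5, 40.6, 51.3, 50.3, 62.4, 39.3, 43.2, 40.4, 37.7, 27.7)
)
m <- lm(y ~ x, data = d)

# Keep only the rows whose absolute residual exceeds 1.5 * sigma
# (`flagged` is an illustrative name, not part of the original code)
flagged <- d[abs(resid(m)) > 1.5 * sigma(m), ]
flagged
```

The row names of `flagged` are the observation numbers that should appear as labels.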
Answer 0 (score: 1)
The label in geom_text is not inside aes, although I'm not sure why you were still getting some of the labels without it.
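I've included some intermediate steps to go through this more slowly; for me, that helps with debugging and with investigating how things work. Feel free to condense.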
Assign d and m the same way as in the OP. Then, with the extra steps:
library(tidyverse)
d2 <- d %>%
  mutate(row = row_number()) %>%
  mutate(abs_resid = abs(resid(m)), sig = sigma(m)) %>%
  mutate(is_outlier = abs_resid > 1.5 * sig) %>%
  mutate(label = ifelse(is_outlier, row, ""))
head(d2)
#>    x    y row  abs_resid      sig is_outlier label
#> 1 43 63.0   1  4.8398378 7.934235      FALSE
#> 2 32 54.3   2  0.9561793 7.934235      FALSE
#> 3 32 51.0   3  2.3438207 7.934235      FALSE
#> 4 30 39.0   4 13.4681223 7.934235       TRUE     4
#> 5 26 52.0   5  1.2832746 7.934235      FALSE
#> 6 25 55.0   6  4.7211239 7.934235      FALSE
ggplot(d2, aes(x = x, y = y)) +
  geom_point() +
  geom_text(aes(label = label), nudge_x = 1, color = "blue")
Created on 2018-07-31 by the reprex package (v0.2.0).
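Another option, offered here as a sketch rather than as part of the answer above, is to give geom_text() only the flagged rows through its data argument, so that no empty-string labels are drawn at all. The column names row and is_outlier follow the answer; `outliers` and `p` are names chosen for illustration, and base R is used instead of dplyr purely to keep the example short.

```r
library(ggplot2)

# Data and model exactly as defined in the question
d <- data.frame(
  x = c(43, 32, 32, 30, 26, 25, 23, 22, 22, 21, 20, 20, 19, 19, 19, 18,
        18, 17, 17, 16, 16, 16, 15, 13, 12, 12, 10, 10, 9, 7, 6, 3),
  y = c(63.0, 54.3, 51.0, 39.0, 52.0, 55.0, 41.2, 47.7, 44.5, 43.0, 46.8,
        42.4, 56.5, 55.0, 53.0, 55.0, 45.0, 50.7, 37.5, 61.0, 48.1, 30.0,
        51.5, 40.6, 51.3, 50.3, 62.4, 39.3, 43.2, 40.4, 37.7, 27.7)
)
m <- lm(y ~ x, data = d)

# Row number and outlier flag stored as columns of the data
d$row <- seq_len(nrow(d))
d$is_outlier <- abs(resid(m)) > 1.5 * sigma(m)

# Only the flagged rows are handed to the text layer
# (`outliers` is an illustrative name)
outliers <- d[d$is_outlier, ]

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_text(data = outliers, aes(label = row), nudge_x = 1, color = "blue")
p
```

With this layout the points layer still uses all of d, while the text layer inherits the x and y mappings and adds label from the smaller data frame.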