ggplot2::geom_text() odd behavior

Time: 2018-08-01 00:49:26

Tags: r ggplot2

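I want ggplot() to label the observations whose residuals exceed, in absolute value, 1.5 times the standard error of the regression. The data are as follows (from Frank 1984):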

d <- data.frame(x=c(43,32,32,30,26,25,23,22,22,21,20,20,19,19,19,18,18,17,17,16,16,16,15,13,12,12,10,10,9,7,6,3), y=c(63.0,54.3,51.0,39.0,52.0,55.0,41.2,47.7,44.5,43.0,46.8,42.4,56.5,55.0,53.0,55.0,45.0,50.7,37.5,61.0,48.1,30.0,51.5,40.6,51.3,50.3,62.4,39.3,43.2,40.4,37.7,27.7))

The model is simple:

m <- lm(y~x,data=d)

The ggplot() call is then:

ggplot(d, aes(x=x, y=y)) + geom_point() + geom_text(label=ifelse(abs(resid(m))>(1.5*sigma(m)),rownames(d),""), 
        nudge_x = 1, nudge_y = 0, check_overlap = T, color="blue")

This produces the following plot:

[Image: the resulting scatterplot]

The label for the observation in the upper left (obs #27) is missing. Compare:

abs(resid(m))>(1.5*sigma(m))
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29    30    31    32 
FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

This correctly shows that observation 27 meets the condition. Why does it have no label?

1 Answer:

Answer 0: (score: 1)

The label in your geom_text() isn't inside aes(), although I'm not sure why you still get some of the labels without it.

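I'm including some intermediate steps to do this a little more slowly; for me, that helps with debugging and with seeing how things work. Feel free to condense.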

Assign d and m the same way as in the OP. Then, with the extra steps:

library(tidyverse)

d2 <- d %>%
  mutate(row = row_number()) %>%
  mutate(abs_resid = abs(resid(m)), sig = sigma(m)) %>%
  mutate(is_outlier = abs_resid > 1.5 * sig) %>%
  mutate(label = ifelse(is_outlier, row, ""))

head(d2)
#>    x    y row  abs_resid      sig is_outlier label
#> 1 43 63.0   1  4.8398378 7.934235      FALSE      
#> 2 32 54.3   2  0.9561793 7.934235      FALSE      
#> 3 32 51.0   3  2.3438207 7.934235      FALSE      
#> 4 30 39.0   4 13.4681223 7.934235       TRUE     4
#> 5 26 52.0   5  1.2832746 7.934235      FALSE      
#> 6 25 55.0   6  4.7211239 7.934235      FALSE
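
The head() output only covers the first six rows, so it does not show whether row 27 received a label. A minimal base-R sketch of that check follows; it rebuilds d and m from the question so that it runs on its own, and the names is_outlier, label, and flagged are illustrative rather than part of the original answer.

```r
d <- data.frame(
  x = c(43, 32, 32, 30, 26, 25, 23, 22, 22, 21, 20, 20, 19, 19, 19, 18,
        18, 17, 17, 16, 16, 16, 15, 13, 12, 12, 10, 10, 9, 7, 6, 3),
  y = c(63.0, 54.3, 51.0, 39.0, 52.0, 55.0, 41.2, 47.7, 44.5, 43.0, 46.8,
        42.4, 56.5, 55.0, 53.0, 55.0, 45.0, 50.7, 37.5, 61.0, 48.1, 30.0,
        51.5, 40.6, 51.3, 50.3, 62.4, 39.3, 43.2, 40.4, 37.7, 27.7)
)
m <- lm(y ~ x, data = d)

# Same condition as in the question: |residual| > 1.5 * residual standard error
is_outlier <- unname(abs(resid(m)) > 1.5 * sigma(m))

# Row number for flagged observations, empty string for everything else
label <- ifelse(is_outlier, seq_len(nrow(d)), "")

# Rows that will carry a visible label, and the labels themselves
flagged <- which(is_outlier)
print(flagged)
print(label[flagged])
```

Going by the logical vector printed in the question, this should list rows 4, 20, 22, 27, and 32, so row 27 does carry a non-empty label before anything is plotted.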

ggplot(d2, aes(x = x, y = y)) +
  geom_point() +
  geom_text(aes(label = label), nudge_x = 1, color = "blue")

Created on 2018-07-31 by the reprex package (v0.2.0).
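
For readers who want to condense the steps, here is one possible variant, offered as a sketch and not as the answer's own method. Instead of giving unflagged rows an empty-string label, it passes only the flagged rows to geom_text() through its data argument. It assumes d and m as defined in the question; the column names row and is_outlier follow the answer, and p is an illustrative name.

```r
library(ggplot2)

d <- data.frame(
  x = c(43, 32, 32, 30, 26, 25, 23, 22, 22, 21, 20, 20, 19, 19, 19, 18,
        18, 17, 17, 16, 16, 16, 15, 13, 12, 12, 10, 10, 9, 7, 6, 3),
  y = c(63.0, 54.3, 51.0, 39.0, 52.0, 55.0, 41.2, 47.7, 44.5, 43.0, 46.8,
        42.4, 56.5, 55.0, 53.0, 55.0, 45.0, 50.7, 37.5, 61.0, 48.1, 30.0,
        51.5, 40.6, 51.3, 50.3, 62.4, 39.3, 43.2, 40.4, 37.7, 27.7)
)
m <- lm(y ~ x, data = d)

# Store the row number and the outlier flag as columns so they can be mapped
d$row <- seq_len(nrow(d))
d$is_outlier <- unname(abs(resid(m)) > 1.5 * sigma(m))

# All points are drawn; only the flagged rows are handed to the text layer,
# and the label is mapped inside aes()
p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_text(
    data = d[d$is_outlier, ],
    aes(label = row),
    nudge_x = 1,
    color = "blue"
  )

print(p)
```

Because the text layer only ever sees the flagged rows, no empty-string labels are drawn at all.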