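Labeling outliers with ggplot

Time: 2017-12-16 04:59:10

Tags: r ggplot2

I am trying to label outliers with ggplot. I have two questions about my code:

  1. Why doesn't it label the outliers below 1.5 * IQR?

  2. Why doesn't it label outliers according to the group they belong to, but apparently relative to the overall mean of the data instead? I would like to label the outliers for each boxplot individually, i.e. the outliers for Country A in wave 1 (of the survey), and so on.

A sample of my code: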

    library(ggplot2)
    
    PERCENT <- rnorm(50, sd = 3)
    WAVE <- sample(6, 50, replace = TRUE)
    AGE_GROUP <- rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10)
    COUNTRY <- rep(c("Country A", "Country B"), 25)
    N <- rnorm(50, mean = 200, sd = 2)
    
    df <- data.frame(PERCENT, WAVE, AGE_GROUP, COUNTRY, N)
    
    ggplot(df, aes(x = factor(WAVE), y = PERCENT, fill = factor(COUNTRY))) +
      geom_boxplot(alpha = 0.3) +
      geom_point(aes(color = AGE_GROUP, group = factor(COUNTRY)), position = position_dodge(width=0.75)) +
      geom_text(aes(label = ifelse(PERCENT > 1.5*IQR(PERCENT)|PERCENT < -1.5*IQR(PERCENT), paste(AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),'')), hjust = -.3, size = 3)
    

My plot so far: Outlier Label

Thanks for your help!

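2 answers:

Answer 0: (score: 2)

If you want the IQR to be computed per country, you need to group the data. You can do this either globally (i.e. before the data is passed to ggplot) or locally within the layer.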

    library(dplyr)
    library(ggplot2)
    
    ggplot(df, aes(x = as.factor(WAVE), y = PERCENT, fill = COUNTRY)) +
      geom_boxplot(alpha = 0.3) +
      geom_point(aes(color = AGE_GROUP, group = COUNTRY),
                 position = position_dodge(width = 0.75)) +
      geom_text(aes(group = COUNTRY,
                    label = ifelse(!between(PERCENT, -1.3 * IQR(PERCENT), 1.3 * IQR(PERCENT)),
                                   paste(" ", COUNTRY, ",", AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),
                                   '')),
                position = position_dodge(width = 0.75),
                hjust = "left", size = 3)
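The "global" option mentioned above could look roughly like the sketch below, which flags outliers with dplyr before plotting. It rests on a few assumptions that are not in the original post: the question's data are rebuilt with a fixed seed, each box is taken to be one WAVE x COUNTRY cell, and the fences sit 1.5 * IQR beyond the quartiles (which, as far as I know, is what geom_boxplot() draws by default). The names df_flagged, lower, upper and OUTLIER_LABEL are made up for illustration. Precomputing also sidesteps the question of how expressions inside aes() are evaluated; to my understanding they are evaluated on the layer's full data rather than per group.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  PERCENT   = rnorm(50, sd = 3),
  WAVE      = sample(6, 50, replace = TRUE),
  AGE_GROUP = rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10),
  COUNTRY   = rep(c("Country A", "Country B"), 25),
  N         = rnorm(50, mean = 200, sd = 2)
)

# Compute the fences and the label text once per box (WAVE x COUNTRY cell).
# Column names below are illustrative, not from the original post.
df_flagged <- df %>%
  group_by(WAVE, COUNTRY) %>%
  mutate(
    lower = quantile(PERCENT, 0.25, names = FALSE) - 1.5 * IQR(PERCENT),
    upper = quantile(PERCENT, 0.75, names = FALSE) + 1.5 * IQR(PERCENT),
    OUTLIER_LABEL = ifelse(
      PERCENT < lower | PERCENT > upper,
      paste0(" ", AGE_GROUP, ", ", round(PERCENT, 1), "%, n = ", round(N, 0)),
      ""
    )
  ) %>%
  ungroup()

# The label is now an ordinary column, so aes() only has to look it up.
p <- ggplot(df_flagged, aes(x = factor(WAVE), y = PERCENT, fill = COUNTRY)) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = COUNTRY),
             position = position_dodge(width = 0.75)) +
  geom_text(aes(group = COUNTRY, label = OUTLIER_LABEL),
            position = position_dodge(width = 0.75),
            hjust = "left", size = 3)
p
```

With only about four observations per box in this sample data, few or no points may end up labeled; the structure of the computation is the point here.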

Answer 1: (score: 1)

Adding the group aesthetic to geom_text and modifying the ifelse test should do what you want.

Setting group = interaction(WAVE, COUNTRY) will restrict the calculation to within each boxplot, and the outlier test needs to include a call to median(PERCENT).

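The code that originally accompanied this answer is not preserved here, so the following is only a sketch of the idea it describes: a test centered on median(PERCENT) and applied within each WAVE x COUNTRY box. The helper far_from_median, the columns IS_OUTLIER and LABEL, and the fixed seed are assumptions made for illustration. The per-box test is computed in base R before plotting, so it does not rely on the group aesthetic to restrict the calculation.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  PERCENT   = rnorm(50, sd = 3),
  WAVE      = sample(6, 50, replace = TRUE),
  AGE_GROUP = rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10),
  COUNTRY   = rep(c("Country A", "Country B"), 25),
  N         = rnorm(50, mean = 200, sd = 2)
)

# Hypothetical helper: TRUE where x lies more than k * IQR from its median.
far_from_median <- function(x, k = 1.5) {
  abs(x - median(x)) > k * IQR(x)
}

# Apply the test separately within each box, then restore the row order.
cell <- interaction(df$WAVE, df$COUNTRY, drop = TRUE)
df$IS_OUTLIER <- unsplit(lapply(split(df$PERCENT, cell), far_from_median), cell)

df$LABEL <- ifelse(
  df$IS_OUTLIER,
  paste0(" ", df$AGE_GROUP, ", ", round(df$PERCENT, 1), "%, n = ", round(df$N, 0)),
  ""
)

p <- ggplot(df, aes(x = factor(WAVE), y = PERCENT, fill = COUNTRY)) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = COUNTRY),
             position = position_dodge(width = 0.75)) +
  # group is set explicitly so the labels dodge together with their boxes
  geom_text(aes(group = interaction(WAVE, COUNTRY), label = LABEL),
            position = position_dodge(width = 0.75),
            hjust = "left", size = 3)
p
```

Note that a median-centered test is not identical to fences placed 1.5 * IQR beyond the quartiles, so the labeled points may not coincide exactly with the outlier points the boxplot itself draws.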