ggplot2: raw data points for rare subgroups, boxplots for common subgroups

Date: 2018-08-23 04:47:40

Tags: r ggplot2 boxplot

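Boxplots are a convenient way to summarize continuous data, but boxplots of rare subgroups (n < 10) are not always useful. Is it possible to replace the boxplot with the raw data points for the rare groups only?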

Example:

library(ggplot2)
p <- ggplot(mpg, aes(class, hwy))
p + geom_boxplot()

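This produces boxplots of highway mileage (continuous) by class (car type). However, the class frequencies show that there are only 5 2seater cars and 11 minivans. For 2seater and minivan I would rather see the raw data (points, possibly jittered) instead of boxplots, while keeping boxplots for the other groups that meet a manually chosen minimum sample size (e.g. n = 20).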

table(mpg$class)

   2seater    compact    midsize    minivan     pickup subcompact        suv       
         5         47         41         11         33         35         62  
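The rare classes can also be picked out programmatically instead of by eye. A minimal sketch, assuming the threshold of 20 from the question (`min_n`, `counts` and `rare_classes` are illustrative names, not from the original post):

```r
library(ggplot2)  # provides the mpg dataset

min_n <- 20  # hypothetical name for the minimum group size

# Count observations per class and keep the names of classes below the threshold
counts <- table(mpg$class)
rare_classes <- names(counts)[counts < min_n]
rare_classes
#> [1] "2seater" "minivan"
```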

Is that possible?

Cheers, Luke

2 answers:

Answer 0 (score: 1)

Here is one way to do it. You can change the threshold from 20 to any value you like.

# loading the needed libraries
library(tidyverse)

# adding a new column containing count information
(mpg <- mpg %>%
    dplyr::group_by(.data = ., class) %>%
    dplyr::mutate(.data = ., n = dplyr::n()))
#> # A tibble: 234 x 12
#> # Groups:   class [7]
#>    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
#>    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#>  1 audi         a4      1.8  1999     4 auto~ f        18    29 p     comp~
#>  2 audi         a4      1.8  1999     4 manu~ f        21    29 p     comp~
#>  3 audi         a4      2    2008     4 manu~ f        20    31 p     comp~
#>  4 audi         a4      2    2008     4 auto~ f        21    30 p     comp~
#>  5 audi         a4      2.8  1999     6 auto~ f        16    26 p     comp~
#>  6 audi         a4      2.8  1999     6 manu~ f        18    26 p     comp~
#>  7 audi         a4      3.1  2008     6 auto~ f        18    27 p     comp~
#>  8 audi         a4 q~   1.8  1999     4 manu~ 4        18    26 p     comp~
#>  9 audi         a4 q~   1.8  1999     4 auto~ 4        16    25 p     comp~
#> 10 audi         a4 q~   2    2008     4 manu~ 4        20    28 p     comp~
#> # ... with 224 more rows, and 1 more variable: n <int>

# plot
ggplot(data = mpg, mapping = aes(x = class, y = hwy, color = class)) +
  # plotting jittered points
  geom_jitter(size = 3, alpha = 0.5, width = 0.15) +
  # adding boxplots only for classes with more than 20 observations
  geom_boxplot(data = dplyr::filter(.data = mpg, n > 20), alpha = 0.5)

Created on 2018-08-23 by the reprex package (v0.2.0.9000).
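Because the code above reassigns `mpg`, the built-in dataset is shadowed for the rest of the session. A variant that leaves it alone uses `dplyr::add_count()`, which adds the per-class count as a column `n`. Since every raw point is already drawn, this sketch also passes `outlier.shape = NA` to `geom_boxplot()` so outliers are not plotted twice, and uses `>=` so a class with exactly `min_n` rows still gets a box. Both are choices of this sketch, not of the answer above; `min_n` and `mpg_n` are hypothetical names.

```r
library(ggplot2)
library(dplyr)

min_n <- 20  # hypothetical name for the minimum group size

# ggplot2::mpg always refers to the package dataset, even if `mpg` was reassigned
mpg_n <- ggplot2::mpg %>%
  add_count(class)  # new column `n`: number of rows in each class

p <- ggplot(data = mpg_n, mapping = aes(x = class, y = hwy, color = class)) +
  # jittered raw points for every class
  geom_jitter(size = 3, alpha = 0.5, width = 0.15) +
  # boxplots only for classes with at least min_n rows; outliers are
  # hidden because the raw points above already show them
  geom_boxplot(data = filter(mpg_n, n >= min_n), alpha = 0.5, outlier.shape = NA)
p
```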

Answer 1 (score: 1)

This solution plots points only for the smaller classes (as requested) and boxplots only for the larger ones:

library(ggplot2)
library(dplyr)

min_n <- 20

mpg %>% 
  group_by(class) %>% 
  mutate(class_count = n()) %>% 
  ggplot(mapping = aes(class, hwy, color = class)) +
  geom_jitter(data = . %>% filter(class_count < min_n)) +
  geom_boxplot(data = . %>% filter(class_count >= min_n))

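The `data = . %>% filter(...)` pattern works because a layer's `data` argument may also be a function that is called with the plot data. The ggplot2 documentation for that argument additionally allows the function to be written as a formula, e.g. `~ head(.x, 10)`, which avoids depending on magrittr's dot-placeholder behaviour. A sketch, assuming a ggplot2 version with formula support for layer `data`:

```r
library(ggplot2)
library(dplyr)

min_n <- 20

p <- ggplot2::mpg %>%
  group_by(class) %>%
  mutate(class_count = n()) %>%
  ungroup() %>%
  ggplot(mapping = aes(class, hwy, color = class)) +
  # .x is the plot data; keep only rows from rare classes
  geom_jitter(data = ~ filter(.x, class_count < min_n), width = 0.15) +
  # boxplots for the remaining, well-populated classes
  geom_boxplot(data = ~ filter(.x, class_count >= min_n))
p
```

The `ungroup()` call is optional; it only keeps the plot data a plain tibble.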

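You might also want to look at geom_violin, which adds more information about the distribution of the data and which I find more useful than a boxplot (and you can use both :)):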

mpg %>% 
  group_by(class) %>% 
  mutate(class_count = n()) %>% 
  ggplot(mapping = aes(class, hwy, color = class)) +
  geom_jitter(data = . %>% filter(class_count < min_n)) +
  geom_violin(data = . %>% filter(class_count >= min_n), scale = "count") +
  geom_boxplot(data = . %>% filter(class_count >= min_n), width = 0.1)
