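Boxplots are a convenient way to summarise continuous data, but boxplots of rare subgroups (n < 10) are not always useful. Is it possible, in a grouped boxplot, to replace the boxplot with the raw data points for the rare groups only?
Example: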
library(ggplot2)
p <- ggplot(mpg, aes(class, hwy))
p + geom_boxplot()
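This produces boxplots of highway mileage (hwy, continuous) by class (type of car). However, looking at the frequency of each class, there are only 5 2seaters and 11 minivans. I would like to see the raw data (points, possibly jittered) instead of boxplots for the 2seaters and minivans, while keeping the boxplots for the other groups that meet a manually set minimum sample size (e.g. n = 20).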
table(mpg$class)
   2seater    compact    midsize    minivan     pickup subcompact        suv
         5         47         41         11         33         35         62
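The rare classes can also be picked out of this table programmatically. A minimal sketch, assuming the n = 20 threshold mentioned above; the names `min_n`, `class_counts` and `rare_classes` are illustrative only:

```r
library(ggplot2)  # provides the mpg data set

# Assumed threshold, taken from the "n = 20" example in the question
min_n <- 20

# Count observations per class, then keep the names of classes below the threshold
class_counts <- table(mpg$class)
rare_classes <- names(class_counts)[class_counts < min_n]

rare_classes
#> [1] "2seater" "minivan"
```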
Is that possible?
Cheers, Luke
Answer 0 (score: 1)
Here is one way to do it. You can change the threshold from 20 to any value you like.
# loading the needed libraries
library(tidyverse)
# adding a new column containing count information
(mpg <- mpg %>%
    dplyr::group_by(.data = ., class) %>%
    dplyr::mutate(.data = ., n = dplyr::n()))
#> # A tibble: 234 x 12
#> # Groups: class [7]
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 1.8 1999 4 auto~ f 18 29 p comp~
#> 2 audi a4 1.8 1999 4 manu~ f 21 29 p comp~
#> 3 audi a4 2 2008 4 manu~ f 20 31 p comp~
#> 4 audi a4 2 2008 4 auto~ f 21 30 p comp~
#> 5 audi a4 2.8 1999 6 auto~ f 16 26 p comp~
#> 6 audi a4 2.8 1999 6 manu~ f 18 26 p comp~
#> 7 audi a4 3.1 2008 6 auto~ f 18 27 p comp~
#> 8 audi a4 q~ 1.8 1999 4 manu~ 4 18 26 p comp~
#> 9 audi a4 q~ 1.8 1999 4 auto~ 4 16 25 p comp~
#> 10 audi a4 q~ 2 2008 4 manu~ 4 20 28 p comp~
#> # ... with 224 more rows, and 1 more variable: n <int>
# plot
ggplot(data = mpg, mapping = aes(x = class, y = hwy, color = class)) +
  # plotting jittered points
  geom_jitter(size = 3, alpha = 0.5, width = 0.15) +
  # adding boxplots only for classes with more than a certain count value
  geom_boxplot(data = dplyr::filter(.data = mpg, n > 20), alpha = 0.5)
Created on 2018-08-23 by the reprex package (v0.2.0.9000).
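Because the jittered points in this answer are drawn for every class, any boxplot outlier appears twice: once as a jittered point and once as an outlier marker. A possible variation, offered only as a sketch: it uses `dplyr::add_count()` for the count column (which leaves the data ungrouped) and `outlier.shape = NA` to hide the boxplot's own outlier markers. The names `min_n`, `mpg_counted` and `class_n` are my own, and the built-in `mpg` data is left unmodified.

```r
library(ggplot2)
library(dplyr)

# Assumed threshold; change as needed
min_n <- 20

# Add a per-class count column without overwriting the original mpg data
mpg_counted <- ggplot2::mpg %>%
  add_count(class, name = "class_n")

p <- ggplot(mpg_counted, aes(x = class, y = hwy, color = class)) +
  # raw points for every class
  geom_jitter(size = 3, alpha = 0.5, width = 0.15) +
  # boxplots only for classes with more than min_n observations;
  # outlier markers are hidden because the points are already shown
  geom_boxplot(
    data = filter(mpg_counted, class_n > min_n),
    alpha = 0.5,
    outlier.shape = NA
  )

p
```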
Answer 1 (score: 1)
This solution plots points only for the small classes (as requested) and boxplots only for the larger classes:
library(ggplot2)
library(dplyr)
min_n <- 20
mpg %>%
  group_by(class) %>%
  mutate(class_count = n()) %>%
  ggplot(mapping = aes(class, hwy, color = class)) +
  geom_jitter(data = . %>% filter(class_count < min_n)) +
  geom_boxplot(data = . %>% filter(class_count >= min_n))
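The `. %>% filter(...)` expressions work because a pipeline that starts with `.` is itself a function, and ggplot2 calls any function passed as a layer's `data` with the plot's data. The same idea can be spelled out with ordinary named functions. This is only a sketch of an equivalent form; `small_classes` and `large_classes` are hypothetical helper names, not part of ggplot2 or dplyr.

```r
library(ggplot2)
library(dplyr)

min_n <- 20

# Hypothetical helpers: each takes the plot data and returns a subset of it
small_classes <- function(d) filter(d, class_count < min_n)
large_classes <- function(d) filter(d, class_count >= min_n)

counted <- mpg %>%
  group_by(class) %>%
  mutate(class_count = n()) %>%
  ungroup()

p <- ggplot(counted, aes(class, hwy, color = class)) +
  geom_jitter(data = small_classes) +   # points for classes below min_n
  geom_boxplot(data = large_classes)    # boxplots for classes at or above min_n

p
```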
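You may also want to take a look at geom_violin, which adds more information about the distribution of the data and which I find more useful than boxplots (and you can use both together :)):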
mpg %>%
  group_by(class) %>%
  mutate(class_count = n()) %>%
  ggplot(mapping = aes(class, hwy, color = class)) +
  geom_jitter(data = . %>% filter(class_count < min_n)) +
  geom_violin(data = . %>% filter(class_count >= min_n), scale = "count") +
  geom_boxplot(data = . %>% filter(class_count >= min_n), width = 0.1)
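The `scale = "count"` argument makes each violin's area proportional to the number of observations in its class, so larger classes look wider. To see what that choice does, it can help to draw the same data with each of the three values `geom_violin` documents for `scale`. A rough sketch; the object names (`large`, `base`, `p_area`, `p_count`, `p_width`) are my own:

```r
library(ggplot2)
library(dplyr)

min_n <- 20

# Keep only the classes that meet the minimum sample size
large <- mpg %>%
  add_count(class, name = "class_count") %>%
  filter(class_count >= min_n)

base <- ggplot(large, aes(class, hwy))

p_area  <- base + geom_violin(scale = "area")   # default: all violins have the same area
p_count <- base + geom_violin(scale = "count")  # area proportional to the number of observations
p_width <- base + geom_violin(scale = "width")  # all violins have the same maximum width

p_count
```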