How to draw box plots for two sample types in R using percentiles

Time: 2019-04-29 21:01:17

Tags: r boxplot

I have data that looks like this:

df <- data.frame(
  Group = c("1", "2", "3", "4"), 
  GOOD_0 = c(1L, 1L, 1L, 1L), 
  GOOD_25 = c(61.25, 1, 1, 1), 
  GOOD_50 = c(119, 1, 1, 1), 
  GOOD_75 = c(153, 1, 1, 1), 
  GOOD_100 = c(237L, 1L, 1L, 1L), 
  SALINE_0 = c(1L, 1L, 1L, 1L), 
  SALINE_25 = c(1, 40.25, 1, 22.5), 
  SALINE_50 = c(1, 86, 52.5, 122.5), 
  SALINE_75 = c(1, 136, 101.5, 269.25), 
  SALINE_100 = c(60L, 360L, 222L, 508L)
)

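I would like to plot box plots for the GOOD and SALINE types side by side (perhaps in two different colors). The numbers after GOOD_ and SALINE_ are the percentiles. How can I draw box plots for the Groups in R using these percentiles?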

I can get something like this for the GOOD type, but I cannot include the SALINE boxes in the same plot:

ggplot(df, aes(x=Group, ymin = GOOD_0, lower = GOOD_25, middle = GOOD_50, upper = GOOD_75, ymax = GOOD_100)) +
      geom_boxplot(stat = "identity")

1 answer:

Answer 0: (score: 1)

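This is easy to do if you transform the data a little. ggplot works best with data in long format. So reshape the data frame, then add a column that identifies which type each row belongs to, SALINE or GOOD.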

I am assuming your x variable is Group, because x does not exist in the data the way your aes(x=x ...) call implies.

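library(dplyr)    # select(), rename(), mutate(), bind_rows(), %>%
library(ggplot2)  # ggplot(), geom_boxplot()

GOOD <- df %>% select(Group, starts_with("GOOD")) %>% rename(Percentile_0 = GOOD_0, 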
                                                     Percentile_25 = GOOD_25, 
                                                     Percentile_50 = GOOD_50, 
                                                     Percentile_75 = GOOD_75, 
                                                     Percentile_100 = GOOD_100) 
SALINE <- df %>% select(Group, starts_with("SALINE")) %>% rename(Percentile_0 = SALINE_0, 
                                                       Percentile_25 = SALINE_25, 
                                                       Percentile_50 = SALINE_50, 
                                                       Percentile_75 = SALINE_75, 
                                                       Percentile_100 = SALINE_100) 


new_df <- bind_rows(GOOD %>% mutate(grp = "GOOD"), SALINE %>% mutate(grp = "SALINE"))

new_df
# A tibble: 8 x 7
  Group Percentile_0 Percentile_25 Percentile_50 Percentile_75 Percentile_100 grp   
  <fct>        <int>         <dbl>         <dbl>         <dbl>          <int> <chr> 
1 1                1          61.2         119            153             237 GOOD  
2 2                1           1             1              1               1 GOOD  
3 3                1           1             1              1               1 GOOD  
4 4                1           1             1              1               1 GOOD  
5 1                1           1             1              1              60 SALINE
6 2                1          40.2          86            136             360 SALINE
7 3                1           1            52.5          102.            222 SALINE
8 4                1          22.5         122.           269.            508 SALINE
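
As one possible alternative to the select/rename/bind_rows steps above, the same long table can be sketched with `tidyr::pivot_longer()`. This assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later are installed. The special `.value` entry in `names_to` splits each column name at `_`, so the prefix (GOOD or SALINE) goes into `grp` and the percentile suffix becomes a column of its own. The name `new_df_alt` is made up for this illustration and is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(
  Group = c("1", "2", "3", "4"),
  GOOD_0 = c(1L, 1L, 1L, 1L),
  GOOD_25 = c(61.25, 1, 1, 1),
  GOOD_50 = c(119, 1, 1, 1),
  GOOD_75 = c(153, 1, 1, 1),
  GOOD_100 = c(237L, 1L, 1L, 1L),
  SALINE_0 = c(1L, 1L, 1L, 1L),
  SALINE_25 = c(1, 40.25, 1, 22.5),
  SALINE_50 = c(1, 86, 52.5, 122.5),
  SALINE_75 = c(1, 136, 101.5, 269.25),
  SALINE_100 = c(60L, 360L, 222L, 508L)
)

new_df_alt <- df %>%
  # "GOOD_25" is split into grp = "GOOD" and a value column named "25"
  pivot_longer(
    cols = -Group,
    names_to = c("grp", ".value"),
    names_sep = "_"
  ) %>%
  # Prefix the percentile columns so they match the names used in the answer
  rename_with(~ paste0("Percentile_", .x), .cols = -c(Group, grp)) %>%
  # Put all GOOD rows first, then all SALINE rows
  arrange(grp, Group)

new_df_alt
```

The column order differs slightly from `new_df` (`grp` comes second rather than last), which does not matter to ggplot because aesthetics are mapped by name.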

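There are several ways to do the above. Once it is done, plotting both types is straightforward, and if you map the colour aesthetic in ggplot, a legend is created for you. So: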

new_df %>% ggplot(aes(x = Group, group = grp, colour = grp)) +
           geom_boxplot(stat = "identity", 
                        aes(ymin = Percentile_0, lower = Percentile_25, middle = Percentile_50, upper = Percentile_75, ymax = Percentile_100))
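
A variant of the plot above, sketched on the assumption that filled, side-by-side boxes are wanted. It maps `fill` instead of `colour` and sets the grouping explicitly with `interaction(Group, grp)`, so that each Group/type pair is drawn as its own box. The `fill` mapping, the explicit `group`, the axis and legend labels, and the name `p` are choices made for this illustration, not part of the original answer.

```r
library(ggplot2)

# Same values as the final data frame in this answer, typed in directly
new_df <- data.frame(
  Group = factor(rep(c("1", "2", "3", "4"), times = 2)),
  Percentile_0 = rep(1L, 8),
  Percentile_25 = c(61.25, 1, 1, 1, 1, 40.25, 1, 22.5),
  Percentile_50 = c(119, 1, 1, 1, 1, 86, 52.5, 122.5),
  Percentile_75 = c(153, 1, 1, 1, 1, 136, 101.5, 269.25),
  Percentile_100 = c(237L, 1L, 1L, 1L, 60L, 360L, 222L, 508L),
  grp = rep(c("GOOD", "SALINE"), each = 4)
)

p <- ggplot(new_df, aes(x = Group, fill = grp, group = interaction(Group, grp))) +
  # stat = "identity" uses the supplied percentiles as the box statistics
  # instead of computing them from raw observations
  geom_boxplot(
    stat = "identity",
    aes(ymin = Percentile_0, lower = Percentile_25, middle = Percentile_50,
        upper = Percentile_75, ymax = Percentile_100)
  ) +
  labs(y = "Value", fill = "Type")  # labels are assumptions

print(p)
```

Where all five percentiles are equal (GOOD in groups 2 to 4), the box collapses to a flat line at 1.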

The final data frame:

structure(list(Group = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L), .Label = c("1", "2", "3", "4"), class = "factor"), Percentile_0 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), Percentile_25 = c(61.25, 1, 1, 1, 
1, 40.25, 1, 22.5), Percentile_50 = c(119, 1, 1, 1, 1, 86, 52.5, 
122.5), Percentile_75 = c(153, 1, 1, 1, 1, 136, 101.5, 269.25
), Percentile_100 = c(237L, 1L, 1L, 1L, 60L, 360L, 222L, 508L
), grp = c("GOOD", "GOOD", "GOOD", "GOOD", "SALINE", "SALINE", 
"SALINE", "SALINE")), row.names = c(NA, -8L), class = "data.frame")