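How do I use a previously computed median in a ggplot aesthetic in R?

Time: 2018-12-08 06:38:38

Tags: r ggplot2 median

I am trying to compute a median as a single number and then use that number in a ggplot aesthetic.

I first tried to get the median as a value: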

mean_delay_median <- nycflights13::flights %>% 
  group_by(dest) %>%
  summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>% 
  median(mean_delay)

This produces the error message:

Error in median.default(., mean_delay) : need numeric data

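How can I fix this?

Once I get this working, my second step is to color the points on a map according to whether their values fall above or below that median, mean_delay_median, for example: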

nycflights13::flights %>% 
  group_by(dest) %>%
  summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>% 
  inner_join(nycflights13::airports, c('dest' = 'faa')) %>% 
  ggplot(aes(lon, lat, color=mean_delay>mean_delay_median)) +
  borders("state") +
  geom_point() +
  coord_quickmap() 

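More generally, I am looking for guidance on using previously computed statistics in later code.

Thanks!

2 Answers:

Answer 0: (score: 1)

You are just missing summarise(median_all_delay = median(mean_delay, na.rm = TRUE))

Try this: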

mean_delay_median <- nycflights13::flights %>% 
  group_by(dest) %>%
  summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>% 
  summarise(median_all_delay = median(mean_delay, na.rm = TRUE)) %>% 
  unlist()

nycflights13::flights %>% 
  group_by(dest) %>%
  summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>% 
  inner_join(nycflights13::airports, c('dest' = 'faa')) %>% 
  ggplot(aes(lon, lat, color=mean_delay>mean_delay_median)) +
  borders("state") +
  geom_point() +
  coord_quickmap() 

This produces a map of the destination airports, colored by whether each airport's mean delay is above the median.
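
The original attempt fails because the pipe passes the whole summarised data frame as the first argument to median(), which expects a numeric vector. As an alternative to summarise() followed by unlist(), a minimal sketch using dplyr::pull() is shown here. The toy_flights data and its values are made up for illustration and merely stand in for nycflights13::flights.

```r
library(dplyr)

# Hypothetical stand-in for nycflights13::flights (values are invented)
toy_flights <- tibble(
  dest      = c("AAA", "AAA", "BBB", "BBB", "CCC", "DDD"),
  arr_delay = c(10, 20, -5, 5, 30, NA)
)

# One mean delay per destination; "DDD" has no observed delays, so its mean is NaN
mean_delays <- toy_flights %>%
  group_by(dest) %>%
  summarise(mean_delay = mean(arr_delay, na.rm = TRUE))

# pull() extracts the column as a plain numeric vector,
# so median() receives numbers rather than a data frame
mean_delay_median <- mean_delays %>%
  pull(mean_delay) %>%
  median(na.rm = TRUE)

mean_delay_median
#> [1] 15
```

Because mean_delay_median is now an ordinary unnamed number, it can be referenced in later code, for instance inside aes(color = mean_delay > mean_delay_median).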

Answer 1: (score: 1)

Note that one destination (dest) is missing every arr_delay observation.

library(tidyverse)
library(nycflights13)

flights %>% 
  group_by(dest) %>% 
  filter(all(is.na(arr_delay))) %>% 
  select(dest, arr_delay)
#> # A tibble: 1 x 2
#> # Groups:   dest [1]
#>   dest  arr_delay
#>   <chr>     <dbl>
#> 1 LGA          NA

This results in NaN for that destination's mean, not zero.

mean(c(NA), na.rm = TRUE)
#> [1] NaN

In other words, you should add na.rm = TRUE again, this time in the median function:

flights %>% 
  group_by(dest) %>% 
  summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>% 
  mutate(arrival = ifelse(mean_delay > median(mean_delay, na.rm = TRUE), "late", "okay")) %>% # na.rm option to median
  inner_join(airports, by = c("dest" = "faa")) %>% 
  ggplot() +
  aes(lon, lat, colour = arrival) +
  borders("state") +
  geom_point() +
  coord_quickmap()


Because the mean for LGA has no value, its label ends up as NA.
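
To see why the extra na.rm = TRUE matters, here is a small base R sketch. The numbers are invented, and the NaN entry stands in for a destination like LGA whose arr_delay values are all missing.

```r
# Hypothetical per-destination means; NaN plays the role of the all-missing destination
mean_delay <- c(-3, 4, 12, NaN)

# Without na.rm, the missing entry makes the median itself missing
m_plain <- median(mean_delay)
is.na(m_plain)
#> [1] TRUE

# so every comparison against it is NA as well
mean_delay > m_plain
#> [1] NA NA NA NA

# With na.rm = TRUE the median is taken over the three real values
m_clean <- median(mean_delay, na.rm = TRUE)
m_clean
#> [1] 4

# Now only the NaN entry still produces an NA label
arrival <- ifelse(mean_delay > m_clean, "late", "okay")
arrival
```

If the NA label is unwanted in the plot, one option is to drop such rows beforehand, for example with filter(!is.na(mean_delay)).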