I am trying to compute a median as a single number and then use that number in a ggplot aesthetic.
I first tried to get the median as a value:
mean_delay_median <- nycflights13::flights %>%
group_by(dest) %>%
summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>%
median(mean_delay)
This produces the error message:
Error in median.default(., mean_delay) : need numeric data
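How can I fix this?
Once that works, my second step is to color the points on a map according to whether each value is above or below that median, mean_delay_median, for example: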
nycflights13::flights %>%
group_by(dest) %>%
summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>%
inner_join(nycflights13::airports, c('dest' = 'faa')) %>%
ggplot(aes(lon, lat, color=mean_delay>mean_delay_median)) +
borders("state") +
geom_point() +
coord_quickmap()
More generally, I am looking for guidance on using a previously computed statistic in later code.
Thanks!
Answer 0 (score: 1)
You are just missing summarise(median_all_delay = median(mean_delay, na.rm = TRUE)).
Try this:
mean_delay_median <- nycflights13::flights %>%
group_by(dest) %>%
summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>%
summarise(median_all_delay = median(mean_delay, na.rm = TRUE)) %>%
unlist()
nycflights13::flights %>%
group_by(dest) %>%
summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>%
inner_join(nycflights13::airports, c('dest' = 'faa')) %>%
ggplot(aes(lon, lat, color=mean_delay>mean_delay_median)) +
borders("state") +
geom_point() +
coord_quickmap()
This draws the map with each destination colored by whether its mean_delay exceeds mean_delay_median.
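For context on the original error: the pipe hands the whole summarised tibble to median() as its first argument, and median() refuses a data frame with "need numeric data". One possible alternative to summarise() plus unlist() is to extract the column with dplyr's pull() before calling median(). The sketch below uses a small made-up tibble (the name delays and its values are hypothetical, standing in for the per-destination summary) so that it runs on its own.

```r
library(dplyr)

# Hypothetical toy data standing in for the per-destination summary;
# the NaN mimics a destination with no usable arr_delay values
delays <- tibble(
  dest = c("AAA", "BBB", "CCC", "DDD"),
  mean_delay = c(5, 12, NaN, -3)
)

# pull() returns the column as a plain numeric vector,
# so median() receives numbers rather than the whole tibble
mean_delay_median <- delays %>%
  pull(mean_delay) %>%
  median(na.rm = TRUE)

mean_delay_median
#> [1] 5
```

The result is an ordinary length-one numeric, so it can be referenced by name in later code, including inside aes().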
Answer 1 (score: 1)
Note that one destination (dest) is missing every arr_delay observation.
library(tidyverse)
library(nycflights13)
flights %>%
group_by(dest) %>%
filter(all(is.na(arr_delay))) %>%
select(dest, arr_delay)
#> # A tibble: 1 x 2
#> # Groups: dest [1]
#> dest arr_delay
#> <chr> <dbl>
#> 1 LGA NA
This results in NaN, not zero.
mean(c(NA), na.rm = TRUE)
#> [1] NaN
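To see why that NaN matters downstream, here is a minimal base R sketch with made-up numbers (the vector toy_means is hypothetical, not taken from the flights data). A single NaN among the per-destination means is enough to make the median missing unless it is removed.

```r
# Hypothetical per-destination means; the NaN mimics a destination
# whose arr_delay values are all missing
toy_means <- c(4, NaN, 10, -2)

# Without na.rm, the NaN makes the median missing
is.na(median(toy_means))
#> [1] TRUE

# With na.rm = TRUE the NaN is dropped before the median is taken
median(toy_means, na.rm = TRUE)
#> [1] 4
```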
In other words, you should add na.rm = TRUE again in the median function.
flights %>%
group_by(dest) %>%
summarise(mean_delay = mean(arr_delay, na.rm = TRUE)) %>%
mutate(arrival = ifelse(mean_delay > median(mean_delay, na.rm = TRUE), "late", "okay")) %>% # na.rm option to median
inner_join(airports, by = c("dest" = "faa")) %>%
ggplot() +
aes(lon, lat, colour = arrival) +
borders("state") +
geom_point() +
coord_quickmap()
Because LGA has no values to average, its mean_delay is NaN, so its label may become NA.
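If you would rather not carry an NA label into the plot, one option is to drop the destinations without a usable mean before labelling. This is only a sketch under that assumption, using a hypothetical toy tibble (delays) in place of the real summary; whether dropping or keeping such rows is appropriate depends on what you want the map to show.

```r
library(dplyr)

# Hypothetical toy data standing in for the per-destination summary
delays <- tibble(
  dest = c("AAA", "BBB", "CCC", "DDD"),
  mean_delay = c(5, 12, NaN, -3)
)

labelled <- delays %>%
  filter(!is.na(mean_delay)) %>%   # is.na() is TRUE for NaN as well
  mutate(arrival = if_else(mean_delay > median(mean_delay), "late", "okay"))

labelled$arrival
#> [1] "okay" "late" "okay"
```

Since the NaN row is removed first, median() no longer needs na.rm = TRUE here, and every remaining row gets either "late" or "okay".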