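ggplot error: Don't know how to automatically pick scale

Asked: 2017-06-29 20:09:09

Tags: r ggplot2

When I try to plot a relationship in the nycflights13 data with ggplot, I get the error shown below. Can someone help me figure it out?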

library(ggplot2)
library(nycflights13)

data=flights

mutate(data,
   delay=arr_delay - dep_delay)

p1 <- ggplot(data,aes(x=dist,y=delay))+
      geom_point(aes(color=count,size=count),alpha=1/2)+
      xlab("Distance")+
      ylab("delay")+
      ggtitle("Distance vs. Delay")+
      geom_smooth()+
      scale_size_area()

p1

Error:

Don't know how to automatically pick scale for object of type function. 
Defaulting to continuous.

Don't know how to automatically pick scale for object of type function. 
Defaulting to continuous.

Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous.

Error: Aesthetics must be either length 1 or the same as the data (336776): colour, size, x, y

2 answers:

Answer 0 (score: 2)

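There are several problems with the variable names. The x aesthetic should be mapped to distance, and y has to be either dep_delay or arr_delay. ggplot2 does not do partial matching on column names. Also, as @user20650 pointed out, there is no count column.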
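A quick way to see where the "object of type function" messages come from is to check the names against the data. This is only a sketch. When a name used in aes() is not a column, it is looked up outside the data, and dist is then found as the function stats::dist(). I am assuming dplyr was attached in the question (it calls mutate()), in which case count would resolve to dplyr::count() in the same way.

```r
library(nycflights13)

# None of the names used in the question are columns of flights
c("dist", "count", "delay") %in% names(flights)
#> [1] FALSE FALSE FALSE

# dist is found as the function stats::dist(), not as a column
is.function(dist)
#> [1] TRUE
```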

This call works:

ggplot(flights,aes(x=distance,y=arr_delay))+
  geom_point(aes(),alpha=1/2)+
  xlab("Distance")+ 
  ggtitle("Distance vs. Delay")+
  geom_smooth()+
  scale_size_area()

Answer 1 (score: 2)

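There are indeed a number of issues with your code. Besides the wrong abbreviation of distance, the computed delay value is not stored anywhere. mutate() returns a tibble, which has to be assigned to a variable or piped onward.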
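To illustrate the point about mutate(), here is a minimal sketch. The name flights_with_delay is made up for this example and does not appear in the question.

```r
library(dplyr)
library(nycflights13)

# The result is printed and then discarded; flights itself is unchanged
mutate(flights, delay = arr_delay - dep_delay)
"delay" %in% names(flights)
#> [1] FALSE

# Assigning the result keeps the new column
# (flights_with_delay is a hypothetical name)
flights_with_delay <- mutate(flights, delay = arr_delay - dep_delay)
"delay" %in% names(flights_with_delay)
#> [1] TRUE
```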

With these issues fixed, but ignoring count for a moment, the code below

library(ggplot2)
library(nycflights13)
library(dplyr)
p1 <- flights %>% 
  mutate(delay = arr_delay - dep_delay) %>% {
    ggplot(.) + aes(x = distance, y = delay) +
      geom_point(alpha = 1 / 2) +
      xlab("Distance") +
      ylab("Delay") +
      ggtitle("Distance vs. Delay") +
      geom_smooth()
  }
p1

produces this plot:

[Image: plot titled "Distance vs. Delay"]

Adding count

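count is not a variable (column) in the original data set. In his comment, the OP tried to clarify: "The reason I put color = count, size = count is that I want the color and size to vary with the count."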

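As a wild guess, I assume the OP means the number of flights per distance, but still wants to plot each individual flight, with color and size as additional attributes.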

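Therefore, the number of flights per distance has to be computed and appended to each affected flight. This computation must be done before calling ggplot(). Here, I switch from dplyr to data.table because I am more familiar with the latter: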

library(data.table)
p1 <- data.table(flights)[, delay := arr_delay - dep_delay][
  , count := .N, by = distance][
    , ggplot(.SD) + aes(x = distance, y = delay, color = count, size = count) +
      geom_point(alpha = 1 / 2) +
      xlab("Distance") +
      ylab("Delay") +
      ggtitle("Distance vs. Delay") +
      geom_smooth() +
      scale_size_area()]
p1

[Image: plot titled "Distance vs. Delay", points colored and sized by count]
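For readers who would rather stay with dplyr, the same per-distance count can probably be attached with add_count(). This is a sketch of the data preparation only, not part of the original answer, and flights_counted is a made-up name.

```r
library(dplyr)
library(nycflights13)

# flights_counted is a hypothetical name for the prepared data
flights_counted <- flights %>%
  mutate(delay = arr_delay - dep_delay) %>%
  # number of flights per distance, repeated on every row of that distance
  add_count(distance, name = "count")

# Every flight is kept; only the two new columns are added
nrow(flights_counted) == nrow(flights)
c("delay", "count") %in% names(flights_counted)
```

The result could then be passed to ggplot() with the same aesthetics as in the data.table version above.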

Aggregation

The chart looks rather cluttered, so I suggest aggregating the data points before plotting:

p1 <- data.table(flights)[, delay := arr_delay - dep_delay][
  , .(count = .N, delay = median(delay, na.rm = TRUE)), by = distance][
    , ggplot(.SD) + 
      aes(x = distance, y = delay, color = count, size = count) +
      geom_point(alpha = 1 / 2) +
      xlab("Distance") +
      ylab("Delay") +
      ggtitle("Distance vs. Delay") +
      geom_smooth()]
p1

produces this plot:

[Image: plot titled "Distance vs. Delay", one point per distance]

Note that median() is used instead of mean() to aggregate the delay times, in order to reduce the influence of outliers.
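A toy example of why this matters, using made-up numbers rather than the flights data: a single extreme value pulls the mean far away from the bulk of the data, while the median barely moves.

```r
# Hypothetical delays in minutes, with one extreme outlier
delays <- c(1, 2, 3, 4, 100)

mean(delays)
#> [1] 22

median(delays)
#> [1] 3
```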

The data still look noisy, so it may be worthwhile to bin the distances:

p1 <- data.table(flights)[, delay := arr_delay - dep_delay][
  , .(count = .N, delay = median(delay, na.rm = TRUE)), 
  by = .(distance = round(distance, -1L))][
    , ggplot(.SD) + 
      aes(x = distance, y = delay, color = count, size = count) +
      geom_point(alpha = 1 / 2) +
      xlab("Distance, rounded to 10 miles") +
      ylab("Median Delay") +
      ggtitle("Distance vs. Delay") +
      geom_smooth()]
p1

[Image: plot titled "Distance vs. Delay", distances rounded to 10 miles]
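In case the binning step is unclear: a negative digits argument makes round() round to the left of the decimal point, so -1L rounds to the nearest 10. A small sketch with made-up distances:

```r
# Hypothetical distances in miles, rounded to the nearest 10
round(c(187, 213, 2586), -1L)
#> [1]  190  210 2590
```

Flights whose distances fall into the same 10-mile bin then share one group in by = .(distance = round(distance, -1L)).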