When I try to create a relationship plot from nycflights13 with ggplot, I get the error shown below. Can someone help me figure it out?
library(ggplot2)
library(nycflights13)
data=flights
mutate(data,
delay=arr_delay - dep_delay)
p1 <- ggplot(data,aes(x=dist,y=delay))+
geom_point(aes(color=count,size=count),alpha=1/2)+
xlab("Distance")+
ylab("delay")+
ggtitle("Distance vs. Delay")+
geom_smooth()+
scale_size_area()
p1
Error:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (336776): colour, size, x, y
Answer 0 (score: 2)
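There are several problems with the variable names. The x aesthetic should be mapped to distance, and y must be either dep_delay or arr_delay. ggplot2 does not do partial matching on column names. In addition, as @user20650 pointed out, there is no count column.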
This call works:
ggplot(flights,aes(x=distance,y=arr_delay))+
geom_point(aes(),alpha=1/2)+
xlab("Distance")+
ggtitle("Distance vs. Delay")+
geom_smooth()+
scale_size_area()
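A quick way to catch this kind of mismatch before plotting is to compare the names used in aes() with the columns of the data. The sketch below uses a small made-up data frame (toy, a stand-in that only borrows a few column names from flights) rather than the real dataset. It also hints at why the message mentions "object of type function": as far as I understand, a name that is not a column gets looked up in the calling environment, where dist resolves to the stats::dist function.

```r
# Hypothetical stand-in for flights: only the column names matter here
toy <- data.frame(
  distance  = c(1400, 1416, 1089),
  dep_delay = c(2, 4, 2),
  arr_delay = c(11, 20, 33)
)

# Names used in the question's aes() calls
wanted <- c("dist", "delay", "count")

# Which of them are NOT columns of the data? (no partial matching)
missing_cols <- setdiff(wanted, names(toy))
print(missing_cols)

# Which of the missing names resolve to a function on the search path?
is_fun <- vapply(missing_cols, exists, logical(1), mode = "function")
print(is_fun)
```

In a fresh session, dist is reported as a function because the stats package is attached by default. Whether count or delay resolve to anything depends on which packages and objects happen to be loaded.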
Answer 1 (score: 2)
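Your code does indeed have a number of issues. Besides the incorrect abbreviation of distance, the computed delay value is not saved anywhere. mutate() returns a tibble, which has to be assigned to a variable or piped onward.
Fixing these issues, but ignoring count for a moment, the code below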
library(ggplot2)
library(nycflights13)
library(dplyr)
p1 <- flights %>%
mutate(delay = arr_delay - dep_delay) %>% {
ggplot(.) + aes(x = distance, y = delay) +
geom_point(alpha = 1 / 2) +
xlab("Distance") +
ylab("Delay") +
ggtitle("Distance vs. Delay") +
geom_smooth()
}
p1
runs without error and produces a plot.
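To illustrate the point about mutate() not modifying its input, here is a minimal sketch on a small made-up tibble (toy and toy2 are hypothetical names, not part of the original code). It shows that the new column only exists in the returned object.

```r
library(dplyr)

# Hypothetical miniature of the two delay columns in flights
toy <- tibble(arr_delay = c(11, 20, 33), dep_delay = c(2, 4, 2))

# The result is computed but thrown away; toy itself is unchanged
mutate(toy, delay = arr_delay - dep_delay)
print("delay" %in% names(toy))   # the column was never stored

# Assigning the result keeps the new column
toy2 <- mutate(toy, delay = arr_delay - dep_delay)
print(toy2$delay)
```

Piping the result straight into ggplot(), as in the code above, has the same effect as the assignment without creating an intermediate variable.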
count
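count is not a variable (column) contained in the original dataset. In his comment, the OP tried to clarify: "The reason I put color = count, size = count is that I want the color and size to vary with the count."
As a wild guess, I assume the OP means the number of flights per distance, but still wants each individual flight plotted, with color and size as additional attributes.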
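Therefore, the number of flights per distance has to be computed and attached to each affected flight. This computation must be done before ggplot() is called. Here I switch from dplyr to data.table because I am more familiar with the latter: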
library(data.table)
p1 <- data.table(flights)[, delay := arr_delay - dep_delay][
, count := .N, by = distance][
, ggplot(.SD) + aes(x = distance, y = delay, color = count, size = count) +
geom_point(alpha = 1 / 2) +
xlab("Distance") +
ylab("Delay") +
ggtitle("Distance vs. Delay") +
geom_smooth() +
scale_size_area()]
p1
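For readers less familiar with data.table, the following sketch on a tiny made-up table (toy, with invented values) shows what count := .N with by = distance does: every row keeps its place and receives the size of its distance group. For contrast, it also shows the aggregating form .(count = .N, ...), which returns one row per group instead.

```r
library(data.table)

# Hypothetical data: five flights over three distinct distances
toy <- data.table(
  distance = c(100, 100, 200, 100, 300),
  delay    = c(5, -3, 12, 0, 7)
)

# Add the group size to every row (modifies toy by reference)
toy[, count := .N, by = distance]
print(toy)

# Aggregate instead: one row per distance
agg <- toy[, .(count = .N, delay = median(delay)), by = distance]
print(agg)
```

The first form suits a plot of individual flights, while the second is the shape needed when only one point per distance should be drawn.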
The chart looks rather cluttered, so I suggest aggregating the data points by distance before plotting:
p1 <- data.table(flights)[, delay := arr_delay - dep_delay][
, .(count = .N, delay = median(delay, na.rm = TRUE)), by = distance][
, ggplot(.SD) +
aes(x = distance, y = delay, color = count, size = count) +
geom_point(alpha = 1 / 2) +
xlab("Distance") +
ylab("Delay") +
ggtitle("Distance vs. Delay") +
geom_smooth()]
p1
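which produces a plot with one point per distance.
Note that median() is used instead of mean() to aggregate the delay times, in order to reduce the influence of outliers.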
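The effect of that choice can be seen on a handful of made-up delay values (the numbers below are invented for illustration and are not taken from flights): a single extreme value pulls the mean far away, while the median barely moves.

```r
# Hypothetical delays in minutes, with one extreme outlier and one missing value
delays <- c(1, 2, 3, 4, 100, NA)

# na.rm = TRUE drops the missing value, as in the aggregation above
m_mean   <- mean(delays, na.rm = TRUE)
m_median <- median(delays, na.rm = TRUE)

print(c(mean = m_mean, median = m_median))
```

Here the mean is 22 while the median is 3, which is much closer to what a typical value in this toy vector looks like.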
The data still look noisy, so it may be worthwhile to bin the distances as well:
p1 <- data.table(flights)[, delay := arr_delay - dep_delay][
, .(count = .N, delay = median(delay, na.rm = TRUE)),
by = .(distance = round(distance, -1L))][
, ggplot(.SD) +
aes(x = distance, y = delay, color = count, size = count) +
geom_point(alpha = 1 / 2) +
xlab("Distance, rounded to 10 miles") +
ylab("Median Delay") +
ggtitle("Distance vs. Delay") +
geom_smooth()]
p1
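The binning relies on round() with a negative digits argument, which rounds to the left of the decimal point. A short sketch with made-up distances (chosen to avoid exact ties such as 1235) shows how nearby values collapse into the same 10-mile bin:

```r
# Hypothetical distances in miles
d <- c(94, 96, 1234, 1237)

# digits = -1 rounds to the nearest multiple of 10
binned <- round(d, -1L)
print(binned)

# Number of observations that end up in each bin
print(table(binned))
```

Using -2L instead would give 100-mile bins, which trades detail for a smoother picture.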