I am using ggplot2 in R to generate plots from the following code:
final_flights <- augment(flights_model, flights_tbl) %>% collect()
final_flights
# A tibble: 327,346 x 22
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 517 515 2 830 819
2 2013 1 1 533 529 4 850 830
3 2013 1 1 542 540 2 923 850
4 2013 1 1 544 545 -1 1004 1022
5 2013 1 1 554 600 -6 812 837
6 2013 1 1 554 558 -4 740 728
7 2013 1 1 555 600 -5 913 854
8 2013 1 1 557 600 -3 709 723
9 2013 1 1 557 600 -3 838 846
10 2013 1 1 558 600 -2 753 745
# ... with 327,336 more rows, and 14 more variables: arr_delay <dbl>,
# carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
# air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>,
# PC1 <dbl>, PC2 <dbl>, PC3 <dbl>
I have tried:
ggplot(final_flights, aes(PC1, PC2)) +
  geom_point(aes(colour = air_time))

ggplot(final_flights, aes(PC1, PC2, PC3)) +
  geom_point(aes(colour = air_time, distance, dep_time))

ml_predict(kmeans_model) %>%
  collect() %>%
  ggplot(aes(air_time, distance, dep_time)) +
  geom_point(aes(air_time, distance, dep_time, col = factor(prediction + 1)),
             size = 2, alpha = 0.5) +
  geom_point(data = kmeans_model$k, aes(air_time, distance, dep_time),
             pch = 'x', size = 12) +
  scale_color_discrete(name = "Predicted cluster")
> Warning: Ignoring unknown aesthetics:
> Warning: Ignoring unknown aesthetics:
> Error: Column 3 must be named.
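I would like to generate a ggplot of the two principal components, together with the variables that explain the clustering in the data.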
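
One possible direction, as a minimal sketch rather than a definitive fix: aes() only accepts x and y positionally, and every further aesthetic has to be named, so a third unnamed argument such as PC3 or dep_time is the likely source of "Column 3 must be named". The sketch below plots PC1 against PC2, colours the points by cluster, and draws one arrow per original variable from the PCA loadings. Everything in it is an assumption for illustration: the `toy` data frame holds made-up values (only the column names mirror the question), and base R's prcomp() and kmeans() stand in for the Spark models, which cannot be reproduced here.

```r
library(ggplot2)

# Hypothetical stand-in for the collected Spark result: made-up values,
# only the column names mirror the question.
set.seed(42)
n <- 300
toy <- data.frame(
  air_time = runif(n, 30, 600),
  dep_time = sample(500:2300, n, replace = TRUE)
)
toy$distance <- toy$air_time * 8 + rnorm(n, sd = 50)

vars <- c("air_time", "distance", "dep_time")

# Local PCA and k-means (base R), standing in for the Spark models.
pca <- prcomp(toy[, vars], center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])   # columns are named PC1 and PC2
km <- kmeans(scale(toy[, vars]), centers = 3)
scores$cluster <- factor(km$cluster)

# Loadings: one row per original variable, used to draw the arrows.
loadings <- as.data.frame(pca$rotation[, 1:2])
loadings$variable <- rownames(loadings)
arrow_scale <- 2   # arbitrary stretch so the arrows are visible

p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  # Only x and y are positional; colour is passed as a named aesthetic.
  geom_point(aes(colour = cluster), size = 2, alpha = 0.5) +
  geom_segment(data = loadings,
               aes(x = 0, y = 0,
                   xend = PC1 * arrow_scale, yend = PC2 * arrow_scale),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_text(data = loadings,
            aes(x = PC1 * arrow_scale, y = PC2 * arrow_scale,
                label = variable),
            vjust = -0.5) +
  scale_colour_discrete(name = "Predicted cluster")

print(p)
```

With the real data, the same layers could presumably be applied to final_flights after joining in the prediction column from ml_predict(). Note also that kmeans_model$k is likely just the number of clusters, not a data frame; in sparklyr the centres are typically found in kmeans_model$centers, which is worth verifying against the installed version.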