Visualizing principal components (PCs)

Time: 2019-04-07 09:49:01

Tags: r ggplot2 pca

I am using ggplot2 in R to produce plots from the output of this code:

final_flights <- augment(flights_model, flights_tbl) %>% collect()
final_flights

# A tibble: 327,346 x 22
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      517            515         2      830            819
 2  2013     1     1      533            529         4      850            830
 3  2013     1     1      542            540         2      923            850
 4  2013     1     1      544            545        -1     1004           1022
 5  2013     1     1      554            600        -6      812            837
 6  2013     1     1      554            558        -4      740            728
 7  2013     1     1      555            600        -5      913            854
 8  2013     1     1      557            600        -3      709            723
 9  2013     1     1      557            600        -3      838            846
10  2013     1     1      558            600        -2      753            745
# ... with 327,336 more rows, and 14 more variables: arr_delay <dbl>,
#   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>,
#   PC1 <dbl>, PC2 <dbl>, PC3 <dbl>

This is what I have tried so far:

    ggplot(final_flights, aes(PC1, PC2)) +   
      geom_point(aes(colour=air_time))


    ggplot(final_flights, aes(PC1, PC2, PC3))+ 
      geom_point(aes(colour=air_time, distance, dep_time))

    ml_predict(kmeans_model) %>%
      collect() %>%
      ggplot(aes(air_time, distance, dep_time)) +
      geom_point(aes(air_time, distance, dep_time, col = factor(prediction + 1)),
                 size = 2, alpha = 0.5) +
      geom_point(data = kmeans_model$k, aes(air_time, distance, dep_time),
                 pch = 'x', size = 12) +
      scale_color_discrete(name = "Predicted cluster")

> Warning: Ignoring unknown aesthetics:
> Warning: Ignoring unknown aesthetics:
> Error: Column 3 must be named.

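I would like to produce a ggplot of the first two principal components that also shows which variables explain the clustering in the data.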
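For reference, here is a minimal local sketch of the kind of plot described above. It is an illustration under assumptions, not the Spark pipeline from the question: it uses base R `prcomp()` and `kmeans()` instead of the sparklyr model objects, a made-up data frame `toy_flights` in place of the collected flights data, and an arbitrary choice of `k = 2`. Note that `aes()` takes only `x` and `y` positionally, so any further aesthetic (colour, size, and so on) has to be passed by name; an unnamed third argument is the likely source of the "Column 3 must be named" error.

```r
library(ggplot2)

# Hypothetical stand-in for the collected flights data (not the real dataset).
set.seed(42)
n <- 300
toy_flights <- data.frame(
  air_time = c(rnorm(n / 2, 60, 10), rnorm(n / 2, 300, 30)),
  dep_time = sample(500:2300, n, replace = TRUE)
)
toy_flights$distance <- toy_flights$air_time * 8 + rnorm(n, 0, 40)

vars <- c("air_time", "distance", "dep_time")

# PCA on centred, scaled variables (base R prcomp, an assumption; the
# question uses a sparklyr model instead).
pca <- prcomp(toy_flights[, vars], center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])

# k-means on the same scaled variables; k = 2 is an arbitrary choice.
km <- kmeans(scale(toy_flights[, vars]), centers = 2, nstart = 10)
scores$cluster <- factor(km$cluster)

# Loadings of each original variable on PC1 and PC2, drawn as arrows.
loadings <- as.data.frame(pca$rotation[, 1:2])
loadings$variable <- rownames(loadings)

p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  geom_point(aes(colour = cluster), alpha = 0.5, size = 2) +
  geom_segment(data = loadings,
               aes(x = 0, y = 0, xend = PC1 * 2, yend = PC2 * 2),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_text(data = loadings,
            aes(x = PC1 * 2.2, y = PC2 * 2.2, label = variable)) +
  scale_colour_discrete(name = "Predicted cluster")
print(p)
```

The points are the observations in PC1/PC2 space coloured by cluster, and the arrows show how strongly each original variable loads on the two components. The factor of 2 on the arrows is only for legibility. With the data from the question, the existing `PC1` and `PC2` columns of `final_flights` could presumably be plotted the same way once a cluster column is joined on.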

0 answers:

No answers yet