Question

I have a PCA model:

> library(sparklyr)
> library(dplyr)
> sc <- spark_connect("local", version="2.0.0")
> iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)
The following columns have been renamed:
- 'Sepal.Length' => 'Sepal_Length' (#1)
- 'Sepal.Width'  => 'Sepal_Width'  (#2)
- 'Petal.Length' => 'Petal_Length' (#3)
- 'Petal.Width'  => 'Petal_Width'  (#4)
> pca_model <- tbl(sc, "iris") %>%
+   select(-Species) %>%
+   ml_pca()
> print(pca_model)
Explained variance:

       PC1         PC2         PC3         PC4 
0.924618723 0.053066483 0.017102610 0.005212184 

Rotation:
                     PC1         PC2         PC3        PC4
Sepal_Length -0.36138659 -0.65658877  0.58202985  0.3154872
Sepal_Width   0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal_Length -0.85667061  0.17337266 -0.07623608 -0.4798390
Petal_Width  -0.35828920  0.07548102 -0.54583143  0.7536574
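To sanity-check what ml_pca fitted, you can compare it with base R's prcomp() on the same four columns. This is a sketch that assumes the printed explained variance is the normalized eigenvalue spectrum of the covariance matrix. prcomp() is used only as an independent local reference, and the numbers are copied from the print(pca_model) output above.

```r
# Sketch: cross-check the Spark PCA model against base R's prcomp().
# Values copied from print(pca_model); prcomp() is an independent
# local implementation, not part of sparklyr.
spark_explained <- c(0.924618723, 0.053066483, 0.017102610, 0.005212184)
spark_pc1 <- c(-0.36138659, 0.08452251, -0.85667061, -0.35828920)

# Same four numeric columns as select(-Species); prcomp() centers by default
local_pca <- prcomp(iris[, 1:4])

# Proportion of variance explained by each component
local_explained <- local_pca$sdev^2 / sum(local_pca$sdev^2)
print(round(local_explained, 6))

# Eigenvectors are only defined up to sign, so compare absolute values
print(all.equal(abs(spark_pc1), abs(unname(local_pca$rotation[, 1])),
                tolerance = 1e-5))
```

The sign of a principal component is arbitrary, which is why PC1 may come out with flipped signs locally while still describing the same direction.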

However, I can't use the resulting model to make predictions:

sdf_predict(pca_model)

Source:   query [?? x 6]
Database: spark connection master=local[4] app=sparklyr local=TRUE

It fails with the error:

java.lang.IllegalArgumentException: requirement failed: 
The columns of A don't match the number of elements of x. A: 4, x: 0

Passing in the data to predict on doesn't help:

sdf_predict(pca_model, tbl(sc, "iris") %>% select(-Species))

Source:   query [?? x 5]
Database: spark connection master=local[4] app=sparklyr local=TRUE

It fails with the same error:

java.lang.IllegalArgumentException: requirement failed: 
The columns of A don't match the number of elements of x. A: 4, x: 0

Is it possible to use PCA for prediction in Spark at all?

Answer 1

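PCA is a dimensionality-reduction transform rather than a predictive model, so instead of sdf_predict, use sdf_project, which appends the projected principal components (PC1 to PC4) to the data as new columns: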

> pca_projected <- sdf_project(pca_model, tbl(sc, "iris") %>% select(-Species), 
+                              features=rownames(pca_model$components))
> pca_projected %>% collect %>% head
# A tibble: 6 x 8
  Sepal_Length Sepal_Width Petal_Length Petal_Width   PC1   PC2   PC3     PC4
         <dbl>       <dbl>        <dbl>       <dbl> <dbl> <dbl> <dbl>   <dbl>
1         5.10        3.50         1.40       0.200 -2.82 -5.65 0.660 -0.0311
2         4.90        3.00         1.40       0.200 -2.79 -5.15 0.842  0.0657
3         4.70        3.20         1.30       0.200 -2.61 -5.18 0.614 -0.0134
4         4.60        3.10         1.50       0.200 -2.76 -5.01 0.600 -0.109 
5         5.00        3.60         1.40       0.200 -2.77 -5.65 0.542 -0.0946
6         5.40        3.90         1.70       0.400 -3.22 -6.07 0.463 -0.0576
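To see what sdf_project computes, you can reproduce the PC columns locally. This minimal sketch assumes the projection is the raw, uncentered feature matrix multiplied by the rotation matrix from print(pca_model); the rotation values are copied from that printout, and the multiplication is done in base R, not by sparklyr.

```r
# Sketch: reproduce sdf_project's PC columns in base R.
# Assumption: projection = uncentered features %*% rotation.
# Rotation values copied from print(pca_model).
rotation <- matrix(
  c(-0.36138659, -0.65658877,  0.58202985,  0.3154872,
     0.08452251, -0.73016143, -0.59791083, -0.3197231,
    -0.85667061,  0.17337266, -0.07623608, -0.4798390,
    -0.35828920,  0.07548102, -0.54583143,  0.7536574),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"),
                  paste0("PC", 1:4))
)

# Feature matrix in the same column order as the rotation rows
features <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                               "Petal.Length", "Petal.Width")])

# Project every row onto the principal components (no centering)
projected <- features %*% rotation

print(head(round(projected, 3)))
```

The first rows agree with the PC1 to PC4 columns in the sdf_project output (for example, about -2.82, -5.65, 0.660 and -0.0311 for the first flower). Note that prcomp()$x would give different scores, because prcomp() centers the data before projecting.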

How to use sdf_predict() with the model returned by ml_pca() (sparklyr)