I have a PCA model:
> library(sparklyr)
> library(dplyr)
> sc <- spark_connect("local", version="2.0.0")
> iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)
The following columns have been renamed:
- 'Sepal.Length' => 'Sepal_Length' (#1)
- 'Sepal.Width' => 'Sepal_Width' (#2)
- 'Petal.Length' => 'Petal_Length' (#3)
- 'Petal.Width' => 'Petal_Width' (#4)
> pca_model <- tbl(sc, "iris") %>%
+ select(-Species) %>%
+ ml_pca()
> print(pca_model)
Explained variance:
PC1 PC2 PC3 PC4
0.924618723 0.053066483 0.017102610 0.005212184
Rotation:
PC1 PC2 PC3 PC4
Sepal_Length -0.36138659 -0.65658877 0.58202985 0.3154872
Sepal_Width 0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal_Length -0.85667061 0.17337266 -0.07623608 -0.4798390
Petal_Width -0.35828920 0.07548102 -0.54583143 0.7536574
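The explained-variance values in this printout are proportions of the total variance, so they sum to 1. Their cumulative sum shows how much is retained by keeping the first k components. Below is a small base-R sketch that uses only the numbers copied from the output above:

```r
# Explained-variance proportions copied from print(pca_model)
explained <- c(PC1 = 0.924618723, PC2 = 0.053066483,
               PC3 = 0.017102610, PC4 = 0.005212184)

# The proportions should add up to 1
print(sum(explained))

# Cumulative share of variance kept when using the first k components
print(cumsum(explained))
```

In this case the first two components together account for roughly 97.8% of the variance.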
However, I can't use the resulting model for prediction:
sdf_predict(pca_model)
Source: query [?? x 6]
Database: spark connection master=local[4] app=sparklyr local=TRUE
This fails with the error:
java.lang.IllegalArgumentException: requirement failed:
The columns of A don't match the number of elements of x. A: 4, x: 0
Explicitly passing the data to predict on doesn't help either:
sdf_predict(pca_model, tbl(sc, "iris") %>% select(-Species))
Source: query [?? x 5]
Database: spark connection master=local[4] app=sparklyr local=TRUE
This fails with the same error:
java.lang.IllegalArgumentException: requirement failed:
The columns of A don't match the number of elements of x. A: 4, x: 0
Is it possible to use a PCA model for prediction in Spark at all?
Answer (score: 1)
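A fitted PCA model transforms features rather than producing predictions, so instead of sdf_predict, use sdf_project. Pass the feature column names through the features argument; here they are taken from the row names of the model's rotation matrix, pca_model$components: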
> pca_projected <- sdf_project(pca_model, tbl(sc, "iris") %>% select(-Species),
+ features=rownames(pca_model$components))
> pca_projected %>% collect %>% head
# A tibble: 6 x 8
Sepal_Length Sepal_Width Petal_Length Petal_Width PC1 PC2 PC3 PC4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5.10 3.50 1.40 0.200 -2.82 -5.65 0.660 -0.0311
2 4.90 3.00 1.40 0.200 -2.79 -5.15 0.842 0.0657
3 4.70 3.20 1.30 0.200 -2.61 -5.18 0.614 -0.0134
4 4.60 3.10 1.50 0.200 -2.76 -5.01 0.600 -0.109
5 5.00 3.60 1.40 0.200 -2.77 -5.65 0.542 -0.0946
6 5.40 3.90 1.70 0.400 -3.22 -6.07 0.463 -0.0576
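To see what sdf_project computes, you can reproduce the PC columns by hand. This minimal base-R sketch assumes, as the printed numbers suggest, that Spark projects the raw (uncentered) feature rows onto the rotation matrix, so each score is the dot product of a row with one column of the rotation. The rotation values and the two feature rows are copied from the output above.

```r
# Rotation (loadings) matrix copied from print(pca_model)
rotation <- matrix(
  c(-0.36138659, -0.65658877,  0.58202985,  0.3154872,
     0.08452251, -0.73016143, -0.59791083, -0.3197231,
    -0.85667061,  0.17337266, -0.07623608, -0.4798390,
    -0.35828920,  0.07548102, -0.54583143,  0.7536574),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"),
                  c("PC1", "PC2", "PC3", "PC4"))
)

# First two rows of the iris features, used as-is (no centering or scaling)
x <- rbind(c(5.1, 3.5, 1.4, 0.2),
           c(4.9, 3.0, 1.4, 0.2))
colnames(x) <- rownames(rotation)

# Project each row onto the principal components
scores <- x %*% rotation
print(round(scores, 2))
```

The result matches the first two rows of the PC1 to PC4 columns in the projected tibble. Because no centering is applied, these scores differ from the ones produced by R's prcomp(), which centers the data by default.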