I am trying to use sparklyr from RStudio Server. Spark is installed on the cluster and I connect through YARN. The connection works fine, but when I try a simple example taken from the documentation I get this error:
"错误(函数(classes,fdef,mtable): 无法为签名'" tbl_spark"'找到函数'count'的继承方法。
This is my simple code:
> library(sparklyr)
> library(dplyr)
> sc <- spark_connect(master = "yarn")
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> iris_tbl <- sdf_copy_to(sc = sc, x = iris, overwrite = T)
[1] "iris"
> iris_tbl %>% count
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘count’ for signature ‘"tbl_spark"’
I cannot understand the error in such a simple example.
Here is my sessionInfo:
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8
[6] LC_MESSAGES=es_ES.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sparklyr_0.8.1-9001 devtools_1.13.5 nycflights13_0.2.2 rJava_0.9-9 SparkR_2.1.0 slam_0.1-42
[7] stringi_1.1.6 dplyr_0.7.4 servr_0.8 topicmodels_0.2-7 tm_0.7-3 NLP_0.1-11
[13] LDAvis_0.3.2
loaded via a namespace (and not attached):
[1] tidyselect_0.2.4 modeltools_0.2-21 purrr_0.2.4 reshape2_1.4.3 lattice_0.20-35 htmltools_0.3.6 stats4_3.4.3
[8] yaml_2.1.19 base64enc_0.1-3 utf8_1.1.3 rlang_0.2.0 pillar_1.2.1 foreign_0.8-69 glue_1.2.0
[15] withr_2.1.2 DBI_1.0.0 rappdirs_0.3.1 dbplyr_1.2.1 bindrcpp_0.2 bindr_0.1.1 plyr_1.8.4
[22] stringr_1.2.0 memoise_1.1.0 psych_1.8.4 httpuv_1.3.6.2 parallel_3.4.3 curl_3.2 broom_0.4.4
[29] Rcpp_0.12.15 readr_1.1.1 xtable_1.8-2 openssl_1.0.1 backports_1.1.2 jsonlite_1.5 config_0.3
[36] mime_0.5 mnormt_1.5-5 hms_0.4.2 digest_0.6.15 shiny_1.0.5 rprojroot_1.3-2 grid_3.4.3
[43] cli_1.0.0 tools_3.4.3 magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2 crayon_1.3.4 tidyr_0.8.0
[50] pkgconfig_2.0.1 xml2_1.2.0 assertthat_0.2.0 httr_1.3.1 rstudioapi_0.7 R6_2.2.2 git2r_0.21.0
[57] nlme_3.1-131 compiler_3.4.3
Answer 0 (score: 0)
You are using the wrong function. count is a function that expects a column:

A wrapper for summarise that will either call 'n()' or 'sum(n)' depending on whether you're counting for the first time, or re-counting. 'count()' is similar but calls 'group_by()' before and 'ungroup()' after.
So

df %>% count(x)

is the same as

df %>% group_by(x) %>% summarise(n = n()) %>% ungroup()
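On the Spark table from this question the same per-group count can be written directly with the group_by/summarise spelling, which sparklyr translates to a Spark SQL aggregation. A minimal sketch, assuming the iris_tbl copied above and its Species column:

# Count rows per Species on the Spark-side table;
# group_by()/summarise()/n() are translated to Spark SQL by dbplyr/sparklyr.
iris_tbl %>%
  group_by(Species) %>%
  summarise(n = n()) %>%
  ungroup()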
To count records, use summarise:
iris_tbl %>% summarise(n = n())
or
iris_tbl %>% spark_dataframe() %>% invoke("count")
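Both forms run on the Spark side. As a rough sketch (assuming the iris_tbl from above): the summarise() version stays a remote one-row table until you collect() it, while spark_dataframe() followed by invoke("count") calls the count() method of the underlying Spark DataFrame and returns a plain number:

# Remote aggregation; collect() pulls the one-row result into R as a local tibble
n_local <- iris_tbl %>%
  summarise(n = n()) %>%
  collect()

# Direct call to the Spark DataFrame's count() method; returns a single number
n_direct <- iris_tbl %>%
  spark_dataframe() %>%
  invoke("count")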