sparklyr invalid method - unable to find an inherited method for function 'count' for signature '"tbl_spark"'

Time: 2018-05-08 14:44:28

标签: r apache-spark sparklyr

I am trying to use sparklyr from RStudio Server. Spark is installed on a cluster and I connect using yarn. The connection works fine, but when I try a simple example taken from the documentation I get this error:

"错误(函数(classes,fdef,mtable):   无法为签名'" tbl_spark"'找到函数'count'的继承方法。

Here is my simple code:

> library(sparklyr)
> library(dplyr)
> sc <- spark_connect(master = "yarn")
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> iris_tbl <- sdf_copy_to(sc = sc, x = iris, overwrite = T)
[1] "iris"
> iris_tbl %>% count
  Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘count’ for signature ‘"tbl_spark"’

I cannot understand the error in such a simple example.

Here is a copy of my sessionInfo:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8     LC_MONETARY=es_ES.UTF-8
 [6] LC_MESSAGES=es_ES.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sparklyr_0.8.1-9001 devtools_1.13.5     nycflights13_0.2.2      rJava_0.9-9         SparkR_2.1.0        slam_0.1-42        
 [7] stringi_1.1.6       dplyr_0.7.4         servr_0.8           topicmodels_0.2-7   tm_0.7-3            NLP_0.1-11         
[13] LDAvis_0.3.2       

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.4  modeltools_0.2-21 purrr_0.2.4       reshape2_1.4.3    lattice_0.20-35   htmltools_0.3.6   stats4_3.4.3     
 [8] yaml_2.1.19       base64enc_0.1-3   utf8_1.1.3        rlang_0.2.0       pillar_1.2.1      foreign_0.8-69    glue_1.2.0       
[15] withr_2.1.2       DBI_1.0.0         rappdirs_0.3.1    dbplyr_1.2.1      bindrcpp_0.2      bindr_0.1.1       plyr_1.8.4       
[22] stringr_1.2.0     memoise_1.1.0     psych_1.8.4       httpuv_1.3.6.2    parallel_3.4.3    curl_3.2          broom_0.4.4      
[29] Rcpp_0.12.15      readr_1.1.1       xtable_1.8-2      openssl_1.0.1     backports_1.1.2   jsonlite_1.5      config_0.3       
[36] mime_0.5          mnormt_1.5-5      hms_0.4.2         digest_0.6.15     shiny_1.0.5       rprojroot_1.3-2   grid_3.4.3       
[43] cli_1.0.0         tools_3.4.3       magrittr_1.5      lazyeval_0.2.1    tibble_1.4.2      crayon_1.3.4      tidyr_0.8.0      
[50] pkgconfig_2.0.1   xml2_1.2.0        assertthat_0.2.0  httr_1.3.1        rstudioapi_0.7    R6_2.2.2          git2r_0.21.0     
[57] nlme_3.1-131      compiler_3.4.3 

1 Answer:

Answer 0 (score: 0):

You are using the wrong function. count expects columns; it is a

  wrapper of summarise that will either call 'n()' or 'sum(n)' depending on whether you're tallying for the first time, or re-tallying. 'count()' is similar but calls 'group_by()' before and 'ungroup()' after.

So

df %>% count(x)

is the same as
df %>% group_by(x) %>% summarise(n = n()) %>% ungroup()

To count records, use summarise:

iris_tbl %>% summarise(n = n()) 

iris_tbl %>% spark_dataframe() %>% invoke("count")