I know Spark 1.6.0 is probably outdated, but it is what we have on our stack. I am trying to use sparklyr::sdf_quantile():
mtc <- copy_to(sc, mtcars, "mtcars")
mtc %>% sdf_quantile("hp")
I get the following error (using Spark 1.6.0 via YARN):
Error: java.lang.IllegalArgumentException: invalid method approxQuantile for object 168
at sparklyr.Invoke$.invoke(invoke.scala:122)
at sparklyr.StreamHandler$.handleMethodCall(stream.scala:97)
at sparklyr.StreamHandler$.read(stream.scala:62)
at sparklyr.BackendHandler.channelRead0(handler.scala:52)
at sparklyr.BackendHandler.channelRead0(handler.scala:14)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
This is the sessionInfo() for that machine:
sessionInfo()
Oracle Distribution of R version 3.3.0 (--)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Oracle Linux Server 7.2
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] kudusparklyr_0.1.0 sparklyr_0.7.0 dbplot_0.2.0 rlang_0.1.4
[5] bindrcpp_0.2 anytime_0.3.0 jsonlite_1.5 magrittr_1.5
[9] ggplot2_2.2.1 DBI_0.7 dtplyr_0.0.2 dplyr_0.7.4
[13] data.table_1.10.4-3 devtools_1.13.4 httr_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 dbplyr_1.1.0 plyr_1.8.4 bindr_0.1
[5] base64enc_0.1-3 tools_3.3.0 digest_0.6.12 lattice_0.20-33
[9] nlme_3.1-127 memoise_1.1.0 tibble_1.3.4 gtable_0.2.0
[13] pkgconfig_2.0.1 psych_1.7.8 shiny_1.0.5 rstudioapi_0.7
[17] yaml_2.1.15 parallel_3.3.0 stringr_1.2.0 withr_2.1.0
[21] rprojroot_1.2 grid_3.3.0 glue_1.2.0 R6_2.2.2
[25] foreign_0.8-66 reshape2_1.4.2 purrr_0.2.4 tidyr_0.7.2
[29] scales_0.5.0 backports_1.1.1 htmltools_0.3.6 mnormt_1.5-5
[33] assertthat_0.2.0 xtable_1.8-2 mime_0.5 RApiDatetime_0.0.3
[37] colorspace_1.3-2 httpuv_1.3.5 labeling_0.3 config_0.2
[41] stringi_1.1.6 openssl_0.9.9 lazyeval_0.2.1 munsell_0.4.3
[45] broom_0.4.3
On another machine (with a local Spark 2.2.0), it works:
mtc %>% sdf_quantile("hp")
0% 25% 50% 75% 100%
52 95 123 180 335
with the following sessionInfo():
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=German_Austria.1252 LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rsparkling_0.2.2 leaflet_1.1.0 dplyr_0.7.4 purrr_0.2.4
[5] readr_1.1.1 tidyr_0.6.1 tibble_1.4.1 ggplot2_2.2.1
[9] tidyverse_1.1.1 sparklyr_0.7.0-9030
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 lubridate_1.6.0 lattice_0.20-35 assertthat_0.2.0 rprojroot_1.2
[6] digest_0.6.12 psych_1.7.3.21 mime_0.5 R6_2.2.2 cellranger_1.1.0
[11] plyr_1.8.4 backports_1.0.5 evaluate_0.10 httr_1.2.1 pillar_1.0.1
[16] rlang_0.1.6 lazyeval_0.2.0 readxl_1.0.0 rstudioapi_0.7 rmarkdown_1.6
[21] config_0.2 stringr_1.2.0 foreign_0.8-69 htmlwidgets_0.8 RCurl_1.95-4.8
[26] munsell_0.4.3 shiny_1.0.5 broom_0.4.2 compiler_3.4.1 httpuv_1.3.5
[31] modelr_0.1.0 pkgconfig_2.0.1 base64enc_0.1-3 mnormt_1.5-5 htmltools_0.3.5
[36] openssl_0.9.7 withr_2.0.0 dbplyr_1.2.0 rappdirs_0.3.1 bitops_1.0-6
[41] grid_3.4.1 nlme_3.1-131 jsonlite_1.5 xtable_1.8-2 gtable_0.2.0
[46] DBI_0.7 magrittr_1.5 scales_0.4.1 stringi_1.1.3 reshape2_1.4.2
[51] bindrcpp_0.2 xml2_1.1.1 tools_3.4.1 forcats_0.2.0 glue_1.2.0
[56] hms_0.3 crosstalk_1.0.0 parallel_3.4.1 yaml_2.1.14 colorspace_1.3-2
[61] h2o_3.14.0.2 rvest_0.3.2 knitr_1.15.1 bindr_0.1 haven_1.0.0
Any ideas what is going wrong?
Answer 0 (score: 2)
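approxQuantile was introduced in Spark 2.0.0 (SPARK-6761), so it does not exist on a Spark 1.6.0 cluster. You have to upgrade your Apache Spark installation to use it.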
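As a rough illustration of that version constraint, here is a minimal sketch of a guard you could call before sdf_quantile(). The helper name has_approx_quantile is hypothetical and not part of sparklyr. It only compares version numbers.

```r
# Hypothetical helper (not part of sparklyr): TRUE when the given Spark
# version is at least 2.0.0, the release that added approxQuantile.
has_approx_quantile <- function(version) {
  numeric_version(as.character(version)) >= numeric_version("2.0.0")
}

has_approx_quantile("1.6.0")  # FALSE
has_approx_quantile("2.2.0")  # TRUE

# With a live connection, the version could be taken from sparklyr:
# has_approx_quantile(spark_version(sc))
```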
If you have Hive support enabled, you can try the percentile_approx Hive function instead:
df <- copy_to(sc, iris)
sc %>% spark_session() %>%
invoke("sql", "SELECT percentile_approx(Sepal_Length, 0.5) FROM iris") %>%
sdf_register("median")
# # Source: table<median> [?? x 1]
# # Database: spark_connection
# `_c0`
# <dbl>
# 1 5.73
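To get several quantiles at once, closer to what sdf_quantile() returns, percentile_approx also accepts an array of percentiles. The sketch below only builds the SQL string. The helper name percentile_query is hypothetical, and the table and column names are assumed to be trusted identifiers, since they are inserted without quoting.

```r
# Hypothetical helper: builds a Hive SQL query that requests several
# approximate percentiles of one column in a single call.
# Table and column names are pasted in unquoted, so pass trusted values only.
percentile_query <- function(table, column, probs = c(0.25, 0.5, 0.75)) {
  stopifnot(all(probs >= 0 & probs <= 1))
  sprintf(
    "SELECT percentile_approx(%s, array(%s)) AS quantiles FROM %s",
    column, paste(probs, collapse = ", "), table
  )
}

query <- percentile_query("mtcars", "hp")
print(query)

# With a live connection and Hive support (both assumed), it could be run as:
# sc %>% spark_session() %>% invoke("sql", query) %>% sdf_register("hp_quantiles")
```

The result would be a single row holding an array column, so it may need unpacking on the R side after collecting.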