I've created a Dataproc cluster in GCP, and I want to be able to read some data from GCS (stored in Hive). I can do this with sparkR
(the R equivalent of spark-shell).
However, I'd like to be able to do this by running spark-submit test.R.
How do I load the SparkR
library into my R session?
Here is my script:
.libPaths( c( .libPaths(), "/usr/lib/spark") )
print("Library paths ... ")
.libPaths()
print("session info...")
sessionInfo()
print("loading Spark R")
library(sparkR)
#sparkR.session()
print("done")
Here is the output log:
[1] "Library paths ... "
[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"
[1] "session info..."
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
[1] "loading Spark R"
Error in library(sparkR) : there is no package called ‘sparkR’
Execution halted
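
My guess from the error is that /usr/lib/spark is not itself an R library directory, and that the package name is case-sensitive. A minimal sketch of the load step I expect to work (assuming the Dataproc image keeps SparkR under /usr/lib/spark/R/lib, which I have not verified):

# Point .libPaths at Spark's bundled R library, then load SparkR.
.libPaths(c(.libPaths(), "/usr/lib/spark/R/lib"))
library(SparkR)    # note the capitalization: SparkR, not sparkR
sparkR.session()   # connect to the cluster's Spark; Hive support is on by default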
Once it runs successfully, I can go on to load data with:
df <- sql("select fields from DB.table limit 10")
createOrReplaceTempView(df, "df")
In summary, I want to get SparkR
functionality into an R session
that I launch via spark-submit.
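
For reference, the kind of self-contained test.R I'm aiming for (a sketch only; the library path is my assumption about the Dataproc layout, and the query is the placeholder from above):

.libPaths(c(.libPaths(), "/usr/lib/spark/R/lib"))
library(SparkR)

# Start a SparkSession; enableHiveSupport defaults to TRUE,
# which is what lets sql() see tables in the Hive metastore.
sparkR.session(appName = "test")

df <- sql("select fields from DB.table limit 10")
createOrReplaceTempView(df, "df")
head(df)

sparkR.session.stop()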