Spark-1.5.0 - Loading com.databricks:spark-csv_2.11:1.2.0 in RStudio

Posted: 2015-10-01 14:25:53

Tags: r apache-spark rstudio

After installing Spark-1.5.0 on my Mac, I tried to initialize a Spark context in RStudio with the com.databricks:spark-csv_2.11:1.2.0 package, as follows:

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.2.0" "sparkr-shell"')
library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")
sc <- sparkR.init(sparkHome = "spark-1.5.0-bin-hadoop2.6/")

But I get the following error message:

[unresolved dependency: com.springml#spark-salesforce_2.10;1.0.1: not found]

Why is this happening?

P.S. When I use com.databricks:spark-csv_2.10:1.0.3, initialization works fine.
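
A likely explanation, not stated explicitly in the thread: the prebuilt Spark 1.5.0 binaries are compiled against Scala 2.10, so package coordinates with the _2.10 suffix match the running Scala version, while _2.11 artifacts may fail to resolve in this setup. A minimal sketch of the matching submit arguments, using the same flag format as above:

# Sketch: keep the artifact's Scala suffix (_2.10) in line with the Spark build
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')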


Update

I tried version com.databricks:spark-csv_2.10:1.2.0 and it works fine.

Now I'm using this code in RStudio to load a CSV file:

sqlContext <- sparkRSQL.init(sc)
flights <- read.df(sqlContext, "R/nycflights13.csv", "com.databricks.spark.csv", header="true")

I get the following error message:

Error in writeJobj(con, object) : invalid jobj 1

When I evaluate sqlContext, I get this error:

Error in callJMethod(x, "getClass") : 
  Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
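
The "Invalid jobj" message means the R-side handle still refers to a JVM object from an earlier Spark session; as the error text itself suggests, after a restart both contexts have to be recreated before re-running read.df. A minimal re-initialization sketch, assuming the same install paths as above:

library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")
sc <- sparkR.init(master = "local", sparkHome = "spark-1.5.0-bin-hadoop2.6")  # fresh Spark context
sqlContext <- sparkRSQL.init(sc)                                              # fresh SQL context bound to it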

Session info:

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SparkR_1.5.0 rJava_0.9-7 

loaded via a namespace (and not attached):
[1] tools_3.2.0

Note that I do not get this error when I run the same commands in the Spark shell.

1 Answer:

Answer 0: (score: 1)

Problem solved.

After restarting the R session and using the following code, everything now works:

# Point spark-submit at the Scala-2.10 build of spark-csv
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')
library(rJava)
library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")

# Start a local Spark context and a SQL context on top of it
sc <- sparkR.init(master = "local", sparkHome = "spark-1.5.0-bin-hadoop2.6")
sqlContext <- sparkRSQL.init(sc)

# Load the CSV via the spark-csv data source, reading the first row as a header
flights <- read.df(sqlContext, "R/nycflights13.csv", "com.databricks.spark.csv", header="true")
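
As a quick sanity check that the CSV loaded correctly, the standard SparkR 1.5 DataFrame helpers can be applied to the result:

printSchema(flights)  # column names and inferred types
head(flights)         # first few rows, collected into a local data.frame
count(flights)        # total number of rows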