SparkR error in RStudio

Date: 2016-07-20 05:29:39

Tags: macos rstudio sparkr

SparkR running in RStudio fails with an error when reading data.

What can I do to fix this?

Environment

R: version 3.3.1
RStudio: Version 0.99.902
SparkR: Version 1.6.1
macOS: Version 10.11.6

SPARK_HOME <- "/usr/local/Cellar/apache-spark/1.6.1/libexec"
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.4.0" "sparkr-shell"')
.libPaths(c(file.path(SPARK_HOME, "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init(master="local[3]", sparkHome=SPARK_HOME,
                  sparkEnvir=list(spark.driver.memory="6g"),
                  sparkPackages="com.databricks:spark-csv_2.10:1.4.0")

sqlContext <- sparkRSQL.init(sc)

WARN

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

df <- read.df(sqlContext, "iris.csv", source="com.databricks.spark.csv", inferSchema="true")

WARN

WARN : Your hostname, xxxx-no-MacBook-Pro.local resolves to a loopback/non-reachable address: fe80:0:0:0:701f:d8ff:fe34:fd1%8, but we couldn't find any external IP address!

ERROR

ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)

WARN

16/07/20 14:00:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)

ERROR

16/07/20 14:00:44 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/07/20 14:00:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/07/20 14:00:44 INFO TaskSchedulerImpl: Cancelling stage 0
16/07/20 14:00:44 INFO DAGScheduler: ResultStage 0 (first at CsvRelation.scala:267) failed in 60.099 s
16/07/20 14:00:44 INFO DAGScheduler: Job 0 failed: first at CsvRelation.scala:267, took 60.168711 s
16/07/20 14:00:44 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)

I don't understand how to resolve this. Please tell me how to fix it.

1 Answer:

Answer 0: (score: 0)

Try this:

Sys.setenv(SPARK_HOME="/usr/local/Cellar/apache-spark/1.6.1/libexec")
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.4.0" "sparkr-shell"')
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R","lib")))
sc <- sparkR.init(master="local", sparkEnvir = list(spark.driver.memory="4g", spark.executor.memory="6g"))

sqlContext <- sparkRSQL.init(sc)

It worked for me.
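A supplementary note (not part of the original answer): the warning in the question about the hostname resolving to a loopback/non-reachable address suggests Spark could not determine a reachable bind address, and the subsequent `connect timed out` can stem from the same issue. A commonly suggested workaround is to pin Spark to the loopback interface before calling `sparkR.init()`; whether it resolves this particular environment is an assumption, not something verified here:

```r
# Assumed workaround, not verified in this environment:
# force Spark to bind to the loopback address so the driver and
# executors can reach each other on a machine with no external IP.
# This must run before sparkR.init().
Sys.setenv(SPARK_LOCAL_IP = "127.0.0.1")
```

Setting `SPARK_LOCAL_IP` only affects which interface Spark binds to; it does not change memory settings or package resolution, so it can be combined with the answer above.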