Cannot launch Worker from Spark master: exit code 1, exitStatus 1

Date: 2017-08-19 13:54:40

Tags: scala apache-spark hive

I'm running into the problem mentioned in the title and I really don't know how to fix it. I've tried the solutions from many related answers, forums, and so on, but I can't make the error go away.

I have an EC2 Ubuntu 16 machine (~32 GB RAM, ~70 GB disk, 8 cores) running a standalone Spark master. My overall configuration is shown below.

spark-env.sh

. . .
SPARK_PUBLIC_DNS=xx.xxx.xxx.xxx
SPARK_MASTER_PORT=7077
. . .

/etc/hosts

127.0.0.1 locahost localhost.domain ubuntu
::1 locahost localhost.domain ubuntu
localhost  master # master and slave have same ip
localhost  slave  # master and slave have same ip

I try to connect to it from IntelliJ IDEA using the following Scala code:

new SparkConf()
    .setAppName("my-app")
    .setMaster("spark://xx.xxx.xxx.xxx:7077")
    .set("spark.executor.host", "xx.xxx.xxx.xxx")
    .set("spark.executor.cores", "8")
    .set("spark.executor.memory","20g")

This configuration produces the following logs. master.log contains many lines like:

. . .
xx/xx/xx xx:xx:xx INFO Master: Removing executor app-xxxxxxxxxxxxxx-xxxx/xx because it is EXITED
xx/xx/xx xx:xx:xx INFO Master: Launching executor app-xxxxxxxxxxxxxx-xxxx/xx on worker worker-xxxxxxxxxxxxxx-127.0.0.1-42524

worker.log contains many lines like:

. . .
xx/xx/xx xx:xx:xx INFO Worker: Executor app-xxxxxxxxxxxxxx-xxxx/xxx finished with state EXITED message Command exited with code 1 exitStatus 1
xx/xx/xx xx:xx:xx INFO Worker: Asked to launch executor app-xxxxxxxxxxxxxx-xxxx/xxx for my-app
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing view acls to: ubuntu
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing modify acls to: ubuntu
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing view acls groups to: 
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing modify acls groups to: 
xx/xx/xx xx:xx:xx INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
xx/xx/xx xx:xx:xx INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-openjdk-amd64/jre//bin/java" "-cp" "/usr/local/share/spark/spark-2.1.1-bin-hadoop2.7/conf/:/usr/local/share/spark/spark-2.1.1-bin-hadoop2.7/jars/*" "-Xmx4096M" "-Dspark.driver.port=34889" "-Dspark.cassandra.connection.port=9042" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@127.0.0.1:34889" "--executor-id" "476" "--hostname" "127.0.0.1" "--cores" "1" "--app-id" "app-xxxxxxxxxxxxxx-xxxx" "--worker-url" "spark://Worker@127.0.0.1:42524"

If you want, here's a Gist with the log lines shown above.

If I try the following basic configuration instead, I get no errors, but my application just hangs and the server does essentially nothing: no CPU or RAM usage.

new SparkConf()
    .setAppName("my-app")
    .setMaster("spark://xx.xxx.xxx.xxx:7077")

In /etc/hosts I set master and slave to the same IP. The Scala version is 2.11.6 both on the server and in build.sbt. The Spark version is 2.1.1 both on the server and in build.sbt.
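
For reference, a build.sbt that matches those versions might look roughly like this (the project name is a placeholder, and which Spark modules you actually need depends on the application):

name := "my-app"

scalaVersion := "2.11.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.1",
  "org.apache.spark" %% "spark-sql"  % "2.1.1"
)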

I also took some Spark UI screenshots (not reproduced here).

So, what I want is to:

  • launch a job from my computer
  • have the server process that job
  • get the results back on my computer

My guess is that this could be a bad resource configuration. If not, what might be causing it? How should I tune the configuration to avoid this kind of problem?

If you need more details, just ask.

1 Answer:

Answer 0 (score: 0)

Since I want my personal computer to handle the orchestration, I changed the configuration so that it acts as the master and the server acts as the executor.

So my conf/spark-env.sh becomes:

# Options read by executors and drivers running inside the cluster
SPARK_LOCAL_IP=localhost # to set the IP address Spark binds to on this node
SPARK_PUBLIC_DNS=xx.xxx.xxx.xxx #PUBLIC SERVER IP

conf/slaves

# A Spark Worker will be started on each of the machines listed below.
xx.xxx.xxx.xxx #PUBLIC SERVER IP

/etc/hosts

xx.xxx.xxx.xxx master #PUBLIC SERVER IP
xx.xxx.xxx.xxx slave  #PUBLIC SERVER IP

Finally, the Scala configuration becomes:

.setMaster("local[*]")
.set("spark.executor.host", "xx.xxx.xxx.xxx") //Public Server IP
.set("spark.executor.memory","16g")