Unable to launch Worker from Spark master: exit code 1, exitStatus 1

Date: 2017-08-19 13:54:40

Tags: scala apache-spark hive

I'm running into the problem described in the title and I really don't know how to fix it. I've tried the solutions offered in many related answers, forums, and so on, but I can't make it go away.

I have an EC2 Ubuntu 16 machine (~32 GB RAM, ~70 GB disk, 8 cores) running a standalone Spark master. My overall configuration is shown below.

spark-env.sh

. . .
SPARK_PUBLIC_DNS=xx.xxx.xxx.xxx
SPARK_MASTER_PORT=7077
. . .

/etc/hosts

127.0.0.1 locahost localhost.domain ubuntu
::1 locahost localhost.domain ubuntu
localhost  master # master and slave have same ip
localhost  slave  # master and slave have same ip

I try to connect to it from IntelliJ IDEA with the following Scala code:

new SparkConf()
    .setAppName("my-app")
    .setMaster("spark://xx.xxx.xxx.xxx:7077")
    .set("spark.executor.host", "xx.xxx.xxx.xxx")
    .set("spark.executor.cores", "8")
    .set("spark.executor.memory","20g")

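For context, here is a minimal sketch of how a conf like this is typically turned into a SparkContext and exercised; the count job below is only an illustrative smoke test, not my actual application code:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-app")
  .setMaster("spark://xx.xxx.xxx.xxx:7077")
  .set("spark.executor.cores", "8")
  .set("spark.executor.memory", "20g")

val sc = new SparkContext(conf)

// Hypothetical smoke test: distribute a small range and count it,
// which forces the master to launch an executor on the worker.
println(sc.parallelize(1 to 1000).count())

sc.stop()
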
With this configuration I get the following logs. master.log contains many lines like:

. . .
xx/xx/xx xx:xx:xx INFO Master: Removing executor app-xxxxxxxxxxxxxx-xxxx/xx because it is EXITED
xx/xx/xx xx:xx:xx INFO Master: Launching executor app-xxxxxxxxxxxxxx-xxxx/xx on worker worker-xxxxxxxxxxxxxx-127.0.0.1-42524

worker.log contains many lines like:

. . .
xx/xx/xx xx:xx:xx INFO Worker: Executor app-xxxxxxxxxxxxxx-xxxx/xxx finished with state EXITED message Command exited with code 1 exitStatus 1
xx/xx/xx xx:xx:xx INFO Worker: Asked to launch executor app-xxxxxxxxxxxxxx-xxxx/xxx for my-app
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing view acls to: ubuntu
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing modify acls to: ubuntu
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing view acls groups to: 
xx/xx/xx xx:xx:xx INFO SecurityManager: Changing modify acls groups to: 
xx/xx/xx xx:xx:xx INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
xx/xx/xx xx:xx:xx INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-openjdk-amd64/jre//bin/java" "-cp" "/usr/local/share/spark/spark-2.1.1-bin-hadoop2.7/conf/:/usr/local/share/spark/spark-2.1.1-bin-hadoop2.7/jars/*" "-Xmx4096M" "-Dspark.driver.port=34889" "-Dspark.cassandra.connection.port=9042" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@127.0.0.1:34889" "--executor-id" "476" "--hostname" "127.0.0.1" "--cores" "1" "--app-id" "app-xxxxxxxxxxxxxx-xxxx" "--worker-url" "spark://Worker@127.0.0.1:42524"

If you want, here's a Gist containing the log lines above.

If I try the following basic configuration instead, I get no errors, but my application just hangs and the server does essentially nothing: no CPU or RAM usage at all.

new SparkConf()
              .setAppName("my-app")
              .setMaster("spark://xx.xxx.xxx.xxx:7077")
  

In /etc/hosts I set the master and the slave to the same IP. The Scala version is 2.11.6 both on the server and in build.sbt, and the Spark version is 2.1.1 both on the server and in build.sbt.
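
For reference, the relevant build.sbt lines look roughly like this; the dependency list itself is just a sketch, only the Scala and Spark versions above are the real ones:

scalaVersion := "2.11.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.1",
  "org.apache.spark" %% "spark-sql"  % "2.1.1"
)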

Here are some Spark UI screenshots:

[Spark UI screenshots not reproduced here]

So, I want to:

  • launch the task from my computer
  • have the task processed on the server
  • get the results back on my computer

I'm guessing this might be a bad resource configuration? If not, what could be causing it? How should I adjust the configuration to avoid this kind of problem?

Please ask if you need more details.

1 Answer:

Answer 0 (score: 0)

Since I want my personal computer to do the orchestration, I changed the configuration so that my machine is the master and the server acts as the executor.

So my conf/spark-env.sh would be:

# Options read by executors and drivers running inside the cluster
SPARK_LOCAL_IP=localhost # Set the IP address Spark binds to on this node
SPARK_PUBLIC_DNS=xx.xxx.xxx.xxx #PUBLIC SERVER IP

conf/slaves

# A Spark Worker will be started on each of the machines listed below.
xx.xxx.xxx.xxx #PUBLIC SERVER IP

/etc/hosts

xx.xxx.xxx.xxx master #PUBLIC SERVER IP
xx.xxx.xxx.xxx slave  #PUBLIC SERVER IP

Finally, the Scala configuration would be:

.setMaster("local[*]")
.set("spark.executor.host", "xx.xxx.xxx.xxx") //Public Server IP
.set("spark.executor.memory","16g")