I am running jobs in cluster mode on a standalone test Spark cluster, but I find that I cannot monitor the status of the driver.
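Here is a minimal example using spark-2.4.3 (the master and a single worker run on the same node, started by running sbin/start-all.sh on a freshly unarchived installation with the default conf and no conf/slaves set), with spark-submit executed from the node itself: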
> spark-submit --master spark://ip-172-31-15-245:7077 --deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
/home/ubuntu/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/06/27 09:08:28 INFO SecurityManager: Changing view acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing view acls groups to:
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls groups to:
19/06/27 09:08:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
19/06/27 09:08:28 INFO Utils: Successfully started service 'driverClient' on port 36067.
19/06/27 09:08:28 INFO TransportClientFactory: Successfully created connection to ip-172-31-15-245/172.31.15.245:7077 after 29 ms (0 ms spent in bootstraps)
19/06/27 09:08:28 INFO ClientEndpoint: Driver successfully submitted as driver-20190627090828-0008
19/06/27 09:08:28 INFO ClientEndpoint: ... waiting before polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: ... polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: State of driver-20190627090828-0008 is RUNNING
19/06/27 09:08:33 INFO ClientEndpoint: Driver running on 172.31.15.245:41057 (worker-20190627083412-172.31.15.245-41057)
19/06/27 09:08:33 INFO ShutdownHookManager: Shutdown hook called
19/06/27 09:08:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-34082661-f0de-4c56-92b7-648ea24fa59c
> spark-submit --master spark://ip-172-31-15-245:7077 --status driver-20190627090828-0008
19/06/27 09:09:27 WARN RestSubmissionClient: Unable to connect to server spark://ip-172-31-15-245:7077.
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:165)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:148)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.deploy.rest.RestSubmissionClient.requestSubmissionStatus(RestSubmissionClient.scala:148)
at org.apache.spark.deploy.SparkSubmit.requestStatus(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:88)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.deploy.rest.SubmitRestConnectionException: No response from server
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:285)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$get(RestSubmissionClient.scala:195)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:152)
... 11 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:278)
... 13 more
Spark itself is healthy (I can run other jobs after the one above), and driver-20190627090828-0008 shows up as "finished" in the web UI. Is there something I am missing?
Is the port used for querying the status supposed to be the same one used for submitting the job?
Update: in the master log, all I get is:
19/07/01 09:40:24 INFO master.Master: 172.31.15.245:42308 got disassociated, removing it.
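Regarding the port question: the stack trace shows that --status goes through RestSubmissionClient, so one thing worth trying is pointing it at the master's REST submission server instead of the RPC port 7077. The following is only a sketch under assumptions I have not verified on this cluster: that the REST server has been switched on with spark.master.rest.enabled=true (as far as I understand it is off by default in 2.4.x) and that it listens on its default port 6066. The status_url helper is a hypothetical name introduced here for illustration.

```shell
# Assumption: conf/spark-defaults.conf contains
#   spark.master.rest.enabled  true
# and the master was restarted, so the REST submission server
# is listening on its default port 6066 (spark.master.rest.port).

# Hypothetical helper: build the REST status URL for a driver ID.
# Arguments: master host, driver ID, optional REST port (default 6066).
status_url() {
  printf 'http://%s:%s/v1/submissions/status/%s\n' "$1" "${3:-6066}" "$2"
}

# Usage (needs a reachable master, so left commented out):
# spark-submit --master spark://ip-172-31-15-245:6066 --status driver-20190627090828-0008
# curl -s "$(status_url ip-172-31-15-245 driver-20190627090828-0008)"
```

If the REST server is not enabled, nothing listens on 6066, so checking that setting would be the first step before retrying the status query.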