What is the HTTP server that starts when I launch a Spark jar on a machine?

Asked: 2014-06-10 09:16:19

Tags: scala apache-spark

I want to use machine A to submit my Spark job to the cluster. Machine A has no Spark environment installed, only Java. When I launch the jar, an HTTP server is started:

[steven@bj-230 ~]$ java -jar helloCluster.jar SimplyApp
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/06/10 16:54:54 INFO SparkEnv: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/06/10 16:54:54 INFO SparkEnv: Registering BlockManagerMaster
14/06/10 16:54:54 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140610165454-4393
14/06/10 16:54:54 INFO MemoryStore: MemoryStore started with capacity 1055.1 MB.
14/06/10 16:54:54 INFO ConnectionManager: Bound socket to port 59981 with id = ConnectionManagerId(bj-230,59981)
14/06/10 16:54:54 INFO BlockManagerMaster: Trying to register BlockManager
14/06/10 16:54:54 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager bj-230:59981 with 1055.1 MB RAM
14/06/10 16:54:54 INFO BlockManagerMaster: Registered BlockManager
14/06/10 16:54:54 INFO HttpServer: Starting HTTP Server
14/06/10 16:54:54 INFO HttpBroadcast: Broadcast server started at http://10.10.10.230:59233
14/06/10 16:54:54 INFO SparkEnv: Registering MapOutputTracker
14/06/10 16:54:54 INFO HttpFileServer: HTTP File server directory is /tmp/spark-bfdd02f1-3c02-4233-854f-af89542b9acf
14/06/10 16:54:54 INFO HttpServer: Starting HTTP Server
14/06/10 16:54:54 INFO SparkUI: Started Spark Web UI at http://bj-230:4040
14/06/10 16:54:54 INFO SparkContext: Added JAR hdfs://master:8020/tmp/helloCluster.jar at hdfs://master:8020/tmp/helloCluster.jar with timestamp 1402390494838
14/06/10 16:54:54 INFO AppClient$ClientActor: Connecting to master spark://master:7077...

So what is this server for? If I am behind a NAT, can I still use machine A to submit my job to the remote cluster?
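For context, here is a minimal sketch of what a driver application like SimplyApp might look like; the master URL and jar path are taken from the log above, while the application body itself is an assumption for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical sketch of the driver program launched on machine A.
    object SimplyApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("SimplyApp")
          .setMaster("spark://master:7077")                         // standalone master, as in the log
          .setJars(Seq("hdfs://master:8020/tmp/helloCluster.jar"))  // jar distributed to the executors
        val sc = new SparkContext(conf)

        // Trivial job just to exercise the cluster
        val count = sc.parallelize(1 to 1000).count()
        println(s"count = $count")

        sc.stop()
      }
    }

Because the SparkContext is created on machine A, the driver (and the HTTP servers seen in the log) run on machine A, not on the cluster.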

By the way, the job failed. Error log:

14/06/10 16:55:05 INFO SparkDeploySchedulerBackend: Executor app-20140610165321-0005/7 removed: Command exited with code 1
14/06/10 16:55:05 ERROR AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/06/10 16:55:05 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/06/10 16:55:11 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

1 Answer:

Answer 0 (score: 3)

The Spark driver starts several HTTP endpoints:

  • It serves a web console that shows job progress. The default port for this endpoint is 4040 and can be changed with the configuration option spark.ui.port. You connect to it with a browser at http://your_host:4040 to follow the job (see the sketch after this list). It is only alive while the driver is running.

  • There is an additional HTTP endpoint that serves the jar files declared as dependencies. The workers contact the driver to download that list of dependencies. It uses a randomly assigned port, which means the driver must be on a network that is routable from the Spark workers.
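A minimal sketch of moving the web console off the default port on the driver side; spark.ui.port is a standard Spark option, and the port value 8123 is just an example:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: pin the web UI to a different port; everything else uses defaults.
    val conf = new SparkConf()
      .setAppName("SimplyApp")
      .setMaster("spark://master:7077")
      .set("spark.ui.port", "8123")   // web console will be at http://<driver-host>:8123
    val sc = new SparkContext(conf)

After the driver starts, browse to http://<driver-host>:8123 to watch stages and storage; the page disappears once the driver exits. The dependency file server, by contrast, picks a random port, so behind a NAT the workers may be unable to reach the driver, which is consistent with the "Initial job has not accepted any resources" failure in the question.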