Spark-shell connecting to Mesos hangs at sched.cpp

Time: 2015-11-16 01:19:25

Tags: apache-spark mesos

Below are my spark-defaults.conf and the output of spark-shell:
$ cat conf/spark-defaults.conf
spark.master                     mesos://172.16.**.***:5050
spark.eventLog.enabled           false
spark.broadcast.compress         false
spark.driver.memory              4g
spark.executor.memory            4g
spark.executor.instances         1

$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/15 04:56:11 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
I1115 04:56:12.171797 72994816 sched.cpp:164] Version: 0.25.0
I1115 04:56:12.173741 67641344 sched.cpp:262] New master detected at master@172.16.**.***:5050
I1115 04:56:12.173951 67641344 sched.cpp:272] No credentials provided. Attempting to register without authentication

It hangs here indefinitely, while the Mesos Web UI shows many Spark frameworks spinning: they keep registering and unregistering until I quit spark-shell with Ctrl-C.

(Screenshot: Mesos Web UI)

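I suspect this is partly because my laptop has multiple IP addresses. When run on a server, it continues to the next lines and then to the usual Scala REPL: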

I1116 09:53:30.265967 29327 sched.cpp:641] Framework registered with 9d725348-931a-48fb-96f7-d29a4b09f3e8-0242
15/11/16 09:53:30 INFO mesos.MesosSchedulerBackend: Registered as framework ID 9d725348-931a-48fb-96f7-d29a4b09f3e8-0242
15/11/16 09:53:30 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 57810.
15/11/16 09:53:30 INFO netty.NettyBlockTransferService: Server created on 57810
15/11/16 09:53:30 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/11/16 09:53:30 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.16.**.***:57810 with 2.1 GB RAM, BlockManagerId(driver, 172.16.**.***, 57810)
15/11/16 09:53:30 INFO storage.BlockManagerMaster: Registered BlockManager
15/11/16 09:53:30 INFO repl.Main: Created spark context..
Spark context available as sc.

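I am running Mesos 0.25.0 built by Mesosphere, and I have set spark.driver.host to an address that is reachable from every machine in the Mesos cluster. Every port opened by the spark-shell process is bound either to that IP address or to *.

The most similar question on StackOverflow does not seem to help, because in my case the laptop should be reachable from the hosts.

I cannot find the log files that might contain the reason why the frameworks get unregistered. Where should I look to resolve this issue?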

1 answer:

Answer 0 (score: 5)

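Mesos has a rather odd notion of how networking works. In particular, it is important that the Master and the Framework can establish bidirectional communication, so both sides need a network route to each other. If you run behind NAT or in containers, you have run into this before: usually you need to set the LIBPROCESS_IP environment variable to a publicly reachable IP on the framework side. This may also apply to multi-homed systems such as your laptop.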
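A minimal sketch of the framework-side setting, assuming the variable in question is libprocess's LIBPROCESS_IP environment variable and that spark-shell is started from the same terminal session. The address 192.0.2.10 is a placeholder, not a value from the question; substitute the laptop address that the Mesos master can actually route to. Setting SPARK_LOCAL_IP alongside it is an extra assumption, made only to keep Spark's own bind address consistent.

```shell
# Placeholder address (hypothetical): use the laptop IP reachable from the Mesos master.
DRIVER_IP="192.0.2.10"

# libprocess, the networking layer under the Mesos scheduler driver, reads
# LIBPROCESS_IP to decide which address to bind to and advertise to the master.
export LIBPROCESS_IP="$DRIVER_IP"

# Assumption: also point Spark's own bind address at the same interface.
export SPARK_LOCAL_IP="$DRIVER_IP"

echo "LIBPROCESS_IP=$LIBPROCESS_IP"

# Then start the shell from the Spark directory, for example:
#   bin/spark-shell --conf spark.driver.host="$DRIVER_IP"
```

If the framework then registers and the sched.cpp "Framework registered with ..." line appears, the multi-homed address selection was the likely cause; if it still hangs, the master may not be able to reach that address at all.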

You can find a bit more information around the internet, but unfortunately this is not well documented. There is, however, a hint on their Deployment Scripts page.