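I recently upgraded DataStax Enterprise from 4.6.3 to 4.7, and now I can't run Spark. The problem seems to be that the Spark Master is not configured correctly. I am using OpsCenter 5.1.3 and launched a three-node Analytics cluster. Oddly, the nodes were initially set up with SPARK_ENABLED=0, and I had to set it to 1 manually. Even so, the Spark Master is still not configured correctly. In /var/log/cassandra/system.log I get a long run of output like this: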
INFO [SPARK-WORKER-INIT-0] 2015-06-13 21:59:54,027 SparkWorkerRunner.java:49 - Spark Master not ready at (no configured master)
INFO [SPARK-WORKER-INIT-0] 2015-06-13 21:59:55,028 SparkWorkerRunner.java:49 - Spark Master not ready at (no configured master)
INFO [SPARK-WORKER-INIT-0] 2015-06-13 21:59:56,028 SparkWorkerRunner.java:49 - Spark Master not ready at (no configured master)
When I try to run dse spark, I get the following error:
java.io.IOException: Spark Master address cannot be retrieved. This really should not be happening with DSE 4.7+ unless your cluster is over 50% down or booted up in the last minute.
at com.datastax.bdp.plugin.SparkPlugin.getMasterAddress(SparkPlugin.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.StandardMBe
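My Analytics DC has been up for several days, and no nodes have been newly started. This problem has blocked development for the last few days, and I am considering downgrading to DSE 4.6.3 so that I can run my Spark jobs again. Any help is appreciated.
Update: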
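I have been looking into the condition that 50% of the Analytics nodes need to be up for the Spark Master to start. After checking system.log at DSE startup, I noticed that Gossip still seems to think some old nodes are part of the cluster and reports them as DOWN. For example: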
INFO [GossipStage:1] 2015-06-14 03:18:05,587 Gossiper.java:968 - InetAddress /172.31.23.17 is now DOWN
INFO [GossipStage:1] 2015-06-14 03:18:05,614 Gossiper.java:968 - InetAddress /172.31.16.58 is now DOWN
INFO [GossipStage:1] 2015-06-14 03:18:05,647 Gossiper.java:968 - InetAddress /172.31.24.25 is now DOWN
INFO [GossipStage:1] 2015-06-14 03:18:05,687 Gossiper.java:968 - InetAddress /172.31.24.147 is now DOWN
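These are nodes that I took offline earlier. I have already cleared them from the system.peers table, but Gossip still seems to treat them as part of the cluster. Counting these phantom nodes would put the cluster at more than 50% down. However, purging the gossip state requires a full cluster shutdown.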
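To sanity-check the 50% suspicion, here is a rough Python sketch that pulls the distinct DOWN addresses out of the Gossiper lines above and works out what share of the gossip-known nodes they represent. It makes two assumptions of mine that DSE does not confirm: that the three live Analytics nodes are the only healthy members, and that the threshold is computed over every node Gossip knows about. The helper names are made up for illustration, and the parser ignores nodes that later come back UP.

```python
import re

# Matches Gossiper lines such as:
#   INFO [GossipStage:1] ... Gossiper.java:968 - InetAddress /172.31.23.17 is now DOWN
DOWN_RE = re.compile(r"InetAddress /(\S+) is now DOWN")


def down_addresses(log_lines):
    """Return the set of distinct addresses that gossip reported as DOWN."""
    found = set()
    for line in log_lines:
        match = DOWN_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def down_fraction(live_count, down_count):
    """Fraction of gossip-known nodes that are down (0.0 if none are known)."""
    total = live_count + down_count
    return down_count / total if total else 0.0


# The four lines quoted from system.log in this post.
sample = [
    "INFO [GossipStage:1] 2015-06-14 03:18:05,587 Gossiper.java:968 - InetAddress /172.31.23.17 is now DOWN",
    "INFO [GossipStage:1] 2015-06-14 03:18:05,614 Gossiper.java:968 - InetAddress /172.31.16.58 is now DOWN",
    "INFO [GossipStage:1] 2015-06-14 03:18:05,647 Gossiper.java:968 - InetAddress /172.31.24.25 is now DOWN",
    "INFO [GossipStage:1] 2015-06-14 03:18:05,687 Gossiper.java:968 - InetAddress /172.31.24.147 is now DOWN",
]

LIVE_NODES = 3  # assumption: the three-node Analytics cluster is fully up

stale = down_addresses(sample)
fraction = down_fraction(LIVE_NODES, len(stale))
print(f"{len(stale)} stale nodes, {fraction:.0%} of gossip-known nodes down")
```

With four stale addresses against three live nodes, 4 of 7 known nodes are down, which is above half. That would be consistent with the error message, if the check really does count the phantom nodes.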