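I noticed that when I kill the ZooKeeper leader, the Spark master becomes unresponsive (I have, of course, delegated leader election to ZooKeeper). Below is the error log I see on the Spark master node. Do you have any suggestions for fixing this?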
15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x14dd82e22f70ef1, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x24dc5a319b40090, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 WARN ConnectionStateManager: There are no ConnectionStateListeners registered.
15/06/22 10:44:01 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
15/06/22 10:44:01 ERROR Master: Leadership has been revoked -- master shutting down.
Answer 0 (score: 3)
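This is the expected behavior. You have to set up 'n' masters, and you need to specify the ZooKeeper URL in the env.sh (conf/spark-env.sh) of every master: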
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"
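Note that ZooKeeper maintains a quorum. A majority of the ZooKeeper servers must be running, so you should use an odd number of them, and the ZooKeeper cluster is only up while that quorum holds. Since Spark depends on ZooKeeper here, the Spark cluster will not come up until the ZooKeeper quorum is established.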
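To make the quorum arithmetic concrete, here is a minimal Python sketch, assuming the usual majority-quorum setup. The function names `quorum_size` and `tolerated_failures` are made up for illustration and are not part of ZooKeeper or Spark.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Columns: ensemble size, servers needed for quorum, failures tolerated.
    for n in range(1, 6):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under that assumption, a two-server ensemble such as the `zk1:2181,zk2:2181` example tolerates no failures at all, and four servers tolerate no more failures than three, which is why odd sizes are the usual recommendation.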
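When you set up two (n) masters and take down one ZooKeeper server, the current master shuts down, a new master is elected, and all worker nodes connect to the new master.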
You should start your workers by passing the full list of masters:
./start-slave.sh spark://master1:port1,master2:port2
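If you generate that master URL from a list of hosts, a small helper like the following could do it. This is only a sketch: `spark_master_url` is a hypothetical name, the hostnames are placeholders, and 7077 is used because it is the default standalone master port.

```python
def spark_master_url(masters):
    """Join (host, port) pairs into one spark:// URL that lists every master."""
    if not masters:
        raise ValueError("at least one master is required")
    return "spark://" + ",".join(f"{host}:{port}" for host, port in masters)


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own masters and ports.
    print(spark_master_url([("master1", 7077), ("master2", 7077)]))
```

The resulting string is the same comma-separated form shown in the start-slave.sh command above, so the worker knows about every master and can register with whichever one is currently the leader.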
You have to wait 1-2 minutes before you notice this failover!