Flink HA standalone cluster fails

Time: 2018-12-20 13:39:43

Tags: akka cluster-computing apache-flink high-availability

 

Two machines, 203 and 204.
Each machine runs both a jobmanager and a taskmanager.

masters
hz203:9081
hz204:9081
slaves
hz203
hz204
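For reference, the `masters` file lists one `host:webui-port` entry per line. In HA mode `start-cluster.sh` starts one jobmanager per entry and hands it that host and port, which is presumably where the `--host` option mentioned in the answer comes from. The Python sketch below only mirrors the parsing step. `parse_masters` and `start_commands` are hypothetical helpers, and the rendered command lines are an approximation rather than the exact invocation Flink's scripts use.

```python
# Sketch: parse a Flink "masters" file ("host:webui-port" per line) and render
# one jobmanager start command per master, roughly what start-cluster.sh does
# in HA mode. Helper names and the exact command format are illustrative.
from typing import List, Optional, Tuple


def parse_masters(text: str) -> List[Tuple[str, Optional[int]]]:
    """Return (host, webui_port) pairs; port is None when omitted."""
    masters: List[Tuple[str, Optional[int]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        host, sep, port = line.partition(":")
        masters.append((host, int(port) if sep and port else None))
    return masters


def start_commands(masters: List[Tuple[str, Optional[int]]]) -> List[str]:
    """Render an approximate 'jobmanager.sh start <host> [<port>]' per master."""
    commands = []
    for host, port in masters:
        parts = ["jobmanager.sh", "start", host]
        if port is not None:
            parts.append(str(port))
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    for cmd in start_commands(parse_masters("hz203:9081\nhz204:9081\n")):
        print(cmd)
```

With the masters file above, this prints one line for hz203 and one for hz204, each with port 9081.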
flink-conf.yaml
jobmanager.rpc.port: 6123
rest.port: 9081
blob.server.port: 6124
query.server.port: 6125
web.tmpdir: /home/ctu/flink/deploy/webTmp
web.log.path: /home/ctu/flink/deploy/log
taskmanager.tmp.dirs: /home/ctu/flink/deploy/taskManagerTmp
high-availability: zookeeper
high-availability.storageDir: file:///home/ctu/flink/deploy/HA
high-availability.zookeeper.quorum: 10.0.1.79:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /flink
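You can sanity-check a configuration like this with the Python sketch below. It reads the flat `key: value` lines and lists any ZooKeeper HA keys that are missing. It assumes the file has no nesting, which holds for the snippet above. It treats `high-availability`, `high-availability.storageDir` and `high-availability.zookeeper.quorum` as the required keys, following the Flink HA documentation. `parse_flat_conf` and `missing_ha_keys` are hypothetical helpers, not Flink's own config loader.

```python
# Sketch: parse flat "key: value" lines from a flink-conf.yaml snippet and
# report which ZooKeeper HA keys are missing. Not Flink's own config loader.
from typing import Dict, List

REQUIRED_ZK_HA_KEYS = [
    "high-availability",
    "high-availability.storageDir",
    "high-availability.zookeeper.quorum",
]


def parse_flat_conf(text: str) -> Dict[str, str]:
    """Split each non-comment line at the first ':' (values may contain ':')."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def missing_ha_keys(conf: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty when ZooKeeper HA is on."""
    if conf.get("high-availability", "").lower() != "zookeeper":
        return []  # ZooKeeper HA not enabled, so these keys are not required
    return [key for key in REQUIRED_ZK_HA_KEYS if not conf.get(key)]


if __name__ == "__main__":
    sample = """
high-availability: zookeeper
high-availability.storageDir: file:///home/ctu/flink/deploy/HA
high-availability.zookeeper.quorum: 10.0.1.79:2181
"""
    print(missing_ha_keys(parse_flat_conf(sample)))
```

Splitting only at the first `:` keeps values such as `file:///...` and `10.0.1.79:2181` intact.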
Running ./start-cluster.sh:
Starting HA cluster with 2 masters.
Starting standalonesession daemon on host hz203.
Starting standalonesession daemon on host hz204.
Starting taskexecutor daemon on host hz203.
Starting taskexecutor daemon on host hz204.
Log:
2018-12-20 20:44:03,843 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
2018-12-20 20:44:03,864 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://127.0.0.1:9081.
2018-12-20 20:44:03,875 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2018-12-20 20:44:03,989 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2018-12-20 20:44:03,999 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
2018-12-20 20:44:04,008 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2018-12-20 20:44:04,009 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
2018-12-20 20:44:04,010 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2018-12-20 20:44:04,206 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,221 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,301 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,301 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,378 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,378 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,451 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,451 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,520 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
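Every failure above targets the same loopback address. The following Python sketch tallies the failing targets in a longer log and flags the loopback ones. The regex and helper names are my own, written only against the log format shown here. A loopback target suggests that 127.0.0.1 was published as the remote address instead of the host's routable IP.

```python
# Sketch: count akka association failures per target address in a Flink log
# and flag loopback targets. The regex matches the WARN lines shown above.
import ipaddress
import re
from collections import Counter
from typing import List, Tuple

ASSOC_FAILED = re.compile(
    r"Association with remote system \[akka\.tcp://[^@\]]+@([^:\]]+):(\d+)\] has failed"
)


def failed_targets(log_text: str) -> Counter:
    """Map (host, port) -> number of 'Association ... has failed' lines."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = ASSOC_FAILED.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def loopback_targets(counts: Counter) -> List[Tuple[str, int]]:
    """Targets whose host is a literal loopback IP (hostnames are skipped)."""
    result = []
    for host, port in counts:
        try:
            if ipaddress.ip_address(host).is_loopback:
                result.append((host, port))
        except ValueError:
            pass  # not an IP literal, e.g. a hostname such as hz203
    return result
```

Hostnames are skipped rather than resolved, so the check stays offline and deterministic.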
Question:
`akka.tcp://flink@127.0.0.1:33567/user/resourcemanager`: why does this use 127.0.0.1 instead of the `jobmanager` IP given in the `masters` config file?

1 answer:

Answer 0 (score: 0)

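The problem is a bug that we fixed in version 1.6.1. In 1.6.0, `ClusterEntrypoint#loadConfiguration` did not respect the `--host` command-line option, as you can see by comparing the code here with the code of version 1.6.1.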
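The following toy Python model (not Flink's code) makes this failure mode concrete. It models an entrypoint that merges `flink-conf.yaml` with command-line options. With `honor_host=False` it behaves like 1.6.0 as described in the answer: the `--host` value is dropped, so the address comes from the file. The usage example sets `jobmanager.rpc.address: localhost`, which is an assumption because the question's config snippet does not show that key. A `localhost` address would resolve to 127.0.0.1, matching the log.

```python
# Toy model (not Flink code): build the effective configuration from the
# file plus command-line options. honor_host=False mimics the 1.6.0 behaviour
# described in the answer, where --host was ignored.
import argparse
from typing import Dict, List

ADDRESS_KEY = "jobmanager.rpc.address"


def load_configuration(
    file_conf: Dict[str, str], argv: List[str], honor_host: bool
) -> Dict[str, str]:
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--host")
    args, _unknown = parser.parse_known_args(argv)  # ignore other options
    conf = dict(file_conf)  # start from the file's values, never mutate input
    if honor_host and args.host is not None:
        conf[ADDRESS_KEY] = args.host  # command line wins over the file
    return conf


if __name__ == "__main__":
    file_conf = {ADDRESS_KEY: "localhost"}  # assumed file value
    argv = ["--host", "hz204", "--webui-port", "9081"]
    print(load_configuration(file_conf, argv, honor_host=False)[ADDRESS_KEY])
    print(load_configuration(file_conf, argv, honor_host=True)[ADDRESS_KEY])
```

The first call prints `localhost` and the second prints `hz204`.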

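So upgrading to the latest 1.6.x release should solve the problem. In general, if possible, I always recommend upgrading to the latest bugfix release of a release line.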
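A simple version comparison is enough for a script to check whether a deployment already includes the fix. The Python sketch below assumes plain `major.minor.patch` version strings. It also assumes, as the answer implies, that every release from 1.6.1 onward contains the fix. `has_host_fix` is a hypothetical helper.

```python
# Sketch: check whether a Flink version string is at least the release that
# (per the answer) fixed --host handling. Assumes "major.minor.patch" strings,
# optionally with a suffix such as "-SNAPSHOT", which is ignored.
from typing import Tuple

FIXED_IN = "1.6.1"


def parse_version(version: str) -> Tuple[int, ...]:
    core = version.split("-", 1)[0]  # drop suffixes like "-SNAPSHOT"
    return tuple(int(part) for part in core.split("."))


def has_host_fix(version: str, fixed_in: str = FIXED_IN) -> bool:
    # Tuples compare element by element, so (1, 7, 0) > (1, 6, 1).
    return parse_version(version) >= parse_version(fixed_in)


if __name__ == "__main__":
    for v in ["1.6.0", "1.6.1", "1.7.0"]:
        print(v, has_host_fix(v))
```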