Will the Flink JobManager crash during a ZooKeeper upgrade?

Time: 2021-02-09 14:24:42

Tags: apache-flink

I am trying to understand whether the Flink JobManager behaves as expected during a ZooKeeper upgrade.

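I am running Flink 1.11.2 in Kubernetes against a ZooKeeper 3.5.4-beta server. While I upgrade ZooKeeper, it is unavailable for about 20 seconds. I expected that during those 20 seconds either the Flink job would restart or a few warnings would show up in the logs. Instead, I see the entire Flink JVM crash (and then the pod restarts).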

I expected Flink to retry ZooKeeper requests internally, so I was surprised that it crashed. Is this expected behavior, or is it a bug?

From the logs:

org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
[09-Feb-2021 11:30:00.197 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server zdzk.servicexxx/192.168.190.92:2181
[09-Feb-2021 11:30:00.197 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Socket connection established to zdzk.servicexxx/192.168.190.92:2181, initiating session
[09-Feb-2021 11:30:00.198 UTC] WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session 0x3012b0057140004 for server zdzk.servicexxx/192.168.190.92:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_192]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_192]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_192]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_192]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
[09-Feb-2021 11:30:02.294 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server zdzk.servicexxx/192.168.190.92:2181
[09-Feb-2021 11:30:02.295 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Socket connection established to zdzk.servicexxx/192.168.190.92:2181, initiating session
[09-Feb-2021 11:30:02.295 UTC] WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session 0x3012b0057140004 for server zdzk.servicexxx/192.168.190.92:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_192]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_192]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_192]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_192]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
[09-Feb-2021 11:30:03.841 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server zdzk.servicexxx/192.168.190.92:2181
[09-Feb-2021 11:30:03.842 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Socket connection established to zdzk.servicexxx/192.168.190.92:2181, initiating session
[09-Feb-2021 11:30:03.842 UTC] WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session 0x3012b0057140004 for server zdzk.servicexxx/192.168.190.92:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_192]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_192]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_192]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_192]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
[09-Feb-2021 11:30:04.175 UTC] ERROR org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl [] - Background operation retry gave up
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
[09-Feb-2021 11:30:04.176 UTC] ERROR org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] - Received error from LeaderRetrievalService.
org.apache.flink.util.FlinkException: Unhandled error in ZooKeeperLeaderRetrievalService:Background operation retry gave up
    at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.unhandledError(ZooKeeperLeaderRetrievalService.java:208) [flink-dist_2.11-1.11.2.jar:1.11.2]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:713) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:709) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:708) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:874) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    ... 10 more
[09-Feb-2021 11:30:04.178 UTC] ERROR org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl      [] - Leader Election Service encountered a fatal error.
org.apache.flink.util.FlinkException: Unhandled error in ZooKeeperLeaderElectionService: Background operation retry gave up
    at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService.unhandledError(ZooKeeperLeaderElectionService.java:430) [flink-dist_2.11-1.11.2.jar:1.11.2]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:713) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:709) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:708) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:874) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    ... 10 more
[09-Feb-2021 11:30:04.179 UTC] ERROR org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] - Received error from LeaderRetrievalService.
org.apache.flink.util.FlinkException: Unhandled error in ZooKeeperLeaderRetrievalService:Background operation retry gave up
    at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.unhandledError(ZooKeeperLeaderRetrievalService.java:208) [flink-dist_2.11-1.11.2.jar:1.11.2]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:713) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:709) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:708) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:874) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    ... 10 more
[09-Feb-2021 11:30:04.180 UTC] ERROR org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Fatal error occurred in ResourceManager.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Received an error from the LeaderElectionService.
    at org.apache.flink.runtime.resourcemanager.ResourceManager.handleError(ResourceManager.java:1053) [flink-dist_2.11-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService.unhandledError(ZooKeeperLeaderElectionService.java:430) [flink-dist_2.11-1.11.2.jar:1.11.2]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:713) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:709) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:708) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:874) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: org.apache.flink.util.FlinkException: Unhandled error in ZooKeeperLeaderElectionService: Background operation retry gave up
    ... 18 more
Caused by: org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    ... 10 more
[09-Feb-2021 11:30:04.181 UTC] ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Received an error from the LeaderElectionService.
    at org.apache.flink.runtime.resourcemanager.ResourceManager.handleError(ResourceManager.java:1053) [flink-dist_2.11-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService.unhandledError(ZooKeeperLeaderElectionService.java:430) [flink-dist_2.11-1.11.2.jar:1.11.2]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:713) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:709) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:708) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:874) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: org.apache.flink.util.FlinkException: Unhandled error in ZooKeeperLeaderElectionService: Background operation retry gave up
    ... 18 more
Caused by: org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
    ... 10 more
[09-Feb-2021 11:30:04.196 UTC] INFO org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124

2 answers:

Answer 0: (score: 0)

If the ZooKeeper quorum is maintained throughout the upgrade, the Flink JobManager should not be affected. Otherwise, it is not surprising that the JobManager fails.

Normally you upgrade the ZooKeeper followers first and the leader last. Before shutting down another node, verify that quorum has been re-established.
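The rolling procedure described here could be scripted roughly as follows. This is only a sketch under stated assumptions: it uses ZooKeeper's `srvr` four-letter command (which has to be allowed by `4lw.commands.whitelist` on the servers), it treats every listed node as a voting member, and the helper names (`query_srvr`, `parse_mode`, `has_quorum`, `upgrade_order`) as well as host names such as `zk-0` are hypothetical, not part of Flink or ZooKeeper.

```python
import socket


def query_srvr(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Send the `srvr` four-letter command; return the raw reply ('' if unreachable)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
            return b"".join(chunks).decode("utf-8", errors="replace")
    except OSError:
        return ""


def parse_mode(srvr_reply: str) -> str:
    """Return the value of the 'Mode:' line, or 'unknown' if there is none."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def has_quorum(modes: dict) -> bool:
    """True if exactly one leader exists and a strict majority of nodes is serving.

    Assumption: every entry in `modes` is a voting member (no observers).
    """
    serving = [m for m in modes.values() if m in ("leader", "follower")]
    leaders = [m for m in serving if m == "leader"]
    return len(leaders) == 1 and len(serving) > len(modes) // 2


def upgrade_order(modes: dict) -> list:
    """Order hosts so that followers come first and the leader comes last."""
    return sorted(modes, key=lambda host: modes[host] == "leader")
```

A driver loop would build `modes = {host: parse_mode(query_srvr(host)) for host in hosts}`, upgrade the hosts in `upgrade_order(modes)`, and after each node poll until `has_quorum` holds again before touching the next one.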

Answer 1: (score: 0)