Apache Ignite容器因失败而重新启动:拓扑未初始化

时间:2020-05-22 14:28:44

标签: ignite

我使用Ignite Helm chart stable/ignite version 2.7.6在Kubernetes上设置了一个Ignite集群。

但是很快我会收到类似这样的错误:

JVM will be halted immediately due to the fail ure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateExce ption: Topology is not initialized: app-profiles]]

结果,Ignite Kubernetes吊舱一次又一次重新启动。

相关的缓存app-profiles的配置如下:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
   <property name="name" value="app-profiles" />
   <property name="cacheMode" value="LOCAL" />
   <property name="onheapCacheEnabled" value="true" />
   <property name="evictionPolicy">
      <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
         <property name="maxSize" value="10000" />
      </bean>
   </property>
   <property name="expiryPolicyFactory">
      <bean id="expiryPolicy" class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
         <constructor-arg>
            <bean class="javax.cache.expiry.Duration">
               <constructor-arg value="SECONDS" />
               <constructor-arg value="43200" />
            </bean>
         </constructor-arg>
      </bean>
   </property>
</bean>

完整堆栈跟踪:

[11:56:56,148][SEVERE][client-connector-#60][ClientListenerNioListener] Failed to process client request [req=o.a.i.i.processors.platform.client.cache.ClientCachePutRequest@1d
4811d1]
java.lang.IllegalStateException: Topology is not initialized: app-profiles
        at org.apache.ignite.internal.processors.cache.CacheGroupContext.topology(CacheGroupContext.java:587)
        at org.apache.ignite.internal.processors.cache.GridCacheContext.topology(GridCacheContext.java:882)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2179)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
        at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
        at org.apache.ignite.internal.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:888)
        at org.apache.ignite.internal.processors.cache.GridCacheGateway.leaveNoLock(GridCacheGateway.java:240)
        at org.apache.ignite.internal.processors.cache.GridCacheGateway.leave(GridCacheGateway.java:225)
        at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onLeave(GatewayProtectedCacheProxy.java:1578)
        at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:823)
        at org.apache.ignite.internal.processors.platform.client.cache.ClientCachePutRequest.process(ClientCachePutRequest.java:43)
        at org.apache.ignite.internal.processors.platform.client.ClientRequestHandler.handle(ClientRequestHandler.java:57)
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:162)
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:45)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioAsyncNotifyFilter$3.body(GridNioAsyncNotifyFilter.java:97)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.internal.util.worker.GridWorkerPool$1.run(GridWorkerPool.java:70)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[11:56:56,151][SEVERE][ttl-cleanup-worker-#41][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=f
alse, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_
TERMINATION, err=java.lang.IllegalStateException: Topology is not initialized: app-profiles]]
java.lang.IllegalStateException: Topology is not initialized: app-profiles
        at org.apache.ignite.internal.processors.cache.CacheGroupContext.topology(CacheGroupContext.java:587)
        at org.apache.ignite.internal.processors.cache.GridCacheContext.topology(GridCacheContext.java:882)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2179)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
        at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
        at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
[11:56:56,152][WARNING][ttl-cleanup-worker-#41][FailureProcessor] No deadlocked threads detected.
[11:56:56,210][WARNING][ttl-cleanup-worker-#41][FailureProcessor] Thread dump at 2020/05/22 11:56:56 GMT
Thread [name="Thread-32", id=708, state=TIMED_WAITING, blockCnt=0, waitCnt=10]
    Lock [object=java.util.concurrent.SynchronousQueue$TransferStack@27a8c714, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thread [name="sys-#650", id=707, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@43cf0185, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thread [name="sys-#649", id=706, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@43cf0185, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

...

Thread [name="Signal Dispatcher", id=4, state=RUNNABLE, blockCnt=0, waitCnt=0]

Thread [name="Finalizer", id=3, state=WAITING, blockCnt=35, waitCnt=21]
    Lock [object=java.lang.ref.ReferenceQueue$Lock@27fdc7bd, ownerName=null, ownerId=-1]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

Thread [name="Reference Handler", id=2, state=WAITING, blockCnt=21, waitCnt=21]
    Lock [object=java.lang.ref.Reference$Lock@2cfe1952, ownerName=null, ownerId=-1]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

Thread [name="main", id=1, state=WAITING, blockCnt=1, waitCnt=107]
    Lock [object=java.util.concurrent.CountDownLatch$Sync@73b574bf, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at o.a.i.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:334)



[11:56:56,214][SEVERE][ttl-cleanup-worker-#41][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.I
llegalStateException: Topology is not initialized: app-profiles]]

1 个答案:

答案 0 :(得分:1)

发现问题出在cacheMode LOCAL上。以某种方式,LOCAL模式下的缓存无法初始化其拓扑(Ignite版本2.7.6)。只需将其替换为PARTITIONED cacheMode,问题就消失了,如下所示:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
   <property name="name" value="app-profiles" />
   <!-- <property name="cacheMode" value="LOCAL" /> -->
   <property name="cacheMode" value="PARTITIONED" />
   <property name="onheapCacheEnabled" value="true" />
   <property name="evictionPolicy">
      <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
         <property name="maxSize" value="10000" />
      </bean>
   </property>
   <property name="expiryPolicyFactory">
      <bean id="expiryPolicy" class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
         <constructor-arg>
            <bean class="javax.cache.expiry.Duration">
               <constructor-arg value="SECONDS" />
               <constructor-arg value="43200" />
            </bean>
         </constructor-arg>
      </bean>
   </property>
</bean>

并且cacheMode REPLICATED也应该起作用。