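I have Ignite 2.7 and a 5-node cluster. More than 40 million records are being generated and stored in an Ignite cache. I have set a 3-day expiry on the entries. Today, one Ignite node stopped with the error below. Please help me identify and fix the problem.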
[2019-09-11 07:45:59,570] [ERROR] [ttl-cleanup-worker-#170] [root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac]]
java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.io(BPlusTree.java:5058)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$200(BPlusTree.java:90)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5330)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5566)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2232)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
    at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
    at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)
[2019-09-11 07:45:59,575] [WARN] [ttl-cleanup-worker-#170] [FailureProcessor] No deadlocked threads detected.
[2019-09-11 07:46:40,831] [WARN] [jvm-pause-detector-worker] [IgniteKernal] Possible too long JVM pause: 41233 milliseconds.
[2019-09-11 07:46:40,831] [ERROR] [sys-stripe-0-#1] [G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-23, blockedFor=41s]
[2019-09-11 07:46:40,832] [WARN] [sys-stripe-0-#1] [G] Thread [name="grid-nio-worker-tcp-comm-23-#143", id=173, state=RUNNABLE, blockCnt=0, waitCnt=0]
Ignite configuration, in case it helps:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enable native persistence -->
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="metricsEnabled" value="true"/>
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
<property name="storagePath" value="/ignite_data/ignite/persistance"/>
<property name="walPath" value="/ignite_data/ignite/wal"/>
<property name="walArchivePath" value="/data/disk01/ignite/archive"/>
</bean>
</property>
<!-- Enable authentication for Ignite -->
<property name="authenticationEnabled" value="true"/>
<!-- Enable the expiry policy (entries expire 3 days after creation) -->
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
<property name="expiryPolicyFactory">
<bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
<constructor-arg>
<bean class="javax.cache.expiry.Duration">
<constructor-arg value="DAYS"/>
<constructor-arg value="3"/>
</bean>
</constructor-arg>
</bean>
</property>
</bean>
</list>
</property>
<!-- Log Ignite metrics every 10 minutes -->
<property name="gridLogger">
<bean class="org.apache.ignite.logger.log4j.Log4JLogger">
<constructor-arg type="java.lang.String" value="/home/trigger_be/apache-ignite-2.7.0/config/log4j.xml"/>
</bean>
</property>
<property name="metricsLogFrequency" value="#{60 * 10 * 1000}"/>
<!-- Define the cluster by listing node IP addresses -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<value>172.16.5.36:49500..49509</value>
<value>172.16.5.37:49500..49509</value>
<value>172.16.5.38:49500..49509</value>
<value>172.16.5.39:49500..49509</value>
<value>172.16.5.40:49500..49509</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
Answer 0: (score: 0)
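This looks like a data corruption issue. I suggest deleting the persistence data from that node completely and then adding the node back to the cluster's baseline topology. If you have enough backups, the data will be rebalanced onto it.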
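On the "enough backups" condition: the cache configuration in the question does not set `backups`, and Ignite's default for a partitioned cache is 0 backup copies, so wiping a node's persistence files would lose the partitions that node owned. A minimal sketch of the same cache bean with one backup copy is shown below; the value `1` is only an illustrative assumption, not a recommendation for this cluster.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
    <!-- Assumed value for illustration: keep one backup copy of every partition
         on another node, so a wiped node can be refilled by rebalancing. -->
    <property name="backups" value="1"/>
    <property name="expiryPolicyFactory">
        <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
            <constructor-arg>
                <bean class="javax.cache.expiry.Duration">
                    <constructor-arg value="DAYS"/>
                    <constructor-arg value="3"/>
                </bean>
            </constructor-arg>
        </bean>
    </property>
</bean>
```

Each backup copy increases memory and disk usage across the cluster, so the right number depends on how much data loss the application can tolerate.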
This looks like issue IGNITE-10767. Do you have MVCC enabled (transactional SQL, TRANSACTIONAL_SNAPSHOT caches)?
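For reference when checking this, a cache uses MVCC when its atomicity mode is `TRANSACTIONAL_SNAPSHOT`. The fragment below is a sketch of what that looks like in a Spring configuration; the cache name `SOME_MVCC_CACHE` is hypothetical. The cache bean posted in the question sets no `atomicityMode`, so that cache would use the default mode unless it was created differently elsewhere (for example through SQL DDL).

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- Hypothetical cache name, used only for illustration. -->
    <property name="name" value="SOME_MVCC_CACHE"/>
    <!-- This setting is what enables MVCC for the cache. -->
    <property name="atomicityMode" value="TRANSACTIONAL_SNAPSHOT"/>
</bean>
```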