Ignite node went down [ttl-cleanup-worker]

Time: 2019-09-11 04:24:08

Tags: ignite

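I have a 5-node cluster running Ignite 2.7. More than 40 million records are being generated and stored in an Ignite cache, and I have set a 3-day expiry on them. Today one Ignite node stopped with the error below. Please help me identify and resolve the problem.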

  

[2019-09-11 07:45:59,570] [ERROR] [ttl-cleanup-worker-#170] [root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac]]
java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.io(BPlusTree.java:5058)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$200(BPlusTree.java:90)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5330)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5566)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2232)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
    at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
    at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)
[2019-09-11 07:45:59,575] [WARN] [ttl-cleanup-worker-#170] [FailureProcessor] No deadlocked threads detected.
[2019-09-11 07:46:40,831] [WARN] [jvm-pause-detector-worker] [IgniteKernal] Possible too long JVM pause: 41233 milliseconds.
[2019-09-11 07:46:40,831] [ERROR] [sys-stripe-0-#1] [G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-23, blockedFor=41s]
[2019-09-11 07:46:40,832] [WARN] [sys-stripe-0-#1] [G] Thread [name="grid-nio-worker-tcp-comm-23-#143", id=173, state=RUNNABLE, blockCnt=0, waitCnt=0]

The Ignite configuration is as follows:

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">

        <!-- Enable native persistence -->
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="metricsEnabled" value="true"/>
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="persistenceEnabled" value="true"/>
                    </bean>
                </property>
                <property name="storagePath" value="/ignite_data/ignite/persistance"/>
                <property name="walPath" value="/ignite_data/ignite/wal"/>
                <property name="walArchivePath" value="/data/disk01/ignite/archive"/>
            </bean>
        </property>

        <!-- Enable authentication for Ignite -->
        <property name="authenticationEnabled" value="true"/>


        <!-- Expire entries 3 days after creation -->
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
                    <property name="expiryPolicyFactory">
                        <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
                            <constructor-arg>
                                <bean class="javax.cache.expiry.Duration">
                                    <constructor-arg value="DAYS"/>
                                    <constructor-arg value="3"/>
                                </bean>
                            </constructor-arg>
                        </bean>
                    </property>
                </bean>
            </list>
        </property>


        <!-- Log Ignite metrics every 10 minutes -->
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
                <constructor-arg type="java.lang.String" value="/home/trigger_be/apache-ignite-2.7.0/config/log4j.xml"/>
            </bean>
        </property>
        <property name="metricsLogFrequency" value="#{60 * 10 * 1000}"/>

        <!-- Define the cluster by listing node IP addresses -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="addresses">
                            <list>
                                <value>172.16.5.36:49500..49509</value>
                                <value>172.16.5.37:49500..49509</value>
                                <value>172.16.5.38:49500..49509</value>
                                <value>172.16.5.39:49500..49509</value>
                                <value>172.16.5.40:49500..49509</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>

1 Answer:

Answer 0 (score: 0)

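This looks like a data corruption issue. I would advise removing the persistence data from that node completely and then re-adding the node to the cluster's baseline topology. If you have enough backups, the data will be rebalanced onto it.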
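Note that the cache configuration in the question does not set `backups`, and Ignite's default is 0 backup copies, so wiping one node's persistence would lose the partitions that only that node owned. As a rough sketch only, a cache with one backup copy per partition could be declared like this (the value `1` is an assumption for illustration; choose it to match how many node losses you need to survive):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
    <!-- Assumption: keep one backup copy of every partition.
         The configuration in the question sets none, so the default (0) applies. -->
    <property name="backups" value="1"/>
    <!-- The expiryPolicyFactory property stays as in the question. -->
</bean>
```

As far as I know, with native persistence enabled the configuration of an already created cache is stored on disk, so changing this XML alone is unlikely to affect the existing cache; it would most likely need to be recreated for the new backup count to apply.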

It also looks like issue IGNITE-10767. Do you have MVCC enabled (transactional SQL, TRANSACTIONAL_SNAPSHOT caches)?
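For comparison, an MVCC-enabled cache is typically one whose atomicity mode is set to `TRANSACTIONAL_SNAPSHOT`, roughly as in the fragment below. The cache name `someMvccCache` is hypothetical. The configuration in the question does not set `atomicityMode` at all, so that cache would use the default mode, but tables created through SQL with `ATOMICITY=TRANSACTIONAL_SNAPSHOT` would count as well.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- Hypothetical cache name, for illustration only. -->
    <property name="name" value="someMvccCache"/>
    <!-- This is the setting that turns MVCC on for a cache. -->
    <property name="atomicityMode" value="TRANSACTIONAL_SNAPSHOT"/>
</bean>
```

If nothing like this appears in any of your cache configurations or table definitions, MVCC is probably not in use on this cluster.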