Ignite cluster hangs when a node joins or leaves

Time: 2018-06-04 05:20:14

Tags: ignite in-memory-database gridgain

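I have a 3-node cluster with 20 clients, running in a Spark context. Initially it works fine, but whenever a new node (i.e. a client) tries to connect to the cluster, a problem occurs at random and the cluster stops operating. I captured trace logs while it was stuck. If I explicitly restart any one of the Ignite servers, the cluster is released and works fine again. I am using Ignite 2.4.0. The same problem also occurs with Ignite 2.5.0.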

Client log:

Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:

        GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]

Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:

        GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]

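Failed to wait for initial partition map exchange. Possible reasons are:
        ^-- Transactions in deadlock.
        ^-- Long running transactions (ignore if this is the case).
        ^-- Unreleased explicit locks.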

Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=
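The pending-object lines in the client log are long, and only a few fields say which exchange is stuck. The sketch below is one way to pull those fields out with the standard library. `SAMPLE` is a shortened copy of the log line above, and `field` and `summarize_exchange` are hypothetical helper names, not part of Ignite.

```python
import re

# Shortened copy of the pending-object line from the client log above.
SAMPLE = (
    "GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion "
    "[topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode "
    "[id=4d885cfd-45ed-43a2-8088-f35c9469797f, discPort=0, order=44, "
    "loc=true, isClient=true], done=false]"
)

# Major and minor topology version; whitespace around '=' is optional.
TOP_VER = re.compile(
    r"AffinityTopologyVersion\s*\[\s*topVer\s*=\s*(\d+)\s*,\s*minorTopVer\s*=\s*(\d+)"
)


def field(name, text):
    """Return the first value logged as name=value, or None if it is absent."""
    m = re.search(r"\b" + re.escape(name) + r"\s*=\s*([^,\]\s]+)", text)
    return m.group(1) if m else None


def summarize_exchange(text):
    """Collect the fields that identify a pending exchange (hypothetical helper)."""
    ver = TOP_VER.search(text)
    return {
        "topVer": (int(ver.group(1)), int(ver.group(2))) if ver else None,
        "evt": field("evt", text),
        "node": field("id", text),
        "isClient": field("isClient", text) == "true",
        "done": field("done", text) == "true",
    }


if __name__ == "__main__":
    print(summarize_exchange(SAMPLE))
```

For the line in this log, the summary says that the exchange for topology version 44 was triggered by a client node joining and has not completed.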

Server-side log:

Possible starvation in striped pool.
    Thread name: sys-stripe-0-#1
    Queue: [
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareResponse [nearEvicted=null, futId=869dd4ca361-fe7e167d-4d80-4f57-b004-13359a9f2c11, miniId=1, super=GridDistributedTxPrepareResponse [txState=null, part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094903, nodeOrder=1], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]],
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=984, val=null, hasValBytes=true], val=BinaryObjectImpl [arr=true, ctx=false, start=0], prevVal=null, super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null, nearFutId=0, flags=]]]],
        o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@2735c674,
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareRequest [nearNodeId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, futId=6576e4ca361-6e7cdac2-d5a3-4624-9ad3-b93f25546cc3, miniId=1, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], invalidateNearEntries={}, nearWrites=null, owned=null, nearXidVer=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], subjId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, taskNameHash=0, preloadKeys=null, super=GridDistributedTxPrepareRequest [threadId=86, concurrency=OPTIMISTIC, isolation=READ_COMMITTED, writeVer=GridCacheVersion [topVer=139084030, order=1527604094935, nodeOrder=2], timeout=0, reads=null, writes=[IgniteTxEntry [key=BinaryObjectImpl [arr=true, ctx=false, start=0], cacheId=-1755241537, txKey=null, val=[op=UPDATE, val=BinaryObjectImpl [arr=true, ctx=false, start=0]], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]],
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=2, arr=[65774,65775]]]]],
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=49328, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=needRes]]]]],
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[98591]]]]],
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[114926]]]]],
        Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=32946, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNear
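The striped-pool dump is hard to read as raw text. Tallying the queued entries by inner message type gives a quicker picture of what is backed up behind the stripe. This is a minimal sketch, assuming the dump has been saved as a string. `SAMPLE` is a shortened stand-in for the queue above, and `tally_queue` is a hypothetical helper name, not an Ignite API.

```python
import re
from collections import Counter

# Shortened stand-in for the "Queue: [...]" section of the server log above.
SAMPLE = (
    "Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, "
    "msg=GridDhtTxPrepareResponse [miniId=1]]], "
    "Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, "
    "msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[98591]]]]], "
    "Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, "
    "msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[114926]]]]]"
)

# Inner message type carried by each GridIoMessage. Whitespace around '=' is
# optional so the pattern also accepts differently spaced copies of the log.
INNER_MSG = re.compile(r"GridIoMessage\s*\[.*?\bmsg\s*=\s*(\w+)")


def tally_queue(dump):
    """Count queued messages by inner message type (hypothetical helper)."""
    return Counter(INNER_MSG.findall(dump))


if __name__ == "__main__":
    for msg_type, count in tally_queue(SAMPLE).most_common():
        print(count, msg_type)
```

Entries that are not wrapped in a `GridIoMessage`, such as the `DeferredUpdateTimeout` object in the queue, are not counted by this pattern.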

0 answers:

No answers