I need help correctly diagnosing a com.hazelcast.core.OperationTimeoutException.
com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{ serviceName='hz:impl:mapService', op=GetOperation{TRADES}, partitionId=87, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[10.32.21.170]:17326, backupsExpected=0, backupsCompleted=0}
No response has been received! backup-expected:0 backup-completed:0
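It looks like the 120,000 ms is configurable, but I don't think increasing it is the answer. When this happens, every call fails for the same reason, whether it is a get, a set, or any other operation.
Can anyone suggest which parameters should be tuned to mitigate this? Perhaps it is actually a thread contention problem, and increasing the event threads or something similar might help. The Hazelcast instance currently has no custom parameters, and all thread counts are at their defaults. The servers were not doing excessive garbage collection during this period either.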
Answer 0 (score: 2)
The most probable cause of this exception is a network problem among the cluster members. An unresponsive node (because of memory or GC problems, for example) can also cause it. The first thing to do is to check the quality and performance of your network environment. If you are running on AWS, prefer an instance type with better network performance.
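One rough way to get a first impression of the network path to the member named in the exception is to time a plain TCP connect to it from another member's host. This is only a minimal sketch: the class name MemberProbe and its method are hypothetical helpers, not part of Hazelcast, and the default host and port are simply the target address from the stack trace in the question. A fast connect does not prove the member is healthy; it only rules out the most basic connectivity problems.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hazelcast: times a plain TCP connect.
public class MemberProbe {

    /**
     * Returns the time taken to open a TCP connection, in milliseconds,
     * or -1 if the connection was refused, unreachable, or timed out.
     */
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers connection refused, no route to host, and connect timeout.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Defaults are the target address taken from the exception in the question.
        String host = args.length > 0 ? args[0] : "10.32.21.170";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 17326;

        // Repeat a few times; occasional failures or large spikes hint at network trouble.
        for (int i = 0; i < 5; i++) {
            long millis = connectMillis(host, port, 2000);
            System.out.println("connect to " + host + ":" + port + " -> " + millis + " ms");
        }
    }
}
```

Running this periodically from each member towards the others, and comparing the results with the times at which the exception appears, may help separate network issues from an unresponsive JVM.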
If you want the cluster to drop problematic nodes quickly, you can set a lower value for the following system property: "hazelcast.max.no.heartbeat.seconds", the maximum time in seconds without a heartbeat before a node is assumed to be dead. The default is 500 seconds.
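As a minimal sketch of how the property could be lowered, the snippet below sets it as a JVM system property before the Hazelcast instance is created. The class name HeartbeatConfig and the value 60 are illustrative assumptions, not recommendations; pick a value that suits your own environment.

```java
// Hypothetical helper class; only the property name comes from Hazelcast.
public class HeartbeatConfig {

    static final String MAX_NO_HEARTBEAT = "hazelcast.max.no.heartbeat.seconds";

    /** Sets the heartbeat timeout as a JVM system property. */
    static void applyHeartbeatTimeout(int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive: " + seconds);
        }
        System.setProperty(MAX_NO_HEARTBEAT, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // 60 is an assumed example value, much lower than the default quoted above.
        applyHeartbeatTimeout(60);
        System.out.println(MAX_NO_HEARTBEAT + "=" + System.getProperty(MAX_NO_HEARTBEAT));

        // The Hazelcast instance would be started after this point,
        // so that it picks up the property when it initializes.
    }
}
```

The same effect can be had without code by passing -Dhazelcast.max.no.heartbeat.seconds=60 on the JVM command line. Either way, the property should be set identically on every member, and setting it too low risks evicting nodes that are merely in a long GC pause.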