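We use a distributed Hazelcast map in our project and recently ran into very high get latency. We were calling IMap.get(...), and in some cases the call took hours to complete. After the incident we switched to the IMap.getAsync(...) API with a timeout, which resolved the problem for us, but I'm curious whether anyone else has seen something similar.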
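A minimal sketch of the bounded-lookup pattern described above, using only java.util.concurrent. It assumes that the object returned by IMap.getAsync(...) can be treated as a java.util.concurrent.Future (in Hazelcast 3.x it is an ICompletableFuture, which extends Future). The class name BoundedGet, the helper getWithTimeout and the fallback value are hypothetical and not taken from the original code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedGet {

    // Hypothetical helper: waits at most `timeout` for the result,
    // otherwise gives up and returns `fallback` instead of blocking forever.
    static <V> V getWithTimeout(Future<V> future, long timeout, TimeUnit unit, V fallback) {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; the remote operation may still complete
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("lookup failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the future returned by an async map lookup.
        Future<String> ready = CompletableFuture.completedFuture("value");
        Future<String> stuck = new CompletableFuture<>(); // never completes

        System.out.println(getWithTimeout(ready, 1, TimeUnit.SECONDS, "fallback"));
        System.out.println(getWithTimeout(stuck, 50, TimeUnit.MILLISECONDS, "fallback"));
    }
}
```

With a real map the call would look something like getWithTimeout(map.getAsync(key), 5, TimeUnit.SECONDS, null), where the 5-second bound is only an example value.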
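Our Hazelcast version is 3.9.0. At the time of the incident we had hazelcast.operation.call.timeout.millis set to 5000, and the map was configured with async-backup-count="3" and read-backup-data="true". Because of unrelated background processing we also see occasional CPU spikes on some hosts (up to 100% for a few minutes), which may have affected Hazelcast.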
The only suspicious thing we found in the logs is that, after the incident, all hosts were complaining about one specific host, for example:
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739863 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739864 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739852 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739870 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739874 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
And in the logs of hostY:
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] MonitorInvocationsTask delayed 14294 ms
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] BroadcastOperationControlTask delayed 13544 ms
Any ideas?
Answer 0 (score: 2)
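Judging from the hostY logs, it looks like hostY suffered a GC pause. MonitorInvocationsTask is scheduled to run every second, yet its execution was delayed by 14 seconds. Likewise, with your configuration BroadcastOperationControlTask should be scheduled roughly every second (hazelcast.operation.call.timeout.millis / 4 = 1250 ms), but it was delayed by 13 seconds.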
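To make the arithmetic concrete, here is a small sketch that derives the 1250 ms period from the configured call timeout and estimates how many scheduled runs fit into each reported delay. The class and method names are hypothetical, and the regular expression only assumes the "<task> delayed <n> ms" wording seen in the warnings quoted in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaskDelayCheck {

    // Matches the tail of the quoted warnings, e.g. "MonitorInvocationsTask delayed 14294 ms".
    private static final Pattern DELAYED = Pattern.compile("(\\w+) delayed (\\d+) ms");

    // Period of the broadcast task as stated in the answer: call timeout / 4.
    static long broadcastPeriodMillis(long callTimeoutMillis) {
        return callTimeoutMillis / 4;
    }

    // Returns the reported delay in milliseconds, or -1 if the line is not a "delayed" warning.
    static long parseDelayMillis(String logLine) {
        Matcher m = DELAYED.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }

    // Number of whole scheduling periods that fit into the delay.
    static long missedRuns(long delayMillis, long periodMillis) {
        return delayMillis / periodMillis;
    }

    public static void main(String[] args) {
        long broadcastPeriod = broadcastPeriodMillis(5000); // 1250 ms
        long monitorPeriod = 1000;                          // "every second", per the answer

        long monitorDelay = parseDelayMillis(
                "WARNING: [hostY]:5702 [dev] [3.9] MonitorInvocationsTask delayed 14294 ms");
        long broadcastDelay = parseDelayMillis(
                "WARNING: [hostY]:5702 [dev] [3.9] BroadcastOperationControlTask delayed 13544 ms");

        System.out.println("broadcast period: " + broadcastPeriod + " ms");
        System.out.println("monitor runs missed: " + missedRuns(monitorDelay, monitorPeriod));
        System.out.println("broadcast runs missed: " + missedRuns(broadcastDelay, broadcastPeriod));
    }
}
```

Both tasks missed on the order of ten or more consecutive runs at the same moment, which is what a stall of the whole JVM would produce.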
You can verify this by enabling GC logging. In addition, Hazelcast should periodically print HealthMonitor logs once memory and/or CPU usage exceeds a certain threshold.
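GC logging itself is switched on with JVM flags, for example -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file> on Java 8, or -Xlog:gc* on Java 9 and later. As a rough in-process complement, the sketch below samples the standard GarbageCollectorMXBean counters from java.lang.management. The class name GcSnapshot and the one-second sampling interval are assumptions for illustration. The reported collection time is an approximation and does not necessarily equal stop-the-world pause time, particularly with concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSnapshot {

    // Sum of the accumulated collection time over all collectors, in milliseconds.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcTimeMillis();
        Thread.sleep(1000); // hypothetical sampling interval
        long after = totalGcTimeMillis();

        // A value close to (or above) the interval length suggests the JVM spent
        // most of that time collecting, which would match the delayed tasks.
        System.out.println("GC time during the last interval: " + (after - before) + " ms");
    }
}
```

Run periodically on hostY, a check like this would show whether the timestamps of the "delayed" warnings line up with jumps in GC time.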