Very high latency for get operations on a Hazelcast distributed map

Time: 2018-11-06 17:17:39

Tags: hazelcast

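We use a distributed Hazelcast map in our project and recently ran into very high get latency. We were making IMap.get(...) calls, and in some cases they took hours to complete. After the incident we switched to the IMap.getAsync(...) API with a timeout, which solved the problem for us, but I am curious whether anyone else has run into something similar.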
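The workaround described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: `BoundedGet`, `getWithTimeout`, and the fallback value are hypothetical names, and the example uses a plain `java.util.concurrent.Future` so it runs without a cluster. In Hazelcast 3.x, `IMap.getAsync(key)` returns an `ICompletableFuture`, which is a `Future`, so it could be passed to the same helper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedGet {

    /**
     * Waits at most timeoutMillis for the future to complete.
     * Returns the fallback value if the wait times out.
     * (Hypothetical helper; only the local wait is bounded here.)
     */
    static <V> V getWithTimeout(Future<V> future, long timeoutMillis, V fallback)
            throws InterruptedException, ExecutionException {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback;
        }
    }

    public static void main(String[] args) throws Exception {
        // With Hazelcast this would be something like:
        //   Future<String> f = map.getAsync("key");
        // A never-completing future stands in for a stuck invocation.
        Future<String> stuck = new CompletableFuture<>();
        System.out.println(getWithTimeout(stuck, 100, "fallback")); // prints "fallback"

        Future<String> done = CompletableFuture.completedFuture("value");
        System.out.println(getWithTimeout(done, 100, "fallback")); // prints "value"
    }
}
```

Bounding the wait on the caller side keeps a slow or unresponsive member from blocking the calling thread, although it does nothing about the underlying cause of the delay.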

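Our Hazelcast version is 3.9.0. At the time of the incident we had hazelcast.operation.call.timeout.millis set to 5000, and the map was configured with async-backup-count="3" and read-backup-data="true". Because of unrelated background processing, we also see occasional CPU usage spikes on some hosts (up to 100% for a few minutes), which may have affected Hazelcast.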

The only suspicious thing we found in the logs is that, after the incident, all hosts were complaining about one particular host, for example:

Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739863 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739864 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739852 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739870 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739874 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler

And in hostY's logs:

Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] MonitorInvocationsTask delayed 14294 ms
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] BroadcastOperationControlTask delayed 13544 ms

Any ideas?

1 answer:

Answer 0 (score: 2)

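Judging by hostY's logs, it looks like hostY suffered a GC pause. MonitorInvocationsTask is scheduled to run every second, but its execution was delayed by 14 seconds. Likewise, BroadcastOperationControlTask should be scheduled roughly every second given your configuration (hazelcast.operation.call.timeout.millis / 4 = 1250 ms), but it was delayed by 13 seconds.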
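To make the arithmetic concrete, here is a small sketch that pulls the reported delay out of such a warning line and compares it with the expected scheduling period. The class and method names (`DelayCheck`, `delayMillis`, `broadcastPeriodMillis`) are made up for illustration, and the "call timeout / 4" rule is taken from the explanation above rather than verified against the Hazelcast source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelayCheck {

    // Matches messages such as "BroadcastOperationControlTask delayed 13544 ms".
    private static final Pattern DELAYED = Pattern.compile("(\\w+) delayed (\\d+) ms");

    /** Returns the reported delay in ms, or -1 if the line is not a "delayed" warning. */
    static long delayMillis(String logLine) {
        Matcher m = DELAYED.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }

    /** Expected broadcast period, assuming it is the call timeout divided by 4. */
    static long broadcastPeriodMillis(long callTimeoutMillis) {
        return callTimeoutMillis / 4;
    }

    public static void main(String[] args) {
        long period = broadcastPeriodMillis(5000); // 1250 ms for the setting in the question
        String line = "WARNING: [hostY]:5702 [dev] [3.9] "
                + "BroadcastOperationControlTask delayed 13544 ms";
        long delay = delayMillis(line);
        // Integer division: how many whole periods fit in the observed delay.
        System.out.println("period=" + period + " ms, delay=" + delay
                + " ms, whole periods missed=" + (delay / period));
        // prints: period=1250 ms, delay=13544 ms, whole periods missed=10
    }
}
```

A delay of about ten scheduling periods on a task that normally runs roughly once per second suggests that the whole JVM was stalled, not just one operation.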

You can verify this by enabling GC logging. In addition, Hazelcast should periodically print HealthMonitor logs when memory and/or CPU usage exceeds a certain threshold.
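On a Java 8 JVM, GC logging is typically turned on with flags such as -XX:+PrintGCDetails, -XX:+PrintGCDateStamps and -Xloggc:<file>. As a rough complement, the sketch below samples the JVM's own GC counters through the standard `java.lang.management` API. `GcTimeProbe` is a hypothetical name, and the accumulated collection time it reports is only an approximation of stop-the-world pause time (concurrent collectors count some non-pausing work too), so treat it as a hint and not a replacement for GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {

    /**
     * Sums the accumulated collection time (ms) over all collectors.
     * Collectors that report -1 (undefined) are skipped.
     */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcTimeMillis();
        Thread.sleep(1000); // sampling interval; an arbitrary choice for this sketch
        long spent = totalGcTimeMillis() - before;
        System.out.println("GC time during the last second: " + spent + " ms");
    }
}
```

If a sample like this shows several seconds of GC time inside a one-second window's neighbours, that would be consistent with the delayed-task warnings above.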