我们有三个必须在群集中的服务。因此,我们使用Infinispan来集群节点并在这些服务之间共享数据。成功重新启动后,有时我会收到异常,并在我的其他节点中收到了“已更改”事件。实际上所有节点都在运行。我无法弄清楚原因。
我正在使用Infinispan 8.1.3分布式缓存,jgroups-3.4
org.infinispan.util.concurrent.TimeoutException: Replication timeout for sipproxy-16964
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:765)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$80(JGroupsTransport.java:599)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$$Lambda$9/1547262581.apply(Unknown Source)
at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:717)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2345)
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:46)
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:17)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-08-22 04:44:52,902 INFO [JGroupsTransport] (ViewHandler,ISPN,transport_manager-48870) ISPN000094: Received new cluster view for channel ISPN: [transport_manager-48870|3] (2) [transport_manager-48870, mediaproxy-47178]
2017-08-22 04:44:52,949 WARN [PreferAvailabilityStrategy] (transport-thread-transport_manager-p4-t24) ISPN000313: Cache mediaProxyResponseCache lost data because of abrupt leavers [sipproxy-16964]
2017-08-22 04:44:52,951 WARN [ClusterTopologyManagerImpl] (transport-thread-transport_manager-p4-t24) ISPN000197: Error updating cluster member list
java.lang.IllegalArgumentException: There must be at least one node with a non-zero capacity factor
at org.infinispan.distribution.ch.impl.DefaultConsistentHashFactory.checkCapacityFactors(DefaultConsistentHashFactory.java:57)
at org.infinispan.distribution.ch.impl.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:74)
at org.infinispan.distribution.ch.impl.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:26)
at org.infinispan.topology.ClusterCacheStatus.updateCurrentTopology(ClusterCacheStatus.java:431)
at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onClusterViewChange(PreferAvailabilityStrategy.java:56)
at org.infinispan.topology.ClusterCacheStatus.doHandleClusterView(ClusterCacheStatus.java:337)
at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:397)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:314)
at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:571)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
jgroups.xml:
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.4.xsd">
<TCP bind_addr="131.10.20.16"
bind_port="8010" port_range="10"
recv_buf_size="20000000"
send_buf_size="640000"
loopback="false"
max_bundle_size="64k"
bundler_type="old"
enable_diagnostics="true"
thread_naming_pattern="cl"
timer_type="new"
timer.min_threads="4"
timer.max_threads="30"
timer.keep_alive_time="3000"
timer.queue_max_size="100"
timer.wheel_size="200"
timer.tick_time="50"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="30"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="true"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="2"
oob_thread_pool.max_threads="30"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="discard"/>
<TCPPING initial_hosts="131.10.20.16[8010],131.10.20.17[8010],131.10.20.182[8010]" port_range="2"
timeout="3000" num_initial_members="3" />
<MERGE3 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD_ALL interval="3000" timeout="10000" />
<VERIFY_SUSPECT timeout="500" />
<BARRIER />
<pbcast.NAKACK use_mcast_xmit="false"
retransmit_timeout="100,300,600,1200"
discard_delivered_msgs="true" />
<UNICAST3 conn_expiry_timeout="0"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="10m"/>
<pbcast.GMS print_local_addr="true" join_timeout="5000"
max_bundling_time="30"
view_bundling="true"/>
<UFC max_credits="2M"
min_threshold="0.4"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60000" />
<pbcast.STATE_TRANSFER/>
</config>
答案 0 :(得分:3)
TimeoutException仅表示在超时内没有收到RPC的响应,仅此而已。当服务器处于压力之下时可能会发生这种情况,但这可能不是这种情况 - 以下日志表明该节点被“怀疑” - 该节点可能没有响应超过10秒(这是您的配置中的限制,见FD_ALL)。
首先检查该服务器中的日志是否存在错误,并检查GC日志是否存在任何世界停留。
答案 1 :(得分:2)
正如@flavius建议的那样,主要原因是您的某个节点由于某种原因而停止,并且无法回复RPC。
我建议更改JGroups的日志记录级别,以便您可以看到为什么某个节点被怀疑(可能由FD_SOCK
或FD_ALL
协议发生)以及为什么它从视图中被删除(这很可能是因为VERIFY_SUSPECT
协议而发生的。)
您还可以检查发生的原因。在大多数情况下,它是由长时间的GC暂停引起的。但是出于其他原因,主机也可以暂停您的VM。我建议在两个VM中使用JHiccup并将其作为Java代理附加到您的进程中。这样你应该注意它的JVM Stop The World是否导致了这个或者是操作系统。