JBoss Ehcache replication exception (sender not found in xmit_table)

Date: 2013-11-04 06:47:57

Tags: jboss load-balancing ehcache jgroups

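I have just set up two JBoss nodes behind Apache, with clustering enabled and Ehcache synchronization configured. Both nodes are now running, and on the node that is not receiving requests I get the following exceptions: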

...
JBoss_5_1_0_GA date=200905221634)] Started in 2m:16s:391ms
12:52:51,139 ERROR [NAKACK] sender 10.166.17.53:7600 not found in xmit_table
12:52:51,139 ERROR [NAKACK] range is null
12:52:51,145 INFO  [RPCManagerImpl] Received new cluster view: MergeView::[10.166.17.52:7600|1] [10.166.17.52:7600, 10.166.17.53:7600], subgroups=[[10.166.17.52:7600|0] [10.166.17.52:7600], [10.166.17.53:7600|0] [10.166.17.53:7600]]
12:53:10,006 WARN  [NAKACK] 10.166.17.52:7600] discarded message from non-member 10.166.17.53:7600, my view is [10.166.17.52:7600|0] [10.166.17.52:7600]
12:53:10,108 WARN  [NAKACK] 10.166.17.52:7600] discarded message from non-member 10.166.17.53:7600, my view is [10.166.17.52:7600|0] [10.166.17.52:7600]
12:53:10,110 ERROR [NAKACK] sender 10.166.17.53:7600 not found in xmit_table
12:53:10,110 ERROR [NAKACK] range is null
12:53:10,113 INFO  [graCluster] New cluster view for partition graCluster (id: 1, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]
12:53:10,117 INFO  [graCluster] Merging partitions...
12:53:10,118 INFO  [graCluster] Dead members: 0
12:53:10,120 INFO  [graCluster] Originating groups: [[10.166.17.52:7600|0] [10.166.17.52:7600], [10.166.17.53:7600|0] [10.166.17.53:7600]]

Here is what my ehcache.xml looks like:

<cacheManagerPeerProviderFactory
       class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
       properties="connect=TCP(start_port=7800):TCPPING(initial_hosts=10.46.49.52[7800],10.46.49.53[7800];port_range=10;timeout=3000;
                    num_initial_members=2;up_thread=true;down_thread=true):
                    VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):
                    pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
                    pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;
                    print_local_addr=false;down_thread=true;up_thread=true)"
                    propertySeparator="::"/>

Finally, this is how I start the two nodes:

  

./run.sh -c all -g myCluster -Djboss.default.jgroups.stack=tcp -Djgroups.tcpping.initial_hosts=10.166.17.52[7600],10.166.17.53[7600] -Djboss.messaging.ServicePeerId=1 -Djgroups.bind_addr=10.166.17.52 -Djboss.node.name=node1 -b 0.0.0.0

  

./run.sh -c all -g myCluster -Djboss.default.jgroups.stack=tcp -Djgroups.tcpping.initial_hosts=10.166.17.52[7600],10.166.17.53[7600] -Djboss.messaging.ServicePeerId=2 -Djgroups.bind_addr=10.166.17.53 -Djboss.node.name=node2 -b 0.0.0.0

The servers are trying to talk to each other, but I am not sure whether they are in the same cluster. Any help would be appreciated.

2 Answers:

Answer 0: (score: 1)

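I turned on Ehcache logging and found that although the nodes were trying to talk to each other, they failed to establish connections to one another. Fixing a misconfigured hosts file resolved the problem. Once the nodes could communicate with each other, Ehcache replication worked fine. Apparently, the xmit_table errors were unrelated.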
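
This answer does not say what exactly was wrong in the hosts file. One common case (an assumption here, not something confirmed above) is the machine name being mapped to 127.0.0.1, so the node advertises an address its peer cannot reach; the `127.0.0.1:1099` entries in the graCluster view in the question may hint at this. A minimal sketch of a check you could run on each node follows. `HostsFileCheck` and `resolvesToLoopback` are hypothetical names for illustration, not part of JBoss, JGroups, or Ehcache.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; not part of JBoss, JGroups, or Ehcache.
final class HostsFileCheck {

    // Returns true if the given host name or IP literal resolves to a
    // loopback address (127.x.x.x or ::1).
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve host: " + host, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Checks the local machine name unless a host is passed as the first argument.
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        InetAddress addr = InetAddress.getByName(host);
        System.out.println(host + " -> " + addr.getHostAddress()
                + (addr.isLoopbackAddress() ? " (loopback, other machines cannot use this)" : ""));
    }
}
```

If the machine name comes back as a loopback address, the hosts file (or DNS) is worth a look before touching the JGroups settings.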

Answer 1: (score: 0)

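I ran into this recently while doing a POC of TCP-based discovery and EHCache replication across Windows machines. With an IP address as the bind address (-Djgroups.bind_addr=), running two service instances locally worked fine, but connecting across machines failed. We did not have permission to change the hosts file, so we changed the bind address to use the machine name instead of the IP. After restarting the services, they communicated across machines, and all CRUD operations on the cache were replicated as expected.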
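
Before changing bind addresses, it can help to confirm from each machine that the peers listed in `initial_hosts` accept TCP connections at all. The following is a rough sketch, not a JGroups API: `PeerReachability`, `parseInitialHosts`, and `canConnect` are hypothetical names, and the parser only assumes the `host[port],host[port]` form shown in the question. Entries may be IPs or machine names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of JGroups or Ehcache.
final class PeerReachability {

    // Parses a list in the "host[port],host[port]" form used by initial_hosts.
    static List<InetSocketAddress> parseInitialHosts(String spec) {
        List<InetSocketAddress> peers = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String e = entry.trim();
            int open = e.indexOf('[');
            int close = e.indexOf(']');
            if (open <= 0 || close <= open + 1) {
                throw new IllegalArgumentException("Expected host[port], got: " + e);
            }
            String host = e.substring(0, open);
            int port = Integer.parseInt(e.substring(open + 1, close));
            // Name resolution is deferred until a connection is attempted.
            peers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return peers;
    }

    // Tries a plain TCP connect; returns false on timeout, refusal, or unknown host.
    static boolean canConnect(InetSocketAddress peer, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(peer.getHostString(), peer.getPort()), timeoutMillis);
            return true;
        } catch (IOException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default list is copied from the run.sh commands in the question.
        String spec = args.length > 0 ? args[0] : "10.166.17.52[7600],10.166.17.53[7600]";
        for (InetSocketAddress peer : parseInitialHosts(spec)) {
            System.out.println(peer.getHostString() + ":" + peer.getPort()
                    + (canConnect(peer, 3000) ? " reachable" : " NOT reachable"));
        }
    }
}
```

A successful connect only shows that something is listening on that port while the node is running; it does not prove the nodes ended up in the same cluster view.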