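First of all, I apologize for the long post. I am using Hazelcast 3.4.4 and testing a cluster of two webapp nodes: one runs on my local Windows box (10.10.222.239), the other in a Linux VM on a separate box (10.10.222.145). The Hazelcast configuration is set up as follows: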
<hz:hazelcast id="instance">
    <hz:config>
        <hz:spring-aware />
        <hz:group name="cluster1" password="cluster1" />
        <hz:network port="5701" port-auto-increment="true">
            <hz:join>
                <hz:multicast enabled="false"/>
                <hz:tcp-ip enabled="true">
                    <hz:members>10.10.222.239, 10.10.222.145</hz:members>
                </hz:tcp-ip>
            </hz:join>
        </hz:network>
        ....
    </hz:config>
</hz:hazelcast>
When I start my Windows webapp node (with the Linux node down), the log shows what I expect to see: a single member, plus some failed attempts to connect to the Linux node:
Members [1] {
Member [10.10.222.239]:5701 this
}
...
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5702. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5702]
Jul 15, 2015 4:53:22 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5701. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5701]
Jul 15, 2015 4:53:22 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5703. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5703]
When I then start the Linux node, everything appears to work. As the Windows node log shows, the two connect to each other:
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Accepting socket connection from /10.10.222.145:52127
Jul 15, 2015 4:55:09 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Established socket connection between /10.10.222.239:5701 and /10.10.222.145:52127
Jul 15, 2015 4:55:16 PM com.hazelcast.cluster.ClusterService
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4]
Members [2] {
Member [10.10.222.239]:5701 this
Member [10.10.222.145]:5701
}
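Starting and stopping the Linux node produces the expected behavior; it joins and leaves the cluster.
My problem appears when I start the Linux node first and then the Windows node; things behave differently. After the Windows node starts, the two do not seem to see each other. The logs show each one running standalone.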
// linux
Members [1] {
Member [10.10.222.145]:5701 this
}
// windows
Members [1] {
Member [10.10.222.239]:5701 this
}
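The Linux node log shows attempts to connect to the other node that fail, which I assume is because the Windows node was down at first. What is surprising is that the Windows node also shows failed connection attempts to the Linux node: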
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5701. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5701]
Jul 15, 2015 5:08:31 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5702. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5702]
Jul 15, 2015 5:08:31 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5703. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5703]
Four minutes later, the Windows node shows:
Jul 15, 2015 5:12:37 PM com.hazelcast.nio.tcp.SocketAcceptor
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Accepting socket connection from /10.10.222.145:52800
Jul 15, 2015 5:12:37 PM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Established socket connection between /10.10.222.239:5701 and /10.10.222.145:52800
Jul 15, 2015 5:13:13 PM com.hazelcast.cluster.impl.TcpIpJoiner
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.239]:5701 is merging to Address[10.10.222.145]:5701, because : node.getThisAddress().hashCode() > joinRequest.address.hashCode() , this node member count: 1
Jul 15, 2015 5:13:13 PM com.hazelcast.cluster.impl.TcpIpJoiner
WARNING: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.239]:5701 is merging [tcp/ip] to Address[10.10.222.145]:5701
Jul 15, 2015 5:13:13 PM com.hazelcast.cluster.impl.operations.PrepareMergeOperation
WARNING: [10.10.222.239]:5701 [cluster1] [3.4.4] Preparing to merge... Waiting for merge instruction...
Jul 15, 2015 5:13:13 PM com.hazelcast.cluster.impl.operations.MergeClustersOperation
WARNING: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.239]:5701 is merging to Address[10.10.222.145]:5701, because: instructed by master Address[10.10.222.239]:5701
Jul 15, 2015 5:13:13 PM com.hazelcast.core.LifecycleService
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.239]:5701 is MERGING
Jul 15, 2015 5:13:13 PM com.hazelcast.nio.tcp.TcpIpConnection
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Connection [Address[10.10.222.145]:5701] lost. Reason: Socket explicitly closed
Jul 15, 2015 5:13:14 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Connecting to /10.10.222.145:5701, timeout: 0, bind-any: true
Jul 15, 2015 5:13:31 PM com.hazelcast.cluster.impl.TcpIpJoiner
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.145]:5702 is added to the blacklist.
Jul 15, 2015 5:13:34 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Connecting to /10.10.222.145:5702, timeout: 0, bind-any: true
Jul 15, 2015 5:13:34 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Connecting to /10.10.222.239:5702, timeout: 0, bind-any: true
Jul 15, 2015 5:13:34 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Connecting to /10.10.222.239:5703, timeout: 0, bind-any: true
Jul 15, 2015 5:13:34 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Connecting to /10.10.222.145:5703, timeout: 0, bind-any: true
Jul 15, 2015 5:13:35 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5701. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5701]
Jul 15, 2015 5:13:35 PM com.hazelcast.cluster.impl.TcpIpJoiner
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.145]:5701 is added to the blacklist.
Jul 15, 2015 5:13:35 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.239:5703. Reason: SocketException[Connection refused: connect to address /10.10.222.239:5703]
Jul 15, 2015 5:13:35 PM com.hazelcast.cluster.impl.TcpIpJoiner
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.239]:5703 is added to the blacklist.
Jul 15, 2015 5:13:35 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.239:5702. Reason: SocketException[Connection refused: connect to address /10.10.222.239:5702]
Jul 15, 2015 5:13:35 PM com.hazelcast.cluster.impl.TcpIpJoiner
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.239]:5702 is added to the blacklist.
Jul 15, 2015 5:13:39 PM com.hazelcast.cluster.impl.TcpIpJoiner
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4]
Members [1] {
Member [10.10.222.239]:5701 this
}
Jul 15, 2015 5:13:39 PM com.hazelcast.core.LifecycleService
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Address[10.10.222.239]:5701 is MERGED
Jul 15, 2015 5:13:55 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5702. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5702]
Jul 15, 2015 5:13:55 PM com.hazelcast.nio.tcp.SocketConnector
INFO: [10.10.222.239]:5701 [cluster1] [3.4.4] Could not connect to: /10.10.222.145:5703. Reason: SocketException[Connection timed out: connect to address /10.10.222.145:5703]
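It looks like it tried to merge the cluster nodes but failed. I don't understand why the behavior differs depending on which node starts first. Is there a misconfiguration, or some implied behavior I'm not aware of?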
Answer 0 (score: 0)
It turned out to be a firewall issue on the Linux box. I turned off the iptables service to test, and then things worked as expected.
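For anyone who wants to confirm this kind of problem without switching the firewall off, here is a rough sketch of a TCP probe using only the JDK. It is not part of Hazelcast; the class name `PortProbe` and its methods are made up for illustration. It tries the ports a member could bind with `port="5701"` and `port-auto-increment="true"` (5701 to 5703, the ones that appear in the logs above). The 3-second timeout is an arbitrary assumption.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: probes the TCP ports a Hazelcast member may listen on. */
final class PortProbe {

    /** Ports to try: basePort, basePort + 1, ... (count ports in total). */
    static List<Integer> candidatePorts(int basePort, int count) {
        List<Integer> ports = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ports.add(basePort + i);
        }
        return ports;
    }

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Both a timeout and a refused connection end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: pass the remote member's address as the first argument,
        // e.g. 10.10.222.145 when running on the Windows box.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        for (int port : candidatePorts(5701, 3)) {
            System.out.println(host + ":" + port + " reachable=" + isReachable(host, port, 3000));
        }
    }
}
```

Run it from each box against the other one. If the Linux box can reach the Windows box on 5701 but the Windows box cannot reach the Linux box on 5701 while the Linux node is up, inbound filtering on the Linux side is a likely cause. That would match the logs in the question, where connections from 10.10.222.145 are accepted but connections to it time out.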