我正在使用JGroups 4.0.15.Final来实现集群应用程序。当我启动两个节点时,一切工作正常,它们已连接并开始工作。但是当我杀死/重启一个节点时,它们将不再连接在一起。
注意:它们是我的应用程序中的一项专业功能,因为我正在为TCP协议使用自定义套接字工厂。该工厂创建具有某些特殊要求的加密套接字。但是连接似乎很好并且套接字可以工作。我唯一注意到的是,在NODE_2被验证为无效之后,它们仍然通过TCP协议尝试在其套接字上进行连接。
我已经尝试寻找问题了好几个小时,但我只是找不到。有时,节点在进行几次连接尝试后又重新建立了连接,但这可能需要几分钟到一个多小时。我认为这与轮回问题有关,但我不确定。
我目前正在分析TCP协议的TRACE日志,但仍然无法指出两个节点需要几分钟才能再次合并的原因。
当两个( NODE_1 = 1.1.1.1 , NODE_2 = 1.1.1.2 )都已连接且一切正常时,我重新启动 NODE_2 精细。在双方,我都看到建立了连接:
NODE_1:
2019-01-17 09:51:55.089 [TQ-Bundl | ] TRACE TCP - 1.1.1.1:7800: failed connecting to 1.1.1.2:7800: java.net.SocketException: Failed to connect to '/1.1.1.2:7800' with Socket #1041583140. ConnectException: Connection refused (Connection refused)
2019-01-17 09:51:55.089 [TQ-Bundl | ] TRACE TCP - 1.1.1.1:7800: removed connection to 1.1.1.2:7800 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.089 [TQ-Bundl | ] TRACE TCP - JGRP000029: NODE_1 (flags=1): failed sending message to 1.1.1.2:7800 (117 bytes): java.net.SocketException: Failed to connect to '/1.1.1.2:7800' with Socket #1041583140. ConnectException: Connection refused (Connection refused), headers: TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
----------------------------
NODE_2 is comes back online
----------------------------
2019-01-17 09:51:55.791 [TcpServe | ] TRACE TCP - 1.1.1.1:7800: accepted connection from 1.1.1.2:7800 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.798 [jgrp-29, | ] TRACE TCP - NODE_1 (flags=1): received [dst: 1.1.1.1:7800, src: NODE_2 (2 headers), size=50 bytes, flags=OOB|DONT_BUNDLE|INTERNAL], headers are TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=true], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.798 [jgrp-29, | ] TRACE TCP - NODE_1 (flags=1): sending msg to 1.1.1.2:7800, src=NODE_1 (flags=1), headers are MERGE3: INFO: view_id=[NODE_1 (flags=1)|2], logical_name=NODE_1, physical_addr=1.1.1.1:7800, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.798 [jgrp-29, | ] TRACE TCPPING - NODE_1 (flags=1): received GET_MBRS_REQ from NODE_2, sending response NODE_1 (flags=1), name=NODE_1, addr=1.1.1.1:7800, coord | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.799 [jgrp-29, | ] TRACE TCP - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are TCPPING: [GET_MBRS_RSP cluster=null initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.852 [jgrp-29, | ] TRACE TCP - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NODE_2, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.853 [jgrp-29, | ] TRACE UNICAST3 - NODE_1 (flags=1) <-- DATA(NODE_2: #1, conn_id=0, first) | (Log4J2LogImpl.java:56)
2019-01-17 09:51:56.207 [jgrp-29, | ] TRACE UNICAST3 - NODE_1 (flags=1) --> ACK(NODE_2: #6) | (Log4J2LogImpl.java:56)
NODE_2:
2019-01-17 09:51:48.998 [cc-clust | ] DEBUG Configurator - set property TCP.bind_addr to default value /fe80:0:0:0:250:56ff:feb4:3785%eth2 | (Log4J2LogImpl.java:66)
2019-01-17 09:51:49.000 [cc-clust | ] DEBUG Configurator - set property TCP.diagnostics_addr to default value /ff0e:0:0:0:0:0:75:75 | (Log4J2LogImpl.java:66)
2019-01-17 09:51:49.004 [cc-clust | ] DEBUG TCP - thread pool min/max/keep-alive: 0/20/3000 use_fork_join=false, internal pool: 0/4/30000 (2 cores available) | (Log4J2LogImpl.java:71)
2019-01-17 09:51:49.011 [cc-clust | ] TRACE NAKACK2 - null: set max_xmit_req_size from 0 to 511600 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:49.012 [cc-clust | ] TRACE UNICAST3 - null: set max_xmit_req_size from 0 to 511600 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.535 [cc-clust | ] TRACE STABLE - NODE_2: stable task started | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.553 [cc-clust | ] TRACE TCP - joined /ff0e:0:0:0:0:0:75:75:7500 on eth2 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.554 [cc-clust | ] TRACE TCP - joined /ff0e:0:0:0:0:0:75:75:7500 on eth1 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.555 [cc-clust | ] TRACE TCP - joined /ff0e:0:0:0:0:0:75:75:7500 on eth0 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.555 [cc-clust | ] TRACE TCP - joined /ff0e:0:0:0:0:0:75:75:7500 on lo | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.581 [cc-clust | ] TRACE TCPPING - NODE_2: sending discovery request to 1.1.1.1:7800 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.582 [cc-clust | ] TRACE TCP - NODE_2: sending msg to 1.1.1.1:7800, src=NODE_2, headers are TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=true], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.591 [TQ-Bundl | ] TRACE TCP - 1.1.1.2:7800: connecting to 1.1.1.1:7800 | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.823 [jgrp-1,s | ] TRACE TCP - NODE_2: received [dst: NODE_2, src: 0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1) (2 headers), size=52 bytes, flags=OOB|DONT_BUNDLE|INTERNAL], headers are TCPPING: [GET_MBRS_RSP cluster=null initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.826 [jgrp-2,s | ] TRACE TCP - NODE_2: received [dst: 1.1.1.2:7800, src: 0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1) (2 headers), size=0 bytes, flags=INTERNAL], headers are MERGE3: INFO: view_id=[0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1)|2], logical_name=NODE_1, physical_addr=1.1.1.1:7800, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.830 [jgrp-1,s | ] TRACE TCPPING - NODE_2: received GET_MBRS_RSP from NODE_1 (flags=1): NODE_1 (flags=1), name=NODE_1, addr=1.1.1.1:7800, coord | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.832 [cc-clust | ] TRACE GMS - NODE_2: discovery took 262 ms, members: 1 rsps (1 coords) [done] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.835 [cc-clust | ] DEBUG GMS - NODE_2: sending JOIN(NODE_2) to NODE_1 (flags=1) | (Log4J2LogImpl.java:71)
2019-01-17 09:51:55.844 [cc-clust | ] TRACE UNICAST3 - NODE_2: created sender window for NODE_1 (flags=1) (conn-id=0) | (Log4J2LogImpl.java:56)
2019-01-17 09:51:55.848 [cc-clust | ] TRACE UNICAST3 - NODE_2 --> DATA(NODE_1 (flags=1): #1, conn_id=0, first) | (Log4J2LogImpl.java:46)
2019-01-17 09:51:55.848 [cc-clust | ] TRACE TCP - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NODE_2, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:56.210 [jgrp-1,s | ] TRACE TCP - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (2 headers), size=0 bytes, flags=INTERNAL], headers are UNICAST3: ACK, seqno=6, ts=6, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 09:51:56.211 [jgrp-1,s | ] TRACE UNICAST3 - NODE_2 <-- ACK(NODE_1 (flags=1): #6, conn-id=0, ts=6) | (Log4J2LogImpl.java:56)
在此之后,它们就像都处于循环中以处理某些事情。当NODE_2试图加入(将JOIN(NODE_2)发送到NODE_1 )时,NODE_1似乎没有注意到它,并且我看不到NODE_1的LOG中涉及GMS协议。我还看到很多消息一遍又一遍地询问第一个序列号(* UNICAST3 NODE_2-> SEND_FIRST_SEQNO(NODE_1(flags = 1))|(Log4J2LogImpl.java:56)*)。这是在走一段时间(时间从几秒变化有时小时),直到加入/合并突然作品:
NODE_1:
2019-01-17 10:11:36.043 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::21, UNICAST3: DATA, seqno=19750, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.544 [jgrp-125 | ] TRACE UNICAST3 - NODE_1 (flags=1) --> XMIT(NODE_2: #19750) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.544 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::21, UNICAST3: DATA, seqno=19750, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.547 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=OOB], headers are UNICAST3: SEND_FIRST_SEQNO, seqno=0, ts=581, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.547 [jgrp-125 | ] TRACE UNICAST3 - NODE_1 (flags=1) <-- SEND_FIRST_SEQNO(NODE_2) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.547 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are UNICAST3: DATA, seqno=19725, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.716 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=INTERNAL], headers are UNICAST3: ACK, seqno=19725, ts=582, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.716 [jgrp-125 | ] TRACE UNICAST3 - NODE_1 (flags=1) <-- ACK(NODE_2: #19725, conn-id=0, ts=582) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.296 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=INTERNAL], headers are MERGE3: VIEW_REQ: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.296 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are MERGE3: VIEW_RSP: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.300 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (3 headers), size=23 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=32, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.300 [jgrp-125 | ] TRACE UNICAST3 - NODE_1 (flags=1) <-- DATA(NODE_2: #32, conn_id=0) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.301 [jgrp-125 | ] TRACE UNICAST3 - NODE_1 (flags=1): delivering NODE_2#32 | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.301 [jgrp-125 | ] DEBUG STABLE - suspending message garbage collection | (Log4J2LogImpl.java:66)
2019-01-17 10:11:37.302 [jgrp-125 | ] DEBUG STABLE - NODE_1 (flags=1): resume task started, max_suspend_time=220000 | (Log4J2LogImpl.java:71)
2019-01-17 10:11:37.302 [jgrp-125 | ] TRACE GMS - NODE_1 (flags=1): got merge request from NODE_2, merge_id=NODE_2::22, mbrs=[NODE_1 (flags=1)] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.303 [jgrp-125 | ] TRACE UNICAST3 - NODE_1 (flags=1) --> DATA(NODE_2: #19751, conn_id=0) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.303 [jgrp-125 | ] TRACE TCP - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=19751, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.307 [jgrp-127 | ] TRACE TCP - NODE_1 (flags=1): received message batch of 1 messages from NODE_2 | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.310 [jgrp-127 | ] TRACE UNICAST3 - NODE_1 (flags=1) <-- DATA(NODE_2: #33 - #33) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.310 [jgrp-127 | ] TRACE UNICAST3 - NODE_1 (flags=1): delivering #33 - #33 (1 messages) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.313 [jgrp-127 | ] TRACE GMS - NODE_1 (flags=1): mcasting view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.314 [jgrp-127 | ] DEBUG NAKACK2 -
[NODE_1 (flags=1) mergeDigest()]
existing digest: NODE_1 (flags=1): [3 (3)], NODE_2: [0 (0)]
new digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)]
resulting digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:66)
2019-01-17 10:11:37.314 [jgrp-127 | ] DEBUG GMS - NODE_1 (flags=1): installing view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:71)
NODE_2:
2019-01-17 10:11:36.547 [jgrp-174 | ] TRACE UNICAST3 - NODE_2 --> SEND_FIRST_SEQNO(NODE_1 (flags=1)) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.547 [jgrp-174 | ] TRACE TCP - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are UNICAST3: SEND_FIRST_SEQNO, seqno=0, ts=581, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.550 [jgrp-174 | ] TRACE TCP - NODE_2: received message batch of 1 messages from NODE_1 (flags=1) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.550 [jgrp-174 | ] TRACE UNICAST3 - NODE_2: created receiver window for NODE_1 (flags=1) at seqno=#19725 for conn-id=0 | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.553 [jgrp-174 | ] TRACE UNICAST3 - NODE_2 <-- DATA(NODE_1 (flags=1): #19725 - #19725) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.553 [jgrp-174 | ] TRACE UNICAST3 - NODE_2: delivering #19725 - #19725 (1 messages) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:36.713 [jgrp-174 | ] TRACE UNICAST3 - NODE_2 --> ACK(NODE_1 (flags=1): #19725) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:36.713 [jgrp-174 | ] TRACE TCP - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are UNICAST3: ACK, seqno=19725, ts=582, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.296 [jgrp-174 | ] TRACE MERGE3 - NODE_2: merge participants are [NODE_2, NODE_1 (flags=1)] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.296 [jgrp-174 | ] TRACE TCP - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are MERGE3: VIEW_REQ: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.299 [jgrp-177 | ] TRACE TCP - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (2 headers), size=67 bytes, flags=INTERNAL], headers are MERGE3: VIEW_RSP: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.299 [jgrp-174 | ] DEBUG GMS - NODE_2: I will be the merge leader. Starting the merge task. Views: {NODE_1 (flags=1)=[NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], NODE_2=MergeView::[NODE_2|24] (1) [NODE_2], 1 subgroups: [NODE_2|23] (1) [NODE_2]} | (Log4J2LogImpl.java:71)
2019-01-17 10:11:37.300 [MergeTas | ] DEBUG STABLE - suspending message garbage collection | (Log4J2LogImpl.java:66)
2019-01-17 10:11:37.300 [MergeTas | ] DEBUG STABLE - NODE_2: resume task started, max_suspend_time=220000 | (Log4J2LogImpl.java:71)
2019-01-17 10:11:37.300 [MergeTas | ] DEBUG GMS - NODE_2: merge task NODE_2::22 started with 2 participants | (Log4J2LogImpl.java:71)
2019-01-17 10:11:37.300 [MergeTas | ] TRACE GMS - NODE_2: sending MERGE_REQ to [NODE_1 (flags=1), NODE_2] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.300 [MergeTas | ] TRACE UNICAST3 - NODE_2 --> DATA(NODE_1 (flags=1): #32, conn_id=0) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.300 [MergeTas | ] TRACE TCP - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=32, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.301 [MergeTas | ] TRACE UNICAST3 - NODE_2 --> DATA(NODE_2: #64, conn_id=1) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.301 [MergeTas | ] TRACE TCP - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.301 [MergeTas | ] TRACE TCP - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=22 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.301 [jgrp-174 | ] TRACE TCP - NODE_2: received [dst: NODE_2, src: NODE_2 (3 headers), size=22 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.302 [jgrp-174 | ] TRACE UNICAST3 - NODE_2 <-- DATA(NODE_2: #64, conn_id=1) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.302 [jgrp-174 | ] TRACE UNICAST3 - NODE_2: delivering NODE_2#64 | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.302 [jgrp-174 | ] TRACE GMS - NODE_2: got merge request from NODE_2, merge_id=NODE_2::22, mbrs=[NODE_2] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.303 [jgrp-174 | ] TRACE UNICAST3 - NODE_2 --> DATA(NODE_2: #65, conn_id=1) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.303 [jgrp-174 | ] TRACE TCP - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.303 [jgrp-174 | ] TRACE TCP - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=51 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.304 [jgrp-179 | ] TRACE UNICAST3 - NODE_2: delivering #65 - #65 (1 messages) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.304 [jgrp-179 | ] TRACE GMS - NODE_2: got merge response from NODE_2, merge_id=NODE_2::22, merge data is sender=NODE_2, view=[NODE_2|24] (1) [NODE_2], digest=NODE_2: [21 (21)] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.304 [jgrp-177 | ] TRACE TCP - NODE_2: received [dst: NODE_2, src: NODE_2 (3 headers), size=51 bytes, flags=OOB|INTERNAL, transient_flags=OOB_DELIVERED], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.304 [jgrp-177 | ] TRACE UNICAST3 - NODE_2 <-- DATA(NODE_2: #65, conn_id=1) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.305 [jgrp-177 | ] TRACE TCP - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (3 headers), size=53 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=19751, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.305 [jgrp-177 | ] TRACE UNICAST3 - NODE_2 <-- DATA(NODE_1 (flags=1): #19751, conn_id=0) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.305 [jgrp-177 | ] TRACE UNICAST3 - NODE_2: delivering NODE_1 (flags=1)#19751 | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.306 [jgrp-177 | ] TRACE GMS - NODE_2: got merge response from NODE_1 (flags=1), merge_id=NODE_2::22, merge data is sender=NODE_1 (flags=1), view=[NODE_1 (flags=1)|3] (1) [NODE_1 (flags=1)], digest=NODE_1 (flags=1): [3 (3)] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.306 [MergeTas | ] TRACE GMS - NODE_2: collected 2 merge response(s) in 6 ms | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.307 [MergeTas | ] TRACE GMS - NODE_2: consolidated view=MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], MergeView::[NODE_2|24] (1) [NODE_2], 1 subgroups: [NODE_2|23] (1) [NODE_2]
consolidated digest=NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.307 [MergeTas | ] DEBUG GMS - NODE_2: installing merge view [NODE_1 (flags=1)|25] (2 members) in 2 coords | (Log4J2LogImpl.java:71)
2019-01-17 10:11:37.307 [MergeTas | ] TRACE UNICAST3 - NODE_2 --> DATA(NODE_1 (flags=1): #33, conn_id=0) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.307 [MergeTas | ] TRACE TCP - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=33, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.307 [MergeTas | ] TRACE UNICAST3 - NODE_2 --> DATA(NODE_2: #66, conn_id=1) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.307 [MergeTas | ] TRACE TCP - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=66, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.308 [MergeTas | ] TRACE TCP - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=108 bytes], headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=66, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.308 [MergeTas | ] DEBUG GMS - NODE_2: merge NODE_2::22 took 9 ms | (Log4J2LogImpl.java:71)
2019-01-17 10:11:37.309 [jgrp-179 | ] TRACE TCP - NODE_2: received message batch of 1 messages from NODE_2 | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.309 [jgrp-179 | ] TRACE UNICAST3 - NODE_2 <-- DATA(NODE_2: #66 - #66) | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.309 [jgrp-179 | ] TRACE UNICAST3 - NODE_2: delivering #66 - #66 (1 messages) | (Log4J2LogImpl.java:46)
2019-01-17 10:11:37.310 [jgrp-179 | ] TRACE GMS - NODE_2: mcasting view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:56)
2019-01-17 10:11:37.310 [jgrp-179 | ] DEBUG NAKACK2 -
[NODE_2 mergeDigest()]
existing digest: NODE_2: [21 (21)]
new digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)]
resulting digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:66)
2019-01-17 10:11:37.310 [jgrp-179 | ] DEBUG GMS - NODE_2: installing view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:71)
此后,一切再次正常运行,直到重新启动一个节点为止。我真的很笨,需要一些帮助。有谁知道为什么NODE_1不早开始加入/合并?感谢您的帮助。
这是我当前的配置:
public class JGroupsConfiguration {
final Protocol[] PROTOCOL_STACK = {
new TCP()
.setValue("bind_port", EngineConfiguration.get().clusterSrcPort().get())
.setValue("max_bundle_size", 64000)
.setValue("sock_conn_timeout", 300)
.setValue("thread_pool_min_threads", 0)
.setValue("thread_pool_max_threads", 20)
.setValue("thread_pool_keep_alive_time", 3000),
new TCPPING()
.setValue("async_discovery", false)
.setValue("port_range", 0)
.setValue("send_cache_on_join", true)
.setValue("return_entire_cache", true),
new MERGE3()
.setValue("min_interval", 10000)
.setValue("max_interval", 30000),
new FD()
.setValue("timeout", 3000)
.setValue("max_tries", 3),
new VERIFY_SUSPECT()
.setValue("timeout", 1500),
new BARRIER(),
new NAKACK2()
.setValue("use_mcast_xmit", false)
.setValue("discard_delivered_msgs", true),
new UNICAST3()
// Disables retransmitting of queued messages to reconnected nodes.
.setValue("max_retransmit_time", 0),
new STABLE()
.setValue("desired_avg_gossip", 50000)
.setValue("max_bytes", 4000000),
new GMS()
.setValue("print_local_addr", true)
.setValue("join_timeout", 2000),
new FRAG3()
.setValue("frag_size", 60000),
new STATE()
};
}
答案 0 :(得分:0)
事实证明,问题出在UNICAST3协议上。该协议尝试将每条消息重新发送到重新连接的群集成员,然后再实际允许它们作为成员合并到群集中。这就是节点之间所有SEQNO消息的原因。从堆栈中删除协议后,重新连接成员的合并就可以了。 注意:,我们的应用程序不需要UNICAST3的功能,因为我们不依赖可靠的消息传递!