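We have a 6-node Cassandra cluster with 3 seeds. One day AWS sent us a notice that one of our instances was scheduled for retirement, and it was seed01. To deal with this we had to stop/start the instance so that it would move to a new AWS host. Before the stop/start we did the following: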
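2) Stopped gossip
3) Stopped thrift
4) Drained the node
5) Stopped Cassandra
6) Moved all data to EBS (we keep data on ephemeral volumes)
7) Stopped/started the instance
8) Moved the data back
9) Started Cassandra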
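For reference, steps 2) to 5) above roughly correspond to the commands in the sketch below. The nodetool subcommands disablegossip, disablethrift and drain are the standard ones; the service name "cassandra", the use of sudo/service, and the helper name run_pre_stop are assumptions for illustration and may not match every setup.

```python
import subprocess

# Steps 2) to 5) from the list above, expressed as commands.
# The last entry assumes Cassandra is installed as a system service
# named "cassandra"; adjust it to however the node is actually managed.
PRE_STOP_COMMANDS = [
    ["nodetool", "disablegossip"],              # 2) stop gossip
    ["nodetool", "disablethrift"],              # 3) stop thrift
    ["nodetool", "drain"],                      # 4) flush memtables, stop accepting writes
    ["sudo", "service", "cassandra", "stop"],   # 5) stop Cassandra (assumed service name)
]


def run_pre_stop(dry_run=True):
    """Run the commands in order; with dry_run=True only return them as strings."""
    executed = []
    for cmd in PRE_STOP_COMMANDS:
        if not dry_run:
            # check=True aborts the sequence if any step fails
            subprocess.run(cmd, check=True)
        executed.append(" ".join(cmd))
    return executed
```

Calling run_pre_stop() with the default dry_run=True only lists the commands, which is a safe way to review the order before running anything on a live node.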
But after starting Cassandra on seed01, nodetool status shows:
Datacenter: UNKNOWN-DC
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.149.45.115 ? 256 17.3% ae4166fb-76e1-4900-947c-7e87ca262ea0 UNKNOWN-RACK
DN 10.164.84.171 ? 256 17.5% 638dae19-a6f5-4330-9466-f46ddb3b9d79 UNKNOWN-RACK
DN 10.149.44.215 ? 256 16.2% 987914af-f057-4922-8ee1-2a999108c75d UNKNOWN-RACK
DN 10.232.20.72 ? 256 14.8% fb5dfd50-de9e-42ed-b539-bd937a045992 UNKNOWN-RACK
DN 10.166.37.188 ? 256 17.1% f149c294-ca1d-427c-b510-2f91a0966b5a UNKNOWN-RACK
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.232.17.19 1020.87 MB 256 17.1% 08055af6-5dfa-4d4e-aa72-cf1d2952e23e 1b
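We also tried starting seed04 with seed02 and seed03 configured as its seeds, but it created a new ring instead of joining the existing one.
We checked port 7000 on all nodes, and every node can reach it. By default we open all ports (TCP/UDP 0-65535) within the security group that all the nodes belong to. In tcpdump I can see the node trying to connect to the seed: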
08:43:42.056115 IP 10.235.62.198.45163 > 10.164.84.171.7000: Flags [P.], seq 0:8, ack 1, win 46, options [nop,nop,TS val 81748069 ecr 538805526], length 8
08:43:42.056146 IP 10.164.84.171.7000 > 10.235.62.198.45163: Flags [R], seq 110766787, win 0, length 0
08:43:42.157893 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [S], seq 452519826, win 5840, options [mss 1460,sackOK,TS val 81748094 ecr 0,nop,wscale 7], length 0
08:43:42.157903 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538833931 ecr 81748094,nop,wscale 7], length 0
08:43:42.158920 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 0
08:43:42.159053 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 8
08:43:42.360086 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748145 ecr 538833931], length 8
08:43:42.768080 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748247 ecr 538833931], length 8
08:43:43.584072 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748451 ecr 538833931], length 8
08:43:45.216087 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748859 ecr 538833931], length 8
08:43:45.783333 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538834838 ecr 81748859,nop,wscale 7], length 0
08:43:45.784337 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81749001 ecr 538834838,nop,nop,sack 1 {0:1}], length 0
Here 10.235.62.198 is the new node and 10.164.84.171 is the seed.
We are using Cassandra version 1.2.6 with vnodes.
Please help. We have spent almost 3 days trying to fix this with no luck.
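The port 7000 reachability check mentioned above can be scripted along these lines. This is a minimal sketch; the function name port_open and the 3-second timeout are assumptions. Note that it only tests whether the TCP handshake completes. In the capture above the handshake appears to complete while the 8-byte payload is retransmitted, so a successful connect does not by itself show that data gets through.

```python
import socket


def port_open(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full three-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, unreachable host, and timeouts
        return False
```

For example, port_open("10.164.84.171") run from the new node tests the seed's storage port.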