Unable to add a new Cassandra datacenter because of streaming errors

Asked: 2016-04-19 13:15:34

Tags: cassandra datastax datastax-enterprise datastax-startup

Using DSE 4.8.6 (C* 2.1.13.1218)

When I try to add a new node in a new datacenter, the bootstrap/node rebuild is always interrupted by streaming errors.

Example error from system.log:

ERROR [STREAM-IN-/172.31.47.213] 2016-04-19 12:30:28,531  StreamSession.java:621 - [Stream #743d44e0-060e-11e6-985c-c1820b05e9ae] Remote peer 172.31.47.213 failed stream session.
INFO  [STREAM-IN-/172.31.47.213] 2016-04-19 12:30:30,665  StreamResultFuture.java:180 - [Stream #743d44e0-060e-11e6-985c-c1820b05e9ae] Session with /172.31.47.213 is complete

There is roughly 500 GB of data to stream to the new node. The bootstrap or rebuild operation streams from 4 different nodes in the other (primary) DC.

When a streaming error occurs, all the data synced so far is discarded (I have to start over).

What I have tried so far (a sketch of these settings follows the list):

  • Bootstrapping the node
  • Setting auto_bootstrap: false in cassandra.yaml and running nodetool rebuild manually
  • Disabling streaming_socket_timeout_in_ms and setting more aggressive TCP keepalive values in my Linux config (following the advice in ticket CASSANDRA-9440)
  • Increasing phi_convict_threshold (to the maximum)
  • Not bootstrapping the node and streaming the data with repair instead (the repair stalled with an almost-full disk and 80K SSTables; after trying to compact them for 3 days, I gave up)
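
As a reference for the settings above, here is a rough sketch of the relevant configuration; the exact values are illustrative assumptions, not necessarily the ones used on this cluster:

    # cassandra.yaml on the joining node
    auto_bootstrap: false                      # skip bootstrap, stream manually later
    streaming_socket_timeout_in_ms: 0          # 0 disables the streaming socket timeout
    phi_convict_threshold: 16                  # upper end of the usual range

    # Linux TCP keepalive tuning along the lines of CASSANDRA-9440 (e.g. /etc/sysctl.conf)
    net.ipv4.tcp_keepalive_time = 60
    net.ipv4.tcp_keepalive_probes = 3
    net.ipv4.tcp_keepalive_intvl = 10

    # then stream the data from the primary DC manually
    nodetool rebuild -- <name-of-primary-dc>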

Is there anything else I should try? I am currently running nodetool scrub on each failing node to see if that helps...
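
(For reference, a scrub can target a single keyspace or table; the names below are placeholders:)

    # online scrub on the affected node (keyspace/table names are placeholders)
    nodetool scrub my_keyspace my_table
    # or, with the node stopped, scrub the SSTables offline
    sstablescrub my_keyspace my_table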

On the streaming-out node, these are the error messages:

ERROR [STREAM-IN-/172.31.45.28] 2016-05-11 13:10:43,842  StreamSession.java:505 - [Stream #ecfe0390-1763-11e6-b6c8-c1820b05e9ae] Streaming error occurred
java.net.SocketTimeoutException: null
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229) ~[na:1.7.0_80]
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.7.0_80]
        at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.7.0_80]
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51) ~[cassandra-all-2.1.14.1272.jar:2.1.14.1272]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257) ~[cassandra-all-2.1.14.1272.jar:2.1.14.1272]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]

Followed by:

INFO  [STREAM-IN-/172.31.45.28] 2016-05-10 07:59:14,023  StreamResultFuture.java:180 - [Stream #ea1271b0-1679-11e6-917a-c1820b05e9ae] Session with /172.31.45.28 is complete
WARN  [STREAM-IN-/172.31.45.28] 2016-05-10 07:59:14,023  StreamResultFuture.java:207 - [Stream #ea1271b0-1679-11e6-917a-c1820b05e9ae] Stream failed
ERROR [STREAM-OUT-/172.31.45.28] 2016-05-10 07:59:14,024  StreamSession.java:505 - [Stream #ea1271b0-1679-11e6-917a-c1820b05e9ae] Streaming error occurred
java.lang.AssertionError: Memory was freed
        at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:338) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]

2 Answers:

Answer 0 (score: 2)

As answered in the Cassandra ticket CASSANDRA-11345, this issue was caused by streaming a single large SSTable file (40 GB).

Transferring that file took more than 1 hour, and by default a streaming operation times out when an outgoing transfer takes longer than 1 hour.

To change this default behavior, set streaming_socket_timeout_in_ms in the cassandra.yaml configuration file to a larger value (for example, 72000000 ms, i.e. 20 hours).
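
For example, in cassandra.yaml (the value comes straight from this answer; the setting normally requires a node restart to take effect):

    # allow a single streaming socket to stay open for up to 20 hours
    streaming_socket_timeout_in_ms: 72000000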

Answer 1 (score: 0)

Don't forget to change this value on the existing nodes as well, not just on the new ones! (Not that I'd admit to making that mistake here...)