Cassandra DC2 nodes go down after increasing write requests on DC1 nodes

Asked: 2014-11-15 18:23:36

Tags: cassandra cassandra-2.0 datastax

We are running Cassandra 2.1.2 in a multi-datacenter cluster (30 servers in DC1, 10 servers in DC2), with the keyspace replication factor set to 1 in DC1 and 2 in DC2.
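For context, a keyspace with that topology would typically be defined with `NetworkTopologyStrategy`; the keyspace name below is hypothetical, since the question does not give it:

```sql
-- Illustrative CQL matching the topology described in the question:
-- RF 1 in DC1, RF 2 in DC2. The keyspace name is an assumption.
CREATE KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 1,
    'DC2': 2
  };
```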

For some reason, when we increase the volume of write requests on DC1 (using consistency level ONE or LOCAL_ONE), the Cassandra Java process on DC2 nodes dies at random.

When the DC2 nodes start going down, the load average is around 3-5 on the DC1 nodes and around 7-10 on the DC2 nodes... so nothing dramatic.

Looking at Cassandra's system.log, we found some exceptions:

ERROR [SharedPool-Worker-43] 2014-11-15 00:39:48,596 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [CompactionExecutor:8] 2014-11-15 00:39:48,596 CassandraDaemon.java:153 - Exception in thread Thread[CompactionExecutor:8,1,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thrift-Selector_2] 2014-11-15 00:39:48,596 Message.java:238 - Got an IOException during write!
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_25]
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_25]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_25]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_25]
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[na:1.8.0_25]
        at org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:164) ~[libthrift-0.9.1.jar:0.9.1]
        at com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104) ~[thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.util.mem.FastMemoryOutputTransport.streamTo(FastMemoryOutputTransport.java:112) ~[thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.Message.write(Message.java:222) ~[thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleWrite(TDisruptorServer.java:598) [thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:569) [thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423) [thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.run(TDisruptorServer.java:383) [thrift-server-0.3.7.jar:na]
ERROR [Thread-94] 2014-11-15 00:39:48,597 CassandraDaemon.java:153 - Exception in thread Thread[Thread-94,5,main]
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) ~[na:1.8.0_25]
        at org.apache.cassandra.db.composites.AbstractCType.sliceBytes(AbstractCType.java:369) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:110) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[apache-cassandra-2.1.2.jar:2.1.2]

Memory

  • DC1 servers have 32 GB of RAM, with the heap configured at 8 GB.
  • DC2 servers have 16 GB of RAM, with the heap also configured at 8 GB.
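For reference, in Cassandra 2.1 the heap size is typically set in `conf/cassandra-env.sh`. A sketch of the configuration described above (values taken from the question; the new-generation size is an assumption, since the question does not state it). Note that an 8 GB heap on the 16 GB DC2 machines leaves comparatively little memory for the OS page cache:

```shell
# conf/cassandra-env.sh (fragment) -- heap settings as described in the question.
# On the 16 GB DC2 servers, half of the RAM goes to the JVM heap.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"   # assumption for illustration; not given in the question
```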

Any hints, please?

Thanks in advance.

1 answer:

Answer 0 (score: 1)

When you specify a consistency level of LOCAL_ONE, you are telling Cassandra to consider the write request complete as soon as one of the local replicas has received the update. However, the request is still sent to all replicas, so the nodes in the other DC receive it at the same time. Because of network latency, the actual work for the request most likely finishes shortly after the write has already reported success -- my guess is that this is the source of the "random" timing of the other DC dying. In essence, one or more nodes in that cluster are being overloaded.
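As a hedged illustration of this behavior (table and keyspace names are hypothetical): at LOCAL_ONE the coordinator acknowledges the write after a single replica in its own datacenter responds, but the mutation is still shipped to the DC2 replicas in the background:

```sql
-- Illustrative cqlsh session. With replication {'DC1': 1, 'DC2': 2},
-- this write is acknowledged as soon as the single DC1 replica responds,
-- yet both DC2 replicas still receive the mutation asynchronously --
-- so DC2 carries the full write load despite the "local" consistency level.
CONSISTENCY LOCAL_ONE;
INSERT INTO my_keyspace.events (id, payload) VALUES (uuid(), 'example');
```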

TL;DR: LOCAL_ONE for writes is essentially the same as ONE. LOCAL_ONE only makes a significant difference for reads, where only the local DC is queried (avoiding the network cost). The cluster described above has hit a throughput ceiling in DC2.