我们在多直流群集中使用Cassandra 2.1.2(DC1上有30台服务器,DC2上有10台服务器),DC1上的密钥空间复制因子为1,DC2上为2。
出于某种原因,当我们增加DC1上的写请求量(使用ONE或LOCAL_ONE)时,DC2节点上的cassandra java进程会随机下降。
当DC2节点开始下降时,DC1节点上的负载平均值约为3-5,而DC2上的负载平均值约为7-10 ..所以没什么大不了的。
看一下Cassandra的system.log,我们发现了一些例外:
ERROR [SharedPool-Worker-43] 2014-11-15 00:39:48,596 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [CompactionExecutor:8] 2014-11-15 00:39:48,596 CassandraDaemon.java:153 - Exception in thread Thread[CompactionExecutor:8,1,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thrift-Selector_2] 2014-11-15 00:39:48,596 Message.java:238 - Got an IOException during write!
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_25]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_25]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_25]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_25]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[na:1.8.0_25]
at org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:164) ~[libthrift-0.9.1.jar:0.9.1]
at com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104) ~[thrift-server-0.3.7.jar:na]
at com.thinkaurelius.thrift.util.mem.FastMemoryOutputTransport.streamTo(FastMemoryOutputTransport.java:112) ~[thrift-server-0.3.7.jar:na]
at com.thinkaurelius.thrift.Message.write(Message.java:222) ~[thrift-server-0.3.7.jar:na]
at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleWrite(TDisruptorServer.java:598) [thrift-server-0.3.7.jar:na]
at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:569) [thrift-server-0.3.7.jar:na]
at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423) [thrift-server-0.3.7.jar:na]
at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.run(TDisruptorServer.java:383) [thrift-server-0.3.7.jar:na]
ERROR [Thread-94] 2014-11-15 00:39:48,597 CassandraDaemon.java:153 - Exception in thread Thread[Thread-94,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) ~[na:1.8.0_25]
at org.apache.cassandra.db.composites.AbstractCType.sliceBytes(AbstractCType.java:369) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:110) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[apache-cassandra-2.1.2.jar:2.1.2]
内存
请,任何提示?
提前致谢。
答案 0 :(得分:1)
当您指定LOCAL_ONE的一致性级别时,您告诉Cassandra在其中一个本地副本收到更新后立即考虑写入请求。但是,请求仍会发送到所有副本。另一个DC中的节点同时获取请求。由于网络延迟,请求的实际工作很可能在写请求表明它已成功完成后不久完成 - 我猜这是另一个DC死亡的“随机”时序的来源。实质上,该集群中的一个或多个节点正在过载。
TL; DR:写入的LOCAL_ONE基本上与ONE相同。 LOCAL_ONE仅对读取产生重大影响,其中仅查询本地DC(避免网络成本)。上面描述的集群在DC2中达到了吞吐量上限。