Cassandra terminates with an OutOfMemory (OOM) error

Time: 2013-06-21 17:06:02

Tags: cassandra astyanax

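We have a 3-node Cassandra cluster on AWS. The nodes run Cassandra 1.2.2 and have 8 GB of memory each. We have not changed any of the default heap or GC settings, so each node allocates 1.8 GB of heap space. The rows are wide; each row stores about 260,000 columns. We read the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some nodes run out of heap space and terminate with an OOM error. Here are the error messages: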

java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
        at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
        at org.apache.cassandra.db.Table.getRow(Table.java:355)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main] java.lang.OutOfMemoryError: Java heap space
        at java.lang.Long.toString(Long.java:269)
        at java.lang.Long.toString(Long.java:764)
        at org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
        at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
        at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
        at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
        at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,350 Exception in thread Thread[ACCEPT-/10.0.0.170,5,main] java.lang.RuntimeException: java.nio.channels.ClosedChannelException
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:893)
Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:211)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:882)

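The data in each column is less than 50 bytes. After adding all the column overhead (column name + metadata), it should not be more than 100 bytes. So reading 80,000 columns from each of 10 rows means we are reading 80,000 * 10 * 100 bytes = 80 MB of data. That is large, but not large enough to fill a 1.8 GB heap, so I wonder why the heap is filling up. If a request is too large to fulfill in a reasonable amount of time, I would expect Cassandra to return a TimeOutException instead of terminating.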
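The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the question: the 100-byte column size is the asker's own upper bound, and treating 1.8 GB as 1.8 * 10^9 bytes is an assumption made here for simplicity.

```python
# Figures taken from the question; 100 bytes per column is the asker's upper bound.
COLUMNS_PER_ROW = 80_000
ROWS = 10
BYTES_PER_COLUMN = 100

# Assumption: 1.8 GB is treated as 1.8 * 10^9 bytes.
HEAP_BYTES = 1.8 * 1000**3


def raw_payload_bytes(columns_per_row, rows, bytes_per_column):
    """Raw size of the requested columns, ignoring any JVM object overhead."""
    return columns_per_row * rows * bytes_per_column


payload = raw_payload_bytes(COLUMNS_PER_ROW, ROWS, BYTES_PER_COLUMN)
payload_mb = payload / 1_000_000
heap_fraction = payload / HEAP_BYTES

print(f"raw payload: {payload_mb:.0f} MB ({heap_fraction:.1%} of the heap)")
```

On these numbers the raw payload is only a small fraction of the heap, which is why the OOM looks surprising at first.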

A simple solution would be to increase the heap size, but that would only mask the problem. Reading 80 MB of data should not fill a 1.8 GB heap.
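For context on where a default of roughly 1.8 GB might come from: as far as I know, the stock cassandra-env.sh of that era sizes the maximum heap as max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8 GB)). The sketch below reproduces that rule in Python. The function name is made up for illustration, and the 7680 MB figure is an assumption (an instance that reports about 7.5 GB of usable memory), not something stated in the question.

```python
def default_max_heap_mb(system_memory_mb):
    """Hypothetical helper mirroring the heap rule described above:
    max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)), using integer division."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


# Assumption: a nominal "8 GB" node may report about 7.5 GB (7680 MB) to the OS.
for memory_mb in (8192, 7680):
    heap_mb = default_max_heap_mb(memory_mb)
    print(f"{memory_mb} MB RAM -> {heap_mb} MB max heap ({heap_mb / 1024:.2f} GiB)")
```

Under that assumption the rule gives 1920 MB, which is in line with the 1.8 GB reported in the question.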

Are there any other Cassandra settings I can tweak to prevent the OOM exception?

1 Answer:

Answer 0 (score: 0)

  

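"No, there are no writes in progress when I read the data. I am sure that increasing the heap space may help. But I am trying to understand why reading 80 MB of data fills up a 1.8 GB heap."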

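Cassandra uses both on-heap and off-heap caching. First, loading 80 MB of user data may result in 200-400 MB of Java heap usage. (Which VM? 64-bit?) Second, this memory is added to the memory already used for caches. It appears that Cassandra does not free those caches to serve an individual query, which may help overall throughput.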
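To make the estimate above concrete, here is a rough back-of-the-envelope sketch. The 200-400 MB range is the figure from this answer; the amount of heap already occupied (ALREADY_USED_MB) is a purely hypothetical value chosen for illustration, since the question does not report it.

```python
# Figures from the question and this answer.
RAW_READ_MB = 80                           # raw size of one 10-row request
ON_HEAP_LOW_MB, ON_HEAP_HIGH_MB = 200, 400  # estimated heap cost of that request
HEAP_MB = 1800                             # 1.8 GB heap

# Hypothetical: heap already held by caches and other structures (not from the source).
ALREADY_USED_MB = 1000


def expansion_factor(on_heap_mb, raw_mb):
    """How many times larger the data is on the heap than its raw size."""
    return on_heap_mb / raw_mb


def reads_that_fit(heap_mb, already_used_mb, per_read_mb):
    """Number of such requests that fit into the remaining heap at once."""
    free_mb = max(heap_mb - already_used_mb, 0)
    return free_mb // per_read_mb


low_factor = expansion_factor(ON_HEAP_LOW_MB, RAW_READ_MB)
high_factor = expansion_factor(ON_HEAP_HIGH_MB, RAW_READ_MB)
best_case = reads_that_fit(HEAP_MB, ALREADY_USED_MB, ON_HEAP_LOW_MB)
worst_case = reads_that_fit(HEAP_MB, ALREADY_USED_MB, ON_HEAP_HIGH_MB)

print(f"expansion factor: {low_factor}x to {high_factor}x")
print(f"concurrent requests that fit: {worst_case} to {best_case}")
```

With these assumed numbers, only a handful of such requests arriving at the same node at once would be enough to exhaust the heap, even though each one carries just 80 MB of raw data.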

In the meantime, did you solve your problem by increasing MaxHeap?