We are trialing Cassandra as our data store and are losing nodes to out-of-heap failures. We run the DataStax Community Edition with Cassandra 2.0.1 on a 9-node cluster of Ubuntu Server 13.04 machines, each with 16 GB of RAM. During a data migration, two of our nodes went down unexpectedly after running out of heap space. The stack traces in the logs are fairly opaque and quite varied. Here is one example:
ERROR [MutationStage:21] 2013-11-01 07:08:39,656 CassandraDaemon.java (line 185) Exception in thread Thread[MutationStage:21,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.apache.cassandra.utils.SlabAllocator$Region.init(SlabAllocator.java:178)
at org.apache.cassandra.utils.SlabAllocator.getRegion(SlabAllocator.java:101)
at org.apache.cassandra.utils.SlabAllocator.allocate(SlabAllocator.java:70)
at org.apache.cassandra.utils.Allocator.clone(Allocator.java:30)
at org.apache.cassandra.db.ColumnFamilyStore.internOrCopy(ColumnFamilyStore.java:2220)
at org.apache.cassandra.db.Column.localCopy(Column.java:277)
at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:107)
at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:104)
at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Before that there were AssertionErrors like this one:
ERROR [FlushWriter:6176] 2013-11-01 06:55:48,825 CassandraDaemon.java (line 185) Exception in thread Thread[FlushWriter:6176,5,main]
java.lang.AssertionError
at org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:186)
at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:358)
at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:317)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
as well as a string of garbage-collection status messages such as:
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,923 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 5935 ms for 1 collections, 2963961136 used; max is 3902799872
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,924 StatusLogger.java (line 55) Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadStage 0 3 58646672 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) RequestResponseStage 0 1 22614351 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadRepairStage 0 0 76371 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) MutationStage 7 260 709366463 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) ReplicateOnWriteStage 0 0 104455 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) GossipStage 0 1 3695467 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,953 StatusLogger.java (line 70) AntiEntropyStage 0 0 404 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MigrationStage 0 0 1178 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MemtablePostFlusher 1 39 43229 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MemoryMeter 0 0 668 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) FlushWriter 0 0 23228 0 82
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MiscStage 0 0 196 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) commitlog_archiver 0 0 0 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) InternalResponseStage 0 0 276 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) HintedHandoff 0 0 13 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 79) CompactionManager 3 11
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 81) Commitlog n/a 261
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 93) MessagingService n/a 1,0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 103) Cache Type Size Capacity KeysToSave
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 105) KeyCache 41783700 104857600 all
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,975 StatusLogger.java (line 111) RowCache 0 0 all
...
Given that this happened after only 4 hours of data ingestion, we would like to know why it occurs and what we can do to stop it from happening again. Thanks in advance.
Answer 0 (score: -1)
I have been running Cassandra on Ubuntu for years, and it is quite sensitive to its RAM settings. As a rule of thumb, don't store more than 1 TB of data per node, and avoid running with a max heap smaller than 8 GB.
See the "MAX_HEAP_SIZE" setting in /etc/cassandra/cassandra-env.sh.
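When MAX_HEAP_SIZE is left unset, cassandra-env.sh auto-sizes the heap. The snippet below is a sketch of that auto-sizing formula as I understand it from the 2.0-era script (max heap = max(min(½ RAM, 1 GB), min(¼ RAM, 8 GB))), applied to a hypothetical 16 GB node; verify against the script shipped with your installation:

```shell
#!/bin/sh
# Sketch of the heap auto-sizing logic in Cassandra 2.0's cassandra-env.sh
# (calculate_heap_sizes), evaluated for a hypothetical 16 GB node.
system_memory_in_mb=16384

# Candidate 1: half of system RAM, capped at 1024 MB.
half_system_memory_in_mb=$((system_memory_in_mb / 2))
if [ "$half_system_memory_in_mb" -gt 1024 ]; then
    half_system_memory_in_mb=1024
fi

# Candidate 2: a quarter of system RAM, capped at 8192 MB.
quarter_system_memory_in_mb=$((system_memory_in_mb / 4))
if [ "$quarter_system_memory_in_mb" -gt 8192 ]; then
    quarter_system_memory_in_mb=8192
fi

# The max heap is the larger of the two candidates.
if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]; then
    max_heap_size_in_mb="$half_system_memory_in_mb"
else
    max_heap_size_in_mb="$quarter_system_memory_in_mb"
fi

echo "${max_heap_size_in_mb}M"   # 4096M on a 16 GB node
```

On a 16 GB machine this yields only a 4 GB heap, which lines up with the "max is 3902799872" figure in your GCInspector log and is well below the 8 GB floor suggested above.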
When you first import data it goes into RAM (the memtables) and is compacted later. It usually works best to set the max heap high for the initial load, then restart with a smaller heap once the cluster is fully populated.
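To pin the heap explicitly rather than rely on the auto-sizing, both MAX_HEAP_SIZE and HEAP_NEWSIZE must be set together in cassandra-env.sh. The values below are illustrative only, not a tuned recommendation:

```shell
# /etc/cassandra/cassandra-env.sh -- override the auto-sized heap.
# Both variables must be set together or neither takes effect.
MAX_HEAP_SIZE="8G"      # illustrative; ~half of a 16 GB node's RAM
HEAP_NEWSIZE="800M"     # young generation; a common rule is ~100 MB per core
```

After editing, restart each node one at a time (e.g. `nodetool drain` followed by a Cassandra service restart) so the cluster stays available while the change rolls out.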