I have a Cassandra 2.1.8 cluster with 16 nodes (CentOS 6.6, 1x 4-core Xeon, 32 GB RAM, 3x 3 TB HDD, Java 1.8.0_65) and I am trying to add 16 more nodes, one at a time, but I'm stuck on the very first one.
After the cassandra process is started on the new node, 16 streams from the pre-existing nodes to the newly added node begin:
nodetool netstats |grep Already
Receiving 131 files, 241797656689 bytes total. Already received 100 files, 30419228367 bytes total
Receiving 150 files, 227954962242 bytes total. Already received 116 files, 29078363255 bytes total
Receiving 127 files, 239902942980 bytes total. Already received 103 files, 29680298986 bytes total
...
The new node is in the Joining state (last line):
UN ...70 669.64 GB 256 ? a9c8adae-e54e-4e8e-a333-eb9b2b52bfed R0
UN ...71 638.09 GB 256 ? 6aa8cf0c-069a-4049-824a-8359d1c58e59 R0
UN ...80 667.07 GB 256 ? 7abb5609-7dca-465a-a68c-972e54469ad6 R1
UJ ...81 102.99 GB 256 ? c20e431e-7113-489f-b2c3-559bbd9916e2 R2
For several hours the joining process looks fine, but after that the cassandra process on the new node dies with exceptions like:
ERROR 09:07:37 Exception in thread Thread[Thread-1822,5,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.8.jar:2.1.8]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:745)
I have made six or seven attempts, with both the CMS and G1 collectors and MAX_HEAP_SIZE ranging from 8G (the default) up to 16G, with no luck. Cassandra seems to run out of heap and hit an OOM in various other places as well, for example:
ERROR [CompactionExecutor:6] 2015-11-08 04:42:24,277 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:75) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:70) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:48) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1822) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:83) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1967) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1810) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:90) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:38) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:155) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:144) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:427) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.compaction.CompactionManager$10.run(CompactionManager.java:1144) ~[apache-cassandra-2.1.8.jar:2.1.8]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Increasing MAX_HEAP_SIZE beyond that just gets the cassandra process killed by the system OOM killer instead.
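For reference, the heap was adjusted between attempts roughly as follows. This is only a sketch of typical cassandra-env.sh overrides, not the exact settings used; the values varied per attempt:

# cassandra-env.sh overrides (values varied per attempt; ~8G is the
# auto-calculated default on a 32 GB machine)
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="800M"              # only relevant for the CMS attempts
# for the G1 attempts the default CMS flags were disabled and this was added:
# JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"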
Any ideas?
Answer 0 (score: 0)
I ran into exactly the same problem (see my JIRA ticket), and it appears to be related to a table with a large number of tombstones (size-tiered compaction often doesn't do a good job of cleaning them up). One possible triage measure is simply to restart the node with auto_bootstrap set to false and then run nodetool rebuild to complete the process. This keeps the data already received while allowing the new node to serve traffic, as sketched below.
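A minimal sketch of that procedure, assuming a packaged install on CentOS; the config path, service name, and the source datacenter name ("DC1") are placeholders for your environment:

# on the stuck node, stop the half-joined process
sudo service cassandra stop
# disable bootstrapping for the next start (append the setting to cassandra.yaml)
echo "auto_bootstrap: false" | sudo tee -a /etc/cassandra/conf/cassandra.yaml
sudo service cassandra start
# once the node is up and in the ring, stream the data over explicitly
nodetool rebuild -- DC1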
But you probably still have an underlying problem that is causing the OOM. Something very large is (apparently) being materialized in memory during the streaming session, and it could be:
A very large partition, which can happen by accident. Check cfstats and look at the maximum partition bytes (a couple of check commands are sketched after this list). If that is the case, you need to address the underlying data-model problem and clean up that data.
Lots of tombstones. You should see warnings about this in the logs.
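Quick checks for both cases, as a sketch; the keyspace/table names and log path are placeholders:

# look for abnormally wide partitions on the tables being streamed
nodetool cfstats my_keyspace.my_table | grep "Compacted partition maximum bytes"
# look for tombstone warnings emitted during reads
grep -i tombstone /var/log/cassandra/system.log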
If you do have one of these problems, you will almost certainly have to fix it before the streaming can succeed.