I'm running Cassandra 1.0.7 on 5 nodes, each with 8GB of physical RAM, and my heap is 4GB. I've started getting frequent node failures like this:
WARN [ScheduledTasks:1] 2013-04-10 10:18:12,042 GCInspector.java (line 145) Heap is 0.9602098156121341 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-04-10 10:18:12,042 StorageService.java (line 2645) Flushing CFS(Keyspace='Company', ColumnFamily='01_Meta') to relieve memory pressure
WARN [ScheduledTasks:1] 2013-04-10 10:18:14,403 GCInspector.java (line 145) Heap is 0.9610030442856479 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-04-10 10:18:14,403 StorageService.java (line 2645) Flushing CFS(Keyspace='Company', ColumnFamily='01_Meta') to relieve memory pressure
ERROR [MutationStage:23969] 2013-04-10 10:18:18,339 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:23969,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
at org.apache.cassandra.utils.SlabAllocator.allocate(SlabAllocator.java:68)
at org.apache.cassandra.utils.Allocator.clone(Allocator.java:32)
at org.apache.cassandra.db.Column.localCopy(Column.java:244)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:215)
at org.apache.cassandra.db.Memtable.put(Memtable.java:143)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:805)
at org.apache.cassandra.db.Table.apply(Table.java:431)
at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:256)
at org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:416)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1223)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
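For reference, the emergency flush in the warnings above is governed by a few memory-pressure settings in cassandra.yaml. A quick way to see what they are currently set to (a sketch: flush_largest_memtables_at comes straight from the log message, the other setting names are from memory of the 1.0-era cassandra.yaml, and the config path is taken from the -cp entry below):

# check the current memory-pressure thresholds (path assumed from the classpath below)
grep -E 'flush_largest_memtables_at|reduce_cache_sizes_at|reduce_cache_capacity_to|memtable_total_space_in_mb' /etc/cassandra/conf/cassandra.yaml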
The startup parameters are:
/usr/lib/jvm/jdk1.6.0_31/bin/java
-ea
-javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms4G
-Xmx4G
-Xmn200M
-XX:+HeapDumpOnOutOfMemoryError
-Xss128k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
-cp /etc/cassandra/conf:/usr/share/cassandra/lib/antlr-
Any ideas on where to start? I've been looking here: http://www.datastax.com/docs/1.0/operations/tuning#tuning-options-for-size-tiered-compaction and http://www.datastax.com/docs/1.0/operations/tuning#tuning-java-heap-size
But so far nothing seems out of the ordinary. Any suggestions are much appreciated.
Answer 0 (score: 3)
The fact is, if you have deviated from any of the JVM settings in cassandra-env.sh without understanding 100% what your change implies, you are already in trouble. And if you are not graphing everything you can get out of the JVM and Cassandra, you are in even more trouble.
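For reference, the heap knobs in question live in cassandra-env.sh. A rough sketch of the relevant lines in a stock 1.0-era script (not your exact file; the values mirror the -Xmx4G/-Xmn200M flags above, and roughly 100MB per CPU core is the usual guideline for the young generation):

# /etc/cassandra/conf/cassandra-env.sh (excerpt, sketch)
# Leave both commented out to let the script size the heap from system RAM;
# if you override one, override both, and keep HEAP_NEWSIZE around 100MB per core.
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="200M"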
Beyond that, it is nearly impossible to diagnose memory problems without a lot more information, so you need to look very closely at your data access patterns. Start by answering this:
Look at nodetool cfstats for anything out of the ordinary, e.g. very wide rows where you expected skinny ones, or rows taking up far more space than you expected.
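A minimal way to pull that out, assuming the default JMX port 7199 from the startup flags and using the column family named in your logs:

# per-CF statistics; check the compacted row sizes and memtable figures
# for the column families that show up in the GC warnings
nodetool -h localhost -p 7199 cfstats | grep -A 20 '01_Meta'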
You should have graphs of every metric you can pull out of Cassandra and the JVM. I use jmxtrans and Graphite for this; they are core tools in my Cassandra setup. The insight I got from them, and the data remodeling that followed, took me from a 12-node cluster with near-daily outages to a 3-node cluster with no downtime in the past year (on double the traffic). I cannot stress this enough: you need proper trending on a production cluster to really understand, manage, and optimize your data access.
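Until something like jmxtrans/Graphite is wired up, even a crude trend beats nothing. A throwaway sketch (assumes nodetool is on the PATH, the default JMX port 7199 from the flags above, and an arbitrary output file path):

# sample heap usage once a minute so you can see whether the heap climbs
# steadily or spikes around compactions/repairs
while true; do
  echo "$(date -u +%FT%TZ) $(nodetool -h localhost -p 7199 info | grep 'Heap Memory')" >> /var/log/cassandra/heap-trend.log
  sleep 60
done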