我们正在使用3个站点,每个站点3个节点Cassandra 1.1.12群集,每个节点分配8GB ram。我们一直在节点上看到长时间的GC暂停,这会影响我们的应用程序实时要求。我们运行的系统是8核心系统,内存为24GB。
我们已经看到120秒的停顿会阻止世界GC。
我们在JDK 1.7.0_04
上运行这些标志-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8G
-Xmx8G
-Xmn1600M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
以下是导致长时间停顿的详细GC日志:
2014-02-23T11:50:19.231-0500: 2119627.980: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 145905916 Max Chunk Size: 3472057 Number of Blocks: 160577 Av. Block Size: 908 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119627.981: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 32269664 bytes, 32269664 total : 1356995K->44040K(1474560K), 4.5270760 secs] 5829345K->4546031K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 144857324 Max Chunk Size: 2423465 Number of Blocks: 160577 Av. Block Size: 902 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 4.5295190 secs] [Times: user=24.78 sys=0.07, real=4.52 secs] Heap after GC invocations=82068 (full 2561): par new generation total 1474560K, used 44040K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 4.5524690 seconds {Heap before GC invocations=82068 (full 2561): par new generation total 1474560K, used 1354760K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:51:14.221-0500: 2119682.970: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 144857324 Max Chunk Size: 2423465 Number of Blocks: 160577 Av. Block Size: 902 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119682.971: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 41744088 bytes, 41744088 total : 1354760K->52443K(1474560K), 2.1589280 secs] 5856751K->4582809K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 143937754 Max Chunk Size: 1505947 Number of Blocks: 160575 Av. Block Size: 896 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 2.1613420 secs] [Times: user=12.82 sys=0.04, real=2.16 secs] Heap after GC invocations=82069 (full 2561): par new generation total 1474560K, used 52443K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 2.1719930 seconds {Heap before GC invocations=82069 (full 2561): par new generation total 1474560K, used 1363163K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:52:33.089-0500: 2119761.837: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 143937754 Max Chunk Size: 1505947 Number of Blocks: 160575 Av. Block Size: 896 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119761.839: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 37906760 bytes, 37906760 total : 1363163K->48710K(1474560K), 3.5105890 secs] 5893529K->4611208K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 142756036 Max Chunk Size: 326281 Number of Blocks: 160573 Av. Block Size: 889 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 3.5130550 secs] [Times: user=18.81 sys=0.08, real=3.52 secs] Heap after GC invocations=82070 (full 2561): par new generation total 1474560K, used 48710K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 3.5236060 seconds {Heap before GC invocations=82070 (full 2561): par new generation total 1474560K, used 1359430K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:55:59.448-0500: 2119968.196: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 142756036 Max Chunk Size: 326281 Number of Blocks: 160573 Av. Block Size: 889 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119968.198: [ParNew (0: promotion failure size = 131074) (1: promotion failure size = 131074) (2: promotion failure size = 131074) (4: promotion failure size = 131074) (7: promotion failure size = 131074) (promotion failed) Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 34318480 bytes, 34318480 total : 1359430K->1353373K(1474560K), 1.5971880 secs]2119969.795: [CMSCMS: Large block 0x000000073aa59270 : 4586726K->3600612K(6750208K), 149.1735470 secs] 5921928K->3600612K(8224768K), [CMS Perm : 23966K->23925K(40012K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 403131826 Max Chunk Size: 403131826 Number of Blocks: 1 Av. Block Size: 403131826 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 150.7724630 secs] [Times: user=28.89 sys=11.57, real=150.75 secs] Heap after GC invocations=82071 (full 2562): par new generation total 1474560K, used 0K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 3600612K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23925K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 150.7833360 seconds
我还设置了一个夜间工作,在凌晨2点强制GC在实例上,希望它可以缓解这些问题 - 它已经做了一些好事,但我们每隔几天仍然遇到问题。/ p>
- 增加了细节 将NewGen更改为2GB(从1600M开始)后,我们在运行的前12个小时内看到暂停15秒。
......类直方图转储:
15772.786: [Class Histogram (before full gc): num #instances #bytes class name ---------------------------------------------- 1: 9743656 526609104 [B 2: 9176097 440452656 java.nio.HeapByteBuffer 3: 8152787 326111480 java.math.BigInteger 4: 8126173 321393760 [I 5: 207997 307212904 [J 6: 8940730 214577520 java.lang.Long 7: 8121743 194921832 org.apache.cassandra.db.DecoratedKey 8: 8121399 129942384 org.apache.cassandra.dht.BigIntegerToken 9: 174374 78049856 [Ljava.lang.Object; 10: 914261 43884528 edu.stanford.ppl.concurrent.SnapTreeMap$Node 11: 1112269 35592608 java.util.concurrent.ConcurrentHashMap$HashEntry 12: 1101827 35258464 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 13: 306955 29467680 edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch 14: 1216257 29190168 org.apache.cassandra.cache.KeyCacheKey 15: 1111387 26673288 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue 16: 427695 20529360 edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder 17: 417278 16691120 org.apache.cassandra.db.ExpiringColumn 18: 306955 9822560 edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch 19: 401296 9631104 org.apache.cassandra.db.ISortedColumns$DeletionInfo 20: 305 8691760 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 21: 528460 8455360 java.util.concurrent.atomic.AtomicReference 22: 263965 8446880 edu.stanford.ppl.concurrent.SnapTreeMap 23: 342775 8226600 org.apache.cassandra.db.ColumnFamily 24: 263965 6335160 org.apache.cassandra.db.AtomicSortedColumns$Holder 25: 193133 6180256 org.apache.cassandra.db.DeletedColumn 26: 179519 5744608 java.util.ArrayList$Itr 27: 221515 5316360 java.util.concurrent.ConcurrentSkipListMap$Node 28: 35607 5025400 29: 35607 4852600 30: 61748 4445856 org.apache.cassandra.io.sstable.SSTableIdentityIterator 31: 264409 4230544 edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr 32: 52605 4216504 [Ljava.util.HashMap$Entry; 33: 3389 3868680 34: 221414 3542624 org.apache.cassandra.db.AtomicSortedColumns 35: 107140 3428480 org.apache.cassandra.db.Column 36: 69734 3347232 java.util.TreeMap 37: 131777 3162648 java.util.ArrayList 38: 111752 2682048 java.util.concurrent.ConcurrentSkipListMap$Index 39: 51879 2490192 java.util.HashMap 40: 3389 2438184 41: 2978 2284352 42: 55660 1781120 org.apache.cassandra.io.util.DataOutputBuffer 43: 69664 1671936 org.apache.cassandra.db.TreeMapBackedSortedColumns 44: 51697 1654304 org.apache.cassandra.db.ArrayBackedSortedColumns 45: 61842 1485456 [Ljava.nio.ByteBuffer; 46: 61748 1481952 org.apache.cassandra.utils.BytesReadTracker 47: 3533 1462144 48: 55672 1336128 org.apache.cassandra.io.util.FastByteArrayOutputStream 49: 21743 1318848 [C 50: 33109 1268616 [Lorg.apache.cassandra.db.IColumn; 51: 28607 1144280 java.util.HashMap$KeyIterator 52: 17623 1127616 [Ljava.util.Hashtable$Entry; 53: 35003 1120096 java.util.Vector 54: 17335 1109440 com.sun.jmx.remote.util.OrderClassLoaders 55: 17601 844848 java.util.Hashtable 56: 18528 741120 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo 57: 22634 724288 java.lang.String 58: 17514 700560 java.security.ProtectionDomain 59: 19829 634528 java.util.HashMap$Entry 60: 25327 607848 org.apache.cassandra.utils.IntervalTree.Interval 61: 17513 560416 java.security.CodeSource 62: 22026 528624 java.lang.Double 63: 21740 521760 java.util.concurrent.LinkedBlockingDeque$Node 64: 18782 486728 [[J 65: 18660 447840 org.apache.cassandra.utils.BloomFilter 66: 18660 447840 org.apache.cassandra.utils.obs.OpenBitSet 67: 18529 444696 org.apache.cassandra.db.compaction.PrecompactedRow 68: 3717 444688 java.lang.Class 69: 18528 444672 org.apache.cassandra.db.ColumnIndexer$RowHeader 70: 18528 444672 org.apache.cassandra.utils.BloomCalculations$BloomSpecification 71: 24572 393152 java.lang.Object 72: 5753 335568 [S 73: 7326 293040 java.util.TreeMap$Entry 74: 17817 285072 java.util.HashSet 75: 5583 282488 [[I 76: 17514 280224 java.security.ProtectionDomain$Key 77: 17514 280224 [Ljava.security.Principal; 78: 6612 264480 org.apache.cassandra.service.WriteResponseHandler 79: 7862 251584 org.apache.cassandra.db.RowMutation 80: 10083 241992 org.apache.cassandra.db.EchoedRow 81: 6110 195520 org.apache.cassandra.utils.ExpiringMap$CacheableObject 82: 311 179136 83: 2927 163912 org.apache.cassandra.thrift.TBinaryProtocol 84: 6563 157512 org.apache.cassandra.net.Message 85: 6289 150936 org.apache.cassandra.net.Header 86: 6110 146640 org.apache.cassandra.net.CallbackInfo 87: 5996 143904 java.util.concurrent.LinkedBlockingQueue$Node 88: 8004 128064 java.util.HashMap$EntrySet 89: 7885 126160 java.util.TreeMap$Values 90: 7790 124640 java.util.concurrent.atomic.AtomicInteger 91: 5132 123168 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder 92: 6640 106240 org.apache.cassandra.utils.SimpleCondition 93: 6626 106016 java.util.HashMap$Values 94: 830 92960 java.net.SocksSocketImpl 95: 1076 86080 java.lang.reflect.Method 96: 4176 66816 java.lang.Integer 97: 2005 64160 java.lang.ThreadLocal$ThreadLocalMap$Entry 98: 1592 63680 java.lang.ref.SoftReference 99: 1586 63440 com.google.common.collect.SingletonImmutableMap 100: 1743 55776 java.util.TreeMap$ValueIterator 101: 470 48880 java.lang.Thread 102: 1469 47008 java.util.concurrent.locks.AbstractQueuedSynchronizer$Node 103: 1041 41640 java.lang.ref.Finalizer 104: 1002 40080 java.util.LinkedHashMap$Entry 105: 1603 38472 com.google.common.collect.SingletonImmutableSet 106: 1586 38064 com.google.common.collect.ImmutableEntry 107: 435 34800 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry; 108: 970 31040 java.net.Inet4Address 109: 425 30600 java.lang.reflect.Constructor 110: 396 28512 java.lang.reflect.Field 111: 825 26400 java.net.Socket 112: 1050 25200 java.util.concurrent.atomic.AtomicLong 113: 687 21984 java.util.concurrent.SynchronousQueue$TransferStack$SNode 114: 387 21672 sun.security.provider.MD5 115: 451 21648 java.util.concurrent.ThreadPoolExecutor$Worker 116: 442 21216 java.net.SocketInputStream 117: 525 21000 org.apache.cassandra.thrift.TCustomSocket 118: 514 20560 org.apache.cassandra.db.CounterColumn 119: 639 20448 120: 353 19768 org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT 121: 762 18288 java.io.FileDescriptor 122: 570 18240 org.apache.thrift.transport.TFramedTransport 123: 939 18200 [Ljava.lang.Class; 124: 546 17472 java.util.concurrent.locks.ReentrantLock$NonfairSync 125: 532 17464 [Ljava.lang.String; 126: 527 16864 java.util.Hashtable$Entry 127: 298 16688 org.apache.cassandra.service.ClientState$1 128: 298 16688 org.apache.cassandra.service.ClientState$2 129: 405 16200 org.apache.cassandra.thrift.Column 130: 323 15504 java.net.SocketOutputStream 131: 298 14304 org.apache.cassandra.service.ClientState 132: 585 14040 org.apache.thrift.transport.TMemoryInputTransport 133: 570 13680 org.apache.thrift.TByteArrayOutputStream 134: 284 13632 sun.nio.cs.UTF_8$Encoder 135: 405 12960 org.apache.cassandra.thrift.ColumnOrSuperColumn 136: 324 12960 java.io.BufferedInputStream 137: 530 12720 org.apache.cassandra.utils.EstimatedHistogram 138: 525 12600 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess 139: 388 12416 java.security.MessageDigest$Delegate 140: 129 12384 org.apache.cassandra.io.sstable.SSTableReader 141: 305 12200 java.util.concurrent.ConcurrentHashMap$Segment 142: 376 12032 org.apache.cassandra.db.columniterator.SSTableSliceIterator 143: 47 10904 [Z 144: 334 10688 org.apache.cassandra.net.OutboundTcpConnection$Entry 145: 330 10560 java.lang.ref.WeakReference 146: 435 10440 java.lang.ThreadLocal$ThreadLocalMap 147: 426 10224 java.util.BitSet 148: 314 10048 org.apache.cassandra.net.MessageDeliveryTask ... [removed anything taking less than 10K] Total 72302201 2937460496 , 3.1156640 secs] 15775.902: [CMSCMS: Large block 0x0000000714aace48 : 2751863K->2568755K(6340608K), 12.0184460 secs]15787.921: [Class Histogram (after full gc): num #instances #bytes class name ---------------------------------------------- 1: 8644126 451299632 [B 2: 8434549 404858352 java.nio.HeapByteBuffer 3: 7759859 310394360 java.math.BigInteger 4: 10686 293655512 [J 5: 7763265 248560200 [I 6: 8630461 207131064 java.lang.Long 7: 7759615 186230760 org.apache.cassandra.db.DecoratedKey 8: 7759483 124151728 org.apache.cassandra.dht.BigIntegerToken 9: 1825 64173960 [Ljava.lang.Object; 10: 1096613 35091616 java.util.concurrent.ConcurrentHashMap$HashEntry 11: 1092266 34952512 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 12: 1092266 26214384 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue 13: 1092266 26214384 org.apache.cassandra.cache.KeyCacheKey 14: 532463 25558224 edu.stanford.ppl.concurrent.SnapTreeMap$Node 15: 221414 21255744 edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch 16: 411588 16463520 org.apache.cassandra.db.ExpiringColumn 17: 341705 16401840 edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder 18: 305 8691760 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 19: 442919 7086704 java.util.concurrent.atomic.AtomicReference 20: 221414 7085248 edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch 21: 221414 7085248 edu.stanford.ppl.concurrent.SnapTreeMap 22: 221515 5316360 java.util.concurrent.ConcurrentSkipListMap$Node 23: 221418 5314032 org.apache.cassandra.db.ColumnFamily 24: 221418 5314032 org.apache.cassandra.db.ISortedColumns$DeletionInfo 25: 221414 5313936 org.apache.cassandra.db.AtomicSortedColumns$Holder 26: 35593 5023912 27: 35593 4850696 28: 3382 3862240 29: 116355 3723360 org.apache.cassandra.db.DeletedColumn 30: 221414 3542624 org.apache.cassandra.db.AtomicSortedColumns 31: 221414 3542624 edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr 32: 111752 2682048 java.util.concurrent.ConcurrentSkipListMap$Index 33: 3382 2433816 34: 2971 2279488 35: 3533 1462144 36: 11281 891344 [C 37: 3710 443904 java.lang.Class 38: 12509 400288 java.lang.String 39: 23066 369056 java.lang.Object 40: 5742 334480 [S 41: 5417 275792 [[I 42: 8620 206880 java.lang.Double 43: 8596 206304 java.util.concurrent.LinkedBlockingDeque$Node 44: 311 179136 45: 1631 139152 [Ljava.util.HashMap$Entry; 46: 4006 128192 org.apache.cassandra.db.Column 47: 3630 116160 java.util.HashMap$Entry 48: 1066 85280 java.lang.reflect.Method 49: 1310 62880 java.util.HashMap 50: 509 57008 java.net.SocksSocketImpl 51: 1178 47120 java.lang.ref.SoftReference 52: 228 40752 [[J 53: 411 29592 java.lang.reflect.Constructor 54: 703 28120 java.lang.ref.Finalizer 55: 367 26424 java.lang.reflect.Field 56: 248 25792 java.lang.Thread 57: 804 25728 java.lang.ThreadLocal$ThreadLocalMap$Entry 58: 554 22160 java.util.LinkedHashMap$Entry 59: 431 20688 java.net.SocketInputStream 60: 514 20560 org.apache.cassandra.db.CounterColumn 61: 835 20040 java.util.concurrent.atomic.AtomicLong 62: 912 17768 [Ljava.lang.Class; 63: 546 17472 java.util.concurrent.locks.ReentrantLock$NonfairSync 64: 524 17272 [Ljava.lang.String; 65: 535 17120 java.net.Inet4Address 66: 1068 17088 java.lang.Integer 67: 213 17040 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry; 68: 708 16992 java.util.ArrayList 69: 1043 16688 java.util.concurrent.atomic.AtomicInteger 70: 684 16416 java.io.FileDescriptor 71: 506 16192 java.util.Hashtable$Entry 72: 504 16128 java.net.Socket 73: 184 15296 [Ljava.util.Hashtable$Entry; 74: 264 14784 org.apache.cassandra.thrift.TBinaryProtocol 75: 305 12200 java.util.concurrent.ConcurrentHashMap$Segment 76: 506 12144 org.apache.cassandra.utils.EstimatedHistogram 77: 201 11256 org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT 78: 117 11232 org.apache.cassandra.io.sstable.SSTableReader 79: 229 10992 java.util.concurrent.ThreadPoolExecutor$Worker 80: 47 10904 [Z 81: 320 10240 java.lang.ref.WeakReference 82: 319 10208 java.util.Vector 83: 181 10136 sun.security.provider.MD5 ... [removed anything taking up less than 10K] Total 65302653 2582483384 , 2.5709140 secs] 2951753K->2568755K(8183808K), [CMS Perm : 21044K->21000K(35192K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 482780727 Max Chunk Size: 482780727 Number of Blocks: 1 Av. Block Size: 482780727 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 17.7063220 secs] [Times: user=16.57 sys=0.05, real=17.70 secs] Heap after GC invocations=1233 (full 21): par new generation total 1843200K, used 0K [0x00000005fae00000, 0x0000000677e00000, 0x0000000677e00000) eden space 1638400K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000065ee00000) from space 204800K, 0% used [0x000000065ee00000, 0x000000065ee00000, 0x000000066b600000) to space 204800K, 0% used [0x000000066b600000, 0x000000066b600000, 0x0000000677e00000) concurrent mark-sweep generation total 6340608K, used 2568755K [0x0000000677e00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 35192K, used 21000K [0x00000007fae00000, 0x00000007fd05e000, 0x0000000800000000) } Total time for which application threads were stopped: 17.7081220 seconds
答案 0 :(得分:18)
首先需要额外提示,请考虑将其添加到GC设置中。它会在完整的GC上转储直方图,你可以用它来弄清楚占用堆的重要性:
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics" JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC" JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"
另请查看system.log以获取来自MemoryMeter.java的消息。那些通常会在这些情况下吐出来,并且可以方便地获得更多线索。
根据GC日志中的消息,您似乎有促销失败:
2119968.198: [ParNew (0: promotion failure size = 131074) (1: promotion failure size = 131074) (2: promotion failure size = 131074) (4: promotion failure size = 131074) (7: promotion failure size = 131074) (promotion failed) Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 34318480 bytes, 34318480 total
这意味着你的对象离开HEAP的时间太长而你的堆碎片化导致它没有足够的空间来推动从新一代到幸存者或老一代的对象。这对于您拥有的强大硬件来说并不正常,但即使在具有70Gb的16核机器上也确实发生过这种情况,所以这里有一些需要连续查看的内容,以便您可以识别问题。如果你有直方图,我告诉你我们可以更快地得出结论,但这里是尝试的列表:
1.如果您启用了多线程压缩,请将其禁用;
2.你的CPU怎么样?如果利用率很高,请将系统中默认为8个线程的并发压缩器减少到4;
3.Cassandra 1.1存在密钥缓存大小计算错误。它使用48字节的硬编码大小来计算用于密钥缓存的堆空间,如果您的应用程序生成的密钥长于此值,则堆上的密钥缓存使用量将显着高于您在casssandra.yaml中实际设置的值。它会造成大量的堆压力,特别是那些长时间存在导致促销失败的东西:
https://issues.apache.org/jira/browse/CASSANDRA-4315
您可以从nodetool info命令或相应的JMX MBean检查密钥缓存使用情况。如果它通常是满的,那可能是一个线索;遗憾的是,该修复程序在1.1中不可用。如果遇到此问题,您应该升级到1.2,或者在没有升级的情况下解决它,在cassandra.yaml中显着降低密钥缓存大小的值;
4.您使用行缓存吗?试着把它关掉。
5.Wide Row或我通常所说的死亡行被读取。如果您的应用程序正在生成宽行(包含大量列的1个键),并且正在尝试读取该行的大片(> 100列),则可能取决于这些列中的数据大小,您将创建节点长期的GC会像这样停下来。尝试咨询您的开发人员,确保您从宽行一次读取小片;
您可以查看nodetool cfstats命令,了解Column Families的Max Row大小。如果你有一个最大行大小为几百MB或更多的CF,它可能是一个线索;
6.您的应用程序混合读/写是否很重?如果你有很多写作,那可能是难忘的压力。尝试减少cassandra.yaml中的memtable大小。
7.对于读/写重载,机会是GC设置中的默认终点阈值标志非常低,导致过早的促销。这很可能是您问题的症状。尝试将tenuringthreshold增加到像32这样的大值,以防止过早的促销,以便ParNew收集年轻一代的大部分垃圾。
您可以通过运行jstat命令来观察此暂缓问题和升级失败,并查看如何利用堆中的幸存者空间。我敢打赌,他们没有太多利用,物品直接从伊甸园到老一代。 ;)
8.如果全部失败,您的堆可能不足以处理您的工作量,因此您必须调整大小。这种可能性较小,您应该将其视为最后的手段,并谨慎行事。
8A:增加你的HEAP_NEWGEN。尝试将其从8GB增加到2GB。同样如上所述增加期限阈值;
8B:如果上述情况有所改善但没有100%消除问题,请考虑一次增加MAX_HEAP_SIZE 2GB。用它增加NEWGEN。确保它不超过MAX HEAP的1/4。
继续高达16GB但不超过这个,因为你只有24GB的总内存。
关于我自己: 我遇到了上述所有问题,我一次消除了一个问题。目前我运行的节点有24GB堆和6GB新生成,终点阈值为32.我所拥有的最大GC暂停时间是每天1-2次,每天1-2次。
希望这会对你有所帮助。
以下是一些关于GC的文章: http://fedora.fiz-karlsruhe.de/docs/Wiki.jsp?page=Java%20Heap%20%26%20GC%20Tuning http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html http://java.dzone.com/articles/how-tame-java-gc-pauses http://grokbase.com/t/cassandra/user/113qf50x4r/parnew-promotion-failed