所以,我们正在使用G1 GC。和18GB堆。年轻一代的大小约为3,5G。堆最大使用量约为12G。记忆中充满了短命的物体。
此外, Couchbase 实例在JVM附近的同一节点上运行也许很重要。当它在hdd上持续更改时,它经常需要所有iops,但是有足够的可用CPU时间和内存。
启用JVM选项:
-Xmx18g -Xms18g -XX:MaxPermSize=512M -XX:+UseG1GC -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled
我悲伤的问题是年轻一代的GC暂停。长暂停GC日志通常如下所示:
2013-07-10T15:06:25.963+0400: 9122,066: [GC pause (young)
Desired survivor size 243269632 bytes, new threshold 5 (max 15)
- age 1: 69789280 bytes, 69789280 total
- age 2: 58618240 bytes, 128407520 total
- age 3: 54519720 bytes, 182927240 total
- age 4: 51592728 bytes, 234519968 total
- age 5: 45186520 bytes, 279706488 total
9122,066: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 174,16 ms, remaining time: 25,84 ms, target pause time: 200,00 ms]
9122,066: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 426 regions, survivors: 34 regions, predicted young region time: 164,97 ms]
9122,066: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 426 regions, survivors: 34 regions, old: 0 regions, predicted pause time: 339,13 ms, target pause time: 200,00 ms]
9122,259: [SoftReference, 0 refs, 0,0063780 secs]9124,575: [WeakReference, 4 refs, 0,0031600 secs]9124,578: [FinalReference, 1640 refs, 0,0033730 secs]9124,581: [PhantomReference, 145 refs, 0,0032080 secs]9124,5
85: [JNI Weak Reference, 0,0000810 secs], 2,53669600 secs]
[Parallel Time: 190,5 ms]
[GC Worker Start (ms): 9122066,6 9122066,7 9122066,7 9122066,7 9122066,8 9122066,9 9122066,9 9122066,9 9122066,9 9122067,0 9122067,0 9122067,1 9122067,1 9122067,1 9122067,1 9122067,2 91220
67,2 9122067,3
Avg: 9122067,0, Min: 9122066,6, Max: 9122067,3, Diff: 0,7]
[Ext Root Scanning (ms): 4,7 6,0 4,8 4,5 4,2 4,3 4,2 4,3 4,6 3,4 13,5 5,2 4,2 5,6 4,2 4,1 4,3 4,0
Avg: 5,0, Min: 3,4, Max: 13,5, Diff: 10,1]
[Update RS (ms): 20,9 19,6 21,1 21,3 21,2 21,2 21,3 21,2 21,7 21,5 12,1 20,2 21,1 19,4 21,0 21,1 20,7 21,2
Avg: 20,4, Min: 12,1, Max: 21,7, Diff: 9,6]
[Processed Buffers : 27 23 25 29 31 22 25 34 28 14 36 23 24 22 28 24 25 24
Sum: 464, Avg: 25, Min: 14, Max: 36, Diff: 22]
[Scan RS (ms): 9,0 9,2 8,7 8,8 9,1 9,1 8,9 9,1 8,3 9,2 9,0 9,1 9,2 9,2 9,1 9,0 9,0 9,1
Avg: 9,0, Min: 8,3, Max: 9,2, Diff: 1,0]
[Object Copy (ms): 145,1 145,0 145,2 145,1 145,1 144,9 145,1 144,9 144,9 145,4 144,8 144,8 144,8 145,0 145,0 145,1 145,2 144,9
Avg: 145,0, Min: 144,8, Max: 145,4, Diff: 0,6]
[Termination (ms): 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0
Avg: 0,0, Min: 0,0, Max: 0,0, Diff: 0,0]
[Termination Attempts : 5 8 2 11 5 6 6 5 5 7 4 7 2 9 8 5 7 8
Sum: 110, Avg: 6, Min: 2, Max: 11, Diff: 9]
[GC Worker End (ms): 9122246,4 9122246,6 9122246,7 9122246,6 9122246,7 9122246,7 9122246,5 9122246,7 9122246,5 9122246,5 9122246,6 9122246,7 9122246,8 9122246,4 9122246,6 9122246,5 9122246
,7 9122246,8
Avg: 9122246,6, Min: 9122246,4, Max: 9122246,8, Diff: 0,3]
[GC Worker (ms): 179,8 179,9 180,0 179,8 179,9 179,9 179,6 179,8 179,6 179,5 179,6 179,6 179,7 179,3 179,5 179,4 179,4 179,5
Avg: 179,7, Min: 179,3, Max: 180,0, Diff: 0,7]
[GC Worker Other (ms): 10,7 10,7 10,8 10,8 10,9 10,9 11,0 11,0 11,0 11,1 11,1 11,1 11,2 11,2 11,2 11,2 11,3 11,3
Avg: 11,0, Min: 10,7, Max: 11,3, Diff: 0,6]
[Clear CT: 2,8 ms]
[Other: 2343,4 ms]
[Choose CSet: 0,1 ms]
[Ref Proc: 2327,7 ms]
[Ref Enq: 1,9 ms]
[Free CSet: 8,2 ms]
[Eden: 3408M(3408M)->0B(3400M) Survivors: 272M->280M Heap: 9998M(18432M)->6638M(18432M)]
[Times: user=3,26 sys=0,02, real=2,54 secs]
Total time for which application threads were stopped: 2,5434370 seconds
产生问题的唯一GC阶段是“参考处理”。但是日志看起来很奇怪:软,弱,最终和jni引用处理花了很少的时间。但总时间是2.5秒。或者它可以更多,在最坏的情况下最多10秒。
另一个暂停(更舒服)可能如下:
2013-07-10T16:26:11.862+0400: 13907,965: [GC pause (young)
Desired survivor size 243269632 bytes, new threshold 4 (max 15)
- age 1: 69125832 bytes, 69125832 total
- age 2: 58756480 bytes, 127882312 total
- age 3: 52397376 bytes, 180279688 total
- age 4: 88850424 bytes, 269130112 total
13907,965: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 77,38 ms, remaining time: 122,62 ms, target pause time: 200,00 ms]
13907,965: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 427 regions, survivors: 33 regions, predicted young region time: 167,95 ms]
13907,965: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 427 regions, survivors: 33 regions, old: 0 regions, predicted pause time: 245,33 ms, target pause time: 200,00 ms]
13908,155: [SoftReference, 0 refs, 0,0041340 secs]13908,160: [WeakReference, 0 refs, 0,0023850 secs]13908,162: [FinalReference, 1393 refs, 0,0065970 secs]13908,169: [PhantomReference, 108 refs, 0,0018650 secs]13908,171: [JNI Weak Reference, 0,0000630 secs], 0,22008100 secs]
[Parallel Time: 188,4 ms]
[GC Worker Start (ms): 13907965,3 13907965,3 13907965,4 13907965,4 13907965,5 13907965,5 13907965,6 13907965,6 13907965,6 13907965,7 13907965,7 13907965,7 13907965,8 13907965,8 13907965,8 13907965,9 13907965,9 13907965,9
Avg: 13907965,6, Min: 13907965,3, Max: 13907965,9, Diff: 0,6]
[Ext Root Scanning (ms): 5,8 5,0 6,8 6,3 6,1 6,2 6,0 6,3 5,2 4,2 5,0 6,2 4,5 6,0 17,1 4,4 6,2 5,3
Avg: 6,3, Min: 4,2, Max: 17,1, Diff: 12,9]
[Update RS (ms): 24,8 26,0 23,9 24,1 24,1 24,1 24,2 23,9 25,0 25,2 25,1 24,1 26,0 24,3 13,7 25,7 24,2 24,7
Avg: 24,1, Min: 13,7, Max: 26,0, Diff: 12,2]
[Processed Buffers : 30 20 9 16 16 19 20 21 22 12 30 17 17 20 12 20 17 22
Sum: 340, Avg: 18, Min: 9, Max: 30, Diff: 21]
[Scan RS (ms): 7,5 7,1 7,2 7,5 7,6 7,5 7,5 7,6 7,1 7,4 7,6 7,2 7,2 7,4 7,2 7,5 7,0 7,7
Avg: 7,4, Min: 7,0, Max: 7,7, Diff: 0,7]
[Object Copy (ms): 133,1 133,1 133,2 133,1 133,2 133,1 133,2 133,1 133,5 134,0 133,0 133,2 133,0 132,9 132,6 133,1 133,2 132,9
Avg: 133,1, Min: 132,6, Max: 134,0, Diff: 1,3]
[Termination (ms): 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0
Avg: 0,0, Min: 0,0, Max: 0,0, Diff: 0,0]
[Termination Attempts : 1 3 1 1 1 1 1 1 1 1 2 3 2 1 1 1 1 1
Sum: 24, Avg: 1, Min: 1, Max: 3, Diff: 2]
[GC Worker End (ms): 13908136,6 13908136,9 13908136,5 13908136,7 13908136,7 13908136,8 13908136,7 13908136,7 13908136,8 13908136,8 13908136,5 13908136,6 13908136,5 13908136,5 13908136,5 13908136,5 13908136,8 13908136,6
Avg: 13908136,7, Min: 13908136,5, Max: 13908136,9, Diff: 0,4]
[GC Worker (ms): 171,3 171,6 171,1 171,2 171,2 171,3 171,1 171,1 171,1 171,2 170,8 170,9 170,7 170,7 170,7 170,6 171,0 170,7
Avg: 171,0, Min: 170,6, Max: 171,6, Diff: 0,9]
[GC Worker Other (ms): 17,2 17,2 17,3 17,3 17,4 17,4 17,5 17,5 17,5 17,5 17,6 17,6 17,7 17,7 17,7 17,7 17,8 17,8
Avg: 17,5, Min: 17,2, Max: 17,8, Diff: 0,6]
[Clear CT: 1,6 ms]
[Other: 30,1 ms]
[Choose CSet: 0,1 ms]
[Ref Proc: 17,1 ms]
[Ref Enq: 0,9 ms]
[Free CSet: 7,4 ms]
[Eden: 3416M(3416M)->0B(3456M) Survivors: 264M->224M Heap: 7289M(18432M)->3912M(18432M)]
[Times: user=3,16 sys=0,00, real=0,22 secs]
参考处理仍然是最长的阶段,但它要短得多。 ParallelRefProcEnabled并不能解决我的问题。我也试图改变年轻人的体型。它也没有帮助。设置不同的-XX:MaxGCPauseMillis,更宽松的600ms或更严格的100,导致吞吐量仍然很差。
CMS的性能甚至比G1更差,参数:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
年轻人的收藏品越来越频繁。
我对描述的日志感到困惑。明天我将尝试将Couchbase实例移动到另一个节点,以检查它是否冻结了GC线程。
但是,如果Couchbase不是重点,也许有人可以解释我对日志的意义。或者可能有一些神奇的CMS参数来修复这个问题。
我会很高兴得到任何帮助!
答案 0 :(得分:1)
我们自己解决了问题。我有一个新规则 - 永远不要在JVM附近安装Couchbase。 在过去,我们遇到了这样的问题,例如Postgresql与Couchbase冲突,因为Couchbase喜欢抓住所有磁盘操作,而Postgresql不能提交任何东西。
所以,隔离Couchbase,一切都会好起来的。