Java 8 G1GC Ext Root Scanning time increases with the number of GC threads

Date: 2020-07-29 05:35:01

Tags: java java-8 garbage-collection g1gc

Environment: Java 8 (Oracle), 8-CPU server, 4 GB heap.

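We had a performance problem on our server and narrowed it down to GC. GC performance suddenly degraded, and "Ext Root Scanning" turned out to be the cause. With 1 GC thread, the total time for this phase is about 400 ms. With 2 threads it is 800 ms (so the wall-clock time is still 400 ms), and with 8 threads it is 3200 ms (still 400 ms wall-clock).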

To me this suggests a "fixed cost" of about 400 ms per thread, and that is what I need to get to the bottom of.
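
The per-thread arithmetic can be checked directly. The sketch below assumes that the "Sum" column of a G1 phase line is the total of every worker's time for that phase, so an identical per-worker cost T gives Sum ≈ workers × T while the wall-clock contribution stays ≈ T. The class and method names are illustrative only.

```java
// Sketch of the "fixed cost per GC worker" arithmetic from the question.
// Assumption: a phase's "Sum" is the total of all workers' times, so
// Sum / workers is the time each worker (and the wall clock) spent in it.
public class FixedCostCheck {

    // Per-worker (wall-clock) time implied by a phase's Sum and worker count.
    static double perWorkerMs(double sumMs, int workers) {
        return sumMs / workers;
    }

    public static void main(String[] args) {
        // Figures quoted in the question for 1, 2 and 8 GC threads.
        int[] workers = {1, 2, 8};
        double[] sums = {400.0, 800.0, 3200.0};
        for (int i = 0; i < workers.length; i++) {
            System.out.printf("workers=%d sum=%.1f ms per-worker=%.1f ms%n",
                    workers[i], sums[i], perWorkerMs(sums[i], workers[i]));
        }
        // Slow pause in the log: Ext Root Scanning Sum 3233.9 ms over 8 workers.
        System.out.printf("log: per-worker=%.1f ms%n", perWorkerMs(3233.9, 8));
    }
}
```

Every configuration, and the 8-worker pause in the log (3233.9 / 8 ≈ 404.2 ms, matching the logged Avg), works out to roughly 400 ms per worker, which is consistent with the fixed-cost reading.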

Extract from the GC log; the first GC is normal, the one after it is not:

2020-07-28T20:22:47.119+0200: 398.095: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 127926272 bytes, new threshold 15 (max 15)
- age   1:   36475752 bytes,   36475752 total
- age   2:    4016576 bytes,   40492328 total
- age   3:    2422456 bytes,   42914784 total
- age   6:       1936 bytes,   42916720 total
- age   7:     478904 bytes,   43395624 total
, 0.1002403 secs]
   [Parallel Time: 67.9 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 398096.0, Avg: 398096.1, Max: 398096.1, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 7.7, Avg: 10.9, Max: 23.3, Diff: 15.6, Sum: 86.8]
      [Update RS (ms): Min: 0.0, Avg: 11.6, Max: 14.5, Diff: 14.5, Sum: 93.2]
         [Processed Buffers: Min: 0, Avg: 36.5, Max: 55, Diff: 55, Sum: 292]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.1, Sum: 1.8]
      [Code Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 1.2, Diff: 0.7, Sum: 6.8]
      [Object Copy (ms): Min: 43.2, Avg: 44.1, Max: 44.8, Diff: 1.6, Sum: 352.4]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 70.3, Max: 87, Diff: 86, Sum: 562]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.6]
      [GC Worker Total (ms): Min: 67.7, Avg: 67.7, Max: 67.8, Diff: 0.1, Sum: 541.8]
      [GC Worker End (ms): Min: 398163.8, Avg: 398163.8, Max: 398163.9, Diff: 0.1]
   [Code Root Fixup: 0.3 ms]
   [Code Root Purge: 0.1 ms]
   [Clear CT: 0.6 ms]
   [Other: 31.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 26.8 ms]
      [Ref Enq: 1.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 2.4 ms]
   [Eden: 1885.0M(1885.0M)->0.0B(1854.0M) Survivors: 66.0M->118.0M Heap: 2861.0M(3356.0M)->1026.8M(3356.0M)]
 [Times: user=0.59 sys=0.00, real=0.11 secs] 
2020-07-28T20:23:43.857+0200: 454.820: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 129499136 bytes, new threshold 15 (max 15)
- age   1:   60550712 bytes,   60550712 total
- age   2:   35224048 bytes,   95774760 total
- age   3:    4015112 bytes,   99789872 total
- age   4:    2420472 bytes,  102210344 total
- age   7:       1856 bytes,  102212200 total
- age   8:     478904 bytes,  102691104 total
, 0.4842469 secs]
   [Parallel Time: 458.1 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 454820.7, Avg: 454820.8, Max: 454820.9, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 403.6, Avg: 404.2, Max: 406.3, Diff: 2.8, Sum: 3233.9]
      [Update RS (ms): Min: 8.5, Avg: 10.1, Max: 10.4, Diff: 1.9, Sum: 81.0]
         [Processed Buffers: Min: 25, Avg: 33.3, Max: 51, Diff: 26, Sum: 266]
      [Scan RS (ms): Min: 0.2, Avg: 0.3, Max: 0.4, Diff: 0.2, Sum: 2.1]
      [Code Root Scanning (ms): Min: 0.2, Avg: 1.1, Max: 2.1, Diff: 1.9, Sum: 8.9]
      [Object Copy (ms): Min: 41.2, Avg: 42.1, Max: 43.2, Diff: 2.0, Sum: 336.8]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 68.8, Max: 91, Diff: 90, Sum: 550]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.6]
      [GC Worker Total (ms): Min: 457.9, Avg: 457.9, Max: 458.0, Diff: 0.2, Sum: 3663.5]
      [GC Worker End (ms): Min: 455278.7, Avg: 455278.7, Max: 455278.8, Diff: 0.1]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.1 ms]
   [Clear CT: 0.7 ms]
   [Other: 25.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 21.4 ms]
      [Ref Enq: 0.7 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 1.9 ms]
   [Eden: 1854.0M(1854.0M)->0.0B(56.0M) Survivors: 118.0M->111.0M Heap: 2880.8M(3356.0M)->1019.8M(3356.0M)]
 [Times: user=0.92 sys=0.00, real=0.48 secs] 
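
To compare pauses mechanically, the per-worker phase lines can be parsed. This is a minimal sketch that assumes exactly the line layout shown in the log above; `G1PhaseLine` and the 100 ms threshold are hypothetical. It keys on Min rather than Avg because in the slow pause even the fastest worker spent 403.6 ms in Ext Root Scanning (Diff 2.8 ms), whereas in the normal pause the workers average 10.9 ms with a 15.6 ms spread, i.e. one slow worker rather than a uniform cost.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class G1PhaseLine {
    // Matches e.g. "[Ext Root Scanning (ms): Min: 7.7, Avg: 10.9, Max: 23.3, Diff: 15.6, Sum: 86.8]"
    // Lines without "(ms)" or without a Sum (e.g. "Processed Buffers") do not match.
    private static final Pattern PHASE = Pattern.compile(
            "\\[(.+?) \\(ms\\): Min: ([\\d.]+), Avg: ([\\d.]+), Max: ([\\d.]+), "
                    + "Diff: ([\\d.]+), Sum: ([\\d.]+)\\]");

    final String name;
    final double min, avg, max, sum;

    private G1PhaseLine(String name, double min, double avg, double max, double sum) {
        this.name = name;
        this.min = min;
        this.avg = avg;
        this.max = max;
        this.sum = sum;
    }

    /** Parses one per-worker phase line; returns null if the line has another shape. */
    static G1PhaseLine parse(String line) {
        Matcher m = PHASE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new G1PhaseLine(m.group(1),
                Double.parseDouble(m.group(2)), Double.parseDouble(m.group(3)),
                Double.parseDouble(m.group(4)), Double.parseDouble(m.group(6)));
    }

    public static void main(String[] args) {
        String[] lines = {
            "      [Ext Root Scanning (ms): Min: 7.7, Avg: 10.9, Max: 23.3, Diff: 15.6, Sum: 86.8]",
            "      [Ext Root Scanning (ms): Min: 403.6, Avg: 404.2, Max: 406.3, Diff: 2.8, Sum: 3233.9]",
        };
        double slowMs = 100.0; // hypothetical alert threshold, not a G1 setting
        for (String line : lines) {
            G1PhaseLine p = parse(line);
            // Flag the phase only when even the fastest worker exceeded the threshold.
            System.out.printf("%s: avg=%.1f ms, min=%.1f ms, slow=%b%n",
                    p.name, p.avg, p.min, p.min > slowMs);
        }
    }
}
```

Run against the two Ext Root Scanning lines above, only the second pause is flagged.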

Any suggestions as to what this could be, or how to diagnose it?

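  • The time taken to stop all threads is not the cause (I enabled the relevant flags to measure it, then disabled them again).
  • It cannot be reproduced on other servers running the same application and GC settings but under a different load.
  • The slow GC times do not start immediately but about 5 minutes after application startup, which may coincide with a burst of incoming CXF traffic.
  • Every GC after that shows the same problem.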
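
Since the slowdown starts about five minutes after startup and may coincide with a CXF traffic burst, one thing worth lining up against the GC log timestamps is the JVM's thread population: Java thread stacks are among the external roots G1 scans in this phase. Whether thread count or stack depth is actually the cause here is an assumption to test, not something the log shows. The class name and 1-second interval below are hypothetical, and the sampling has to happen inside the affected JVM (for example on a daemon thread), since it reports on its own process. Note that `Thread.getAllStackTraces()` itself requires a safepoint, so the interval should stay coarse.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: log the live thread count, peak thread count and total stack frames
// so they can be correlated with GC pause timestamps. Frame count is a rough
// proxy for stack size (it does not measure bytes or native frames).
public class ThreadRootSampler {

    /** Total number of Java stack frames across all live threads. */
    static long totalFrames() {
        long frames = 0;
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            frames += stack.length;
        }
        return frames;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            // Epoch millis first, so lines can be matched to GC log date stamps.
            System.out.printf("%d threads=%d peak=%d frames=%d%n",
                    System.currentTimeMillis(), threads.getThreadCount(),
                    threads.getPeakThreadCount(), totalFrames());
            Thread.sleep(1000);
        }
    }
}
```

In the application, the loop body would run on a background thread; `main` here only demonstrates the output format. A jump in `threads` or `frames` around the time Ext Root Scanning goes to ~400 ms would support this line of investigation; no change would argue against it.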
