我们有一个..a GC..issue,其中VM将运行非常长的GC(1+秒。.我们的SLA要求为<200ms)。
(这是在Java 8上,我们已经看到了从版本11到最新的版本)。
我们的GC设置atm:
-XX:+PrintStringTableStatistics
-XX:+PrintAdaptiveSizePolicy
-XX:+ParallelRefProcEnabled
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms3000M
-Xmx3000M
-Xmn1500M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
-XX:+PerfDisableSharedMem
-XX:CompileCommandFile=./../conf/hotspot_compiler
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:CMSWaitDuration=10000
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:/opt/openmarket/sqs/logs/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=100M
我们尝试了G1,但情况更糟,因此我们“坚持”使用CMS atm。
无论如何,这是我们的GC日志中的摘录。
应该注意的是,它运行了大约24小时,执行了其中的4个,其中2个是“长”的。(实际上,过去我们在产品中看到的时间更长,超过12秒!!)
我不喜欢的一件事..这就是为什么我们看到->清理字符串表,0.6001941秒 花费了这么长时间,我们没有'实习'很多字符串,所以启用-XX:+ PrintStringTableStatistics在那里没有显示任何有趣的活动,从中我们可以看出我们总共只有不到1000个实习字符串,因此这不应该作为因素
这里可以看到gc日志的高级摘要:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDcvMjYvLS1ub2RlLTAwMS4zLS0xNS0yMi0zNA==
2018-07-25T18:15:52.608+0000: 30.543: Total time for which application threads were stopped: 0.0046391 seconds
CMS: abort preclean due to time 2018-07-25T18:15:53.553+0000: 31.488: [CMS-concurrent-abortable-preclean: 0.326/20.683 secs] [Times: user=25.25 sys=3.89, real=20.68 secs]
2018-07-25T18:15:53.557+0000: 31.492: [GC (CMS Final Remark) [YG occupancy: 588025 K (1382400 K)]31.492: [Rescan (parallel) , 0.0559721 secs]31.548: [weak refs processing, 0.0035595 secs]31.552: [class unloading, 0.0409634 secs]31.593: [scrub symbol table, 0.0051572 secs]31.598: [scrub string table, 0.0016212 secs][1 CMS-remark: 12246K(1536000K)] 600272K(2918400K), 0.1150145 secs] [Times: user=0.34 sys=0.00, real=0.11 secs]
2018-07-25T18:15:53.673+0000: 31.608: Total time for which application threads were stopped: 0.1195510 seconds
2018-07-25T18:15:53.673+0000: 31.608: [CMS-concurrent-sweep-start]
--
2018-07-25T18:18:45.477+0000: 203.412: Total time for which application threads were stopped: 0.0044451 seconds
CMS: abort preclean due to time 2018-07-25T18:18:51.809+0000: 209.743: [CMS-concurrent-abortable-preclean: 3.023/67.120 secs] [Times: user=18.52 sys=6.47, real=67.11 secs]
2018-07-25T18:18:51.812+0000: 209.747: [GC (CMS Final Remark) [YG occupancy: 595711 K (1382400 K)]209.747: [Rescan (parallel) , 0.1225090 secs]209.870: [weak refs processing, 0.0048343 secs]209.875: [class unloading, 0.0408876 secs]209.916: [scrub symbol table, 0.0047264 secs]209.920: [scrub string table, 0.0042791 secs][1 CMS-remark: 54877K(1536000K)] 650589K(2918400K), 0.1843075 secs] [Times: user=0.54 sys=0.00, real=0.18 secs]
2018-07-25T18:18:51.997+0000: 209.932: Total time for which application threads were stopped: 0.1885355 seconds
2018-07-25T18:18:52.002+0000: 209.937: Total time for which application threads were stopped: 0.0041075 seconds
--
2018-07-26T00:06:31.084+0000: 21069.019: Total time for which application threads were stopped: 0.0033592 seconds
CMS: abort preclean due to time 2018-07-26T00:06:56.329+0000: 21094.264: [CMS-concurrent-abortable-preclean: 3.400/86.919 secs] [Times: user=15.48 sys=7.48, real=86.91 secs]
2018-07-26T00:06:56.339+0000: 21094.274: [GC (CMS Final Remark) [YG occupancy: 604677 K (1382400 K)]21094.274: [Rescan (parallel) , 0.1252722 secs]21094.399: [weak refs processing, 0.0084571 secs]21094.408: [class unloading, 0.1359008 secs]21094.544: [scrub symbol table, 0.0037596 secs]21094.548: [scrub string table, 0.5171769 secs][1 CMS-remark: 1274484K(1536000K)] 1879161K(2918400K), 0.7978841 secs] [Times: user=1.13 sys=0.00, real=0.80 secs]
2018-07-26T00:06:57.138+0000: 21095.072: Total time for which application threads were stopped: 0.8018055 seconds
2018-07-26T00:06:57.138+0000: 21095.073: [CMS-concurrent-sweep-start]
--
2018-07-26T09:05:41.291+0000: 53419.226: Total time for which application threads were stopped: 0.0053130 seconds
CMS: abort preclean due to time 2018-07-26T09:06:08.087+0000: 53446.021: [CMS-concurrent-abortable-preclean: 2.304/99.556 secs] [Times: user=15.49 sys=8.46, real=99.54 secs]
2018-07-26T09:06:08.090+0000: 53446.025: [GC (CMS Final Remark) [YG occupancy: 605003 K (1382400 K)]53446.025: [Rescan (parallel) , 0.1410524 secs]53446.166: [weak refs processing, 0.0134749 secs]53446.180: [class unloading, 1.0563894 secs]53447.236: [scrub symbol table, 0.0137832 secs]53447.250: [scrub string table, 0.6001941 secs][1 CMS-remark: 1431566K(1536000K)] 2036569K(2918400K), 1.8335188 secs] [Times: user=1.28 sys=0.00, real=1.83 secs]
2018-07-26T09:06:09.924+0000: 53447.859: Total time for which application threads were stopped: 1.8372996 seconds
已更新: 这里是不同运行的结果,带有较少的标志(但实际上是相同的或更糟)(在A侧节点上。我们也已经用parallelOld进行了测试,这看起来很有希望)
标志:
-XX:+CMSClassUnloadingEnabled
-Xms3000M
-Xmx3000M
-Xmn1500M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=10
-XX:TargetSurvivorRatio=90
-XX:+UseTLAB
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:/opt/openmarket/sqs/logs/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=100M
GC日志片段:
018-07-29T09:22:01.752 + 0000:38922.964:[CMS-concurrent-abortable-preclean-start] CMS:由于时间而中止预清理2018-07-29T09:22:06.888 + 0000:38928.100:[CMS-concurrent-abortable-preclean:5.118 / 5.136 secs] [时间:user = 5.09 sys = 0.10,real = 5.14 secs] 2018-07-29T09:22:06.890 + 0000:38928.102:[GC(CMS结束语)[YG占用率:184100 K(1382400 K)] 2018-07-29T09:22:06.890 + 0000:38928.102:[重新扫描(平行),0.0609614秒] 2018-07-29T09:22:06.951 + 0000:38928.163:[弱引用处理,0.0026624秒] 2018-07-29T09:22:06.954 + 0000:38928.166:[类卸载,0.0369107秒] 2018- 07-29T09:22:06.991 + 0000:38928.203:[擦洗符号表,0.0024052秒] 2018-07-29T09:22:06.993 + 0000:38928.205:[擦洗字符串表,1.0810044秒] [1 CMS备注:1414399K( 1536000K)] 1598500K(2918400K),1.1918919秒] [时间:用户= 1.17 sys = 0.00,实际= 1.19秒] 2018-07-29T09:22:08.082 + 0000:38929.294:停止应用程序线程的总时间:1.1943563秒,停止线程花费的时间:0.0001797秒
2018-07-29T18:52:55.303 + 0000:73176.515:[CMS-concurrent-abortable-preclean-start] CMS:由于时间而中止预清理2018-07-29T18:53:00.352 + 0000:73181.564:[CMS-concurrent-abortable-preclean:5.029 / 5.049秒] [时间:用户= 5.83 sys = 0.31,实际= 5.05秒] 2018-07-29T18:53:00.355 + 0000:73181.567:[GC(CMS最终备注)[YG占用率:185778 K(1382400 K)] 2018-07-29T18:53:00.355 + 0000:73181.567:[重新扫描(平行),0.0650261秒] 2018-07-29T18:53:00.420 + 0000:73181.632:[弱引用处理,0.0025795秒] 2018-07-29T18:53:00.423 + 0000:73181.635:[类卸载,0.0257172秒] 2018- 07-29T18:53:00.448 + 0000:73181.660:[scrub符号表,0.0026111秒] 2018-07-29T18:53:00.451 + 0000:73181.663:[scrub字符串表,0.9664343秒] [1 CMS-备注:1414468K( 1536000K)] 1600246K(2918400K),1.0629217秒] [时间:用户= 1.11 sys = 0.00,真实= 1.06秒] 2018-07-29T18:53:01.418 + 0000:73182.630:停止应用程序线程的总时间:1.0657259秒,停止线程花费的时间:0.0002660秒
2018-07-30T12:30:12.498 + 0000:136613.710:[CMS-concurrent-abortable-preclean-start] CMS:由于时间而中止预清理2018-07-30T12:30:17.698 + 0000:136618.910:[CMS-concurrent-abortable-preclean:5.185 / 5.200 secs] [时间:user = 5.72 sys = 0.20,real = 5.20 secs] 2018-07-30T12:30:17.701 + 0000:136618.913:[GC(CMS结束语)[YG占用率:179068 K(1382400 K)] 2018-07-30T12:30:17.701 + 0000:136618.913:[重新扫描(平行),0.0632593秒] 2018-07-30T12:30:17.764 + 0000:136618.976:[弱引用处理,0.0022151秒] 2018-07-30T12:30:17.767 + 0000:136618.979:[类卸载,0.1576453秒] 2018- 07-30T12:30:17.924 + 0000:136619.136:[清理符号表,0.0043250秒] 2018-07-30T12:30:17.929 + 0000:136619.141:[清理字符串表,1.0465825秒] [1 CMS备注:1414152K( 1536000K)] 1593220K(2918400K),1.2744338秒] [时间:用户= 1.21 sys = 0.00,实际= 1.28秒] 2018-07-30T12:30:18.976 + 0000:136620.188:停止应用程序线程的总时间:1.2772222秒,停止线程花费的时间:0.0001826秒