由于卡表验证失败导致JVM崩溃与G1 GC

时间:2016-04-11 12:38:39

标签: java linux garbage-collection jvm g1gc

最近我们开始使用OpenJDK JRE而不是Oracle的。环境包括:

  1. Tomcat 8.0.29
  2. CentOS 6.7(2.6.32-279.el6.imp6.numa.x86_64#1 SMP Wed Mar 30 11:50:42 IDT 2016 x86_64 x86_64 x86_64 GNU / Linux)
  3. Java 8 update 71
  4. G1垃圾收集算法
  5. 由于“Java Runtime Environment已检测到致命错误”,服务器会定期崩溃。我向java社区(JDK-8144331)报告了这个问题,但他们将其放弃了“通常我们不关注OpenJDK问题”的说法。

    我开始自己研究根本原因,我用调试版替换了生产RPM并安装了GDB& debuginfo包。下一个错误打印到catalina.out:

    == CT verification failed: [0x00007fbc0d051000,0x00007fbc0d0517ff]
    ==   expecting value: 32
    ==   card 0x00007fbc0d051639 [0x00000000fdec7200,0x00000000fdec7400], val: 0
    ==   card 0x00007fbc0d05163a [0x00000000fdec7400,0x00000000fdec7600], val: 0
    ==   card 0x00007fbc0d05163b [0x00000000fdec7600,0x00000000fdec7800], val: 0
    ==   card 0x00007fbc0d05163c [0x00000000fdec7800,0x00000000fdec7a00], val: 0
    ==   card 0x00007fbc0d05163e [0x00000000fdec7c00,0x00000000fdec7e00], val: 0
    ==   card 0x00007fbc0d05163f [0x00000000fdec7e00,0x00000000fdec8000], val: 0
    ==   card 0x00007fbc0d051640 [0x00000000fdec8000,0x00000000fdec8200], val: 0
    ==   card 0x00007fbc0d051641 [0x00000000fdec8200,0x00000000fdec8400], val: 0
    ==   card 0x00007fbc0d051642 [0x00000000fdec8400,0x00000000fdec8600], val: 0
    ==   card 0x00007fbc0d051643 [0x00000000fdec8600,0x00000000fdec8800], val: 0
    ==   card 0x00007fbc0d051644 [0x00000000fdec8800,0x00000000fdec8a00], val: 0
    ==   card 0x00007fbc0d051645 [0x00000000fdec8a00,0x00000000fdec8c00], val: 0
    ==   card 0x00007fbc0d051647 [0x00000000fdec8e00,0x00000000fdec9000], val: 0
    ==   card 0x00007fbc0d051648 [0x00000000fdec9000,0x00000000fdec9200], val: 0
    ==   card 0x00007fbc0d051649 [0x00000000fdec9200,0x00000000fdec9400], val: 0
    ==   card 0x00007fbc0d05164a [0x00000000fdec9400,0x00000000fdec9600], val: 0
    ==   card 0x00007fbc0d05164b [0x00000000fdec9600,0x00000000fdec9800], val: 0
    ==   card 0x00007fbc0d05164c [0x00000000fdec9800,0x00000000fdec9a00], val: 0
    ==   card 0x00007fbc0d05164d [0x00000000fdec9a00,0x00000000fdec9c00], val: 0
    ==   card 0x00007fbc0d05164e [0x00000000fdec9c00,0x00000000fdec9e00], val: 0
    ==   card 0x00007fbc0d05164f [0x00000000fdec9e00,0x00000000fdeca000], val: 0
    ==   card 0x00007fbc0d051651 [0x00000000fdeca200,0x00000000fdeca400], val: 0
    ==   card 0x00007fbc0d051652 [0x00000000fdeca400,0x00000000fdeca600], val: 0
    ==   card 0x00007fbc0d051655 [0x00000000fdecaa00,0x00000000fdecac00], val: 0
    ==   card 0x00007fbc0d051657 [0x00000000fdecae00,0x00000000fdecb000], val: 0
    ==   card 0x00007fbc0d05165d [0x00000000fdecba00,0x00000000fdecbc00], val: 0
    ==   card 0x00007fbc0d051664 [0x00000000fdecc800,0x00000000fdecca00], val: 0
    ==   card 0x00007fbc0d05166b [0x00000000fdecd600,0x00000000fdecd800], val: 0
    ==   card 0x00007fbc0d051672 [0x00000000fdece400,0x00000000fdece600], val: 0
    ==   card 0x00007fbc0d051679 [0x00000000fdecf200,0x00000000fdecf400], val: 0
    ==   card 0x00007fbc0d05167a [0x00000000fdecf400,0x00000000fdecf600], val: 0
    ==   card 0x00007fbc0d05167b [0x00000000fdecf600,0x00000000fdecf800], val: 0
    ==   card 0x00007fbc0d05167d [0x00000000fdecfa00,0x00000000fdecfc00], val: 0
    ==   card 0x00007fbc0d05167f [0x00000000fdecfe00,0x00000000fded0000], val: 0
    ==   card 0x00007fbc0d051680 [0x00000000fded0000,0x00000000fded0200], val: 0
    ==   card 0x00007fbc0d051681 [0x00000000fded0200,0x00000000fded0400], val: 0
    ==   card 0x00007fbc0d051683 [0x00000000fded0600,0x00000000fded0800], val: 0
    ==   card 0x00007fbc0d051685 [0x00000000fded0a00,0x00000000fded0c00], val: 0
    ==   card 0x00007fbc0d051686 [0x00000000fded0c00,0x00000000fded0e00], val: 0
    ==   card 0x00007fbc0d051687 [0x00000000fded0e00,0x00000000fded1000], val: 0
    ==   card 0x00007fbc0d051688 [0x00000000fded1000,0x00000000fded1200], val: 0
    ==   card 0x00007fbc0d051689 [0x00000000fded1200,0x00000000fded1400], val: 0
    ==   card 0x00007fbc0d05168b [0x00000000fded1600,0x00000000fded1800], val: 0
    # To suppress the following error report, specify this argument
    # after -XX: or in .hotspotrc:  SuppressErrorAt=/cardTableModRefBS.cpp:687
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  Internal Error (/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/memory/cardTableModRefBS.cpp:687), pid=4326, tid=140445634340608
    #  guarantee(!failures) failed: there should not have been any failures
    

    GDB工具从核心转储中提取下一个堆栈跟踪:

    #0  0x00007fbc1ee41625 in raise () from /lib64/libc.so.6
    #1  0x00007fbc1ee42e05 in abort () from /lib64/libc.so.6
    #2  0x00007fbc1e2dc173 in os::abort (dump_core=true)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1500
    #3  0x00007fbc1e4a9352 in VMError::report_and_die (this=0x7fbc0c2511c0)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
    #4  0x00007fbc1de441c6 in report_vm_error (
        file=0x7fbc1e5848d0 "/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/memory/cardTableModRefBS.cpp",
        line=687, error_msg=0x7fbc1e5855c8 "guarantee(!failures) failed", detail_msg=0x7fbc1e5855a0 "there should not have been any failures")
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/utilities/debug.cpp:226
    #5  0x00007fbc1dd1943a in CardTableModRefBS::verify_region (this=0x7fbc18040ef0, mr=..., val=32 ' ', val_equals=true)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/memory/cardTableModRefBS.cpp:687
    #6  0x00007fbc1df5f4e7 in G1SATBCardTableModRefBS::verify_g1_young_region (this=0x7fbc18040ef0, mr=...)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/gc_implementation/g1/g1SATBCardTableModRefBS.cpp:123
    #7  0x00007fbc1df3191f in G1CollectedHeap::verify_dirty_region (this=0x7fbc18029d10, hr=0x7fbc1845f330)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:5984
    #8  0x00007fbc1df3198b in G1CollectedHeap::verify_dirty_young_list (this=0x7fbc18029d10, head=0x7fbc1844ec30)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:5993
    #9  0x00007fbc1df319db in G1CollectedHeap::verify_dirty_young_regions (this=0x7fbc18029d10)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:5998
    #10 0x00007fbc1df2e483 in G1CollectedHeap::do_collection_pause_at_safepoint (this=0x7fbc18029d10, target_pause_time_ms=200)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:3946
    #11 0x00007fbc1e4e1dc4 in VM_G1IncCollectionPause::doit (this=0x7fbb701d20f0)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/gc_implementation/g1/vm_operations_g1.cpp:152
    #12 0x00007fbc1e4e0505 in VM_Operation::evaluate (this=0x7fbb701d20f0)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/runtime/vm_operations.cpp:62
    #13 0x00007fbc1e4dea8c in VMThread::evaluate_operation (this=0x7fbc1855e800, op=0x7fbb701d20f0)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:377
    #14 0x00007fbc1e4df0b1 in VMThread::loop (this=0x7fbc1855e800)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:502
    #15 0x00007fbc1e4de74d in VMThread::run (this=0x7fbc1855e800)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276
    #16 0x00007fbc1e2dad34 in java_start (thread=0x7fbc1855e800)
        at /usr/src/debug/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:782
    #17 0x00007fbc1f7d4aa1 in start_thread () from /lib64/libpthread.so.0
    #18 0x00007fbc1eef793d in clone () from /lib64/libc.so.6
    

    现在我很清楚,由于在 VM_G1IncCollectionPause 期间卡表的验证失败而发生错误。我怀疑发生了一些内存损坏。

    该问题在Oracle的JRE和默认GC上都不可重现。我们确实使用JNI调用。

    问题:

    1. 有关其他工具的建议,以了解如何发生此类情况
    2. Oracle实现的G1实现是否有所不同OpenJDK项目?
    3. 为什么卡表验证如此重要?
    4. 更新1:

      OpenJDK RPM信息:

      Name        : java-1.8.0-openjdk-headless  Relocations: (not relocatable)
      Version     : 1.8.0.71                          Vendor: CentOS
      Release     : 1.b15.el6_7                   Build Date: Thu 21 Jan 2016 08:25:22 PM IST
      Install Date: Wed 06 Apr 2016 09:28:27 AM IDT      Build Host: c6b8.bsys.dev.centos.org
      Group       : Development/Languages         Source RPM: java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.src.rpm
      Size        : 106657831                        License: ASL 1.1 and ASL 2.0 and GPL+ and GPLv2 and GPLv2 with exceptions and LGPL+ and LGPLv2 and MPLv1.0 and MPLv1.1 and Public Domain and W3C
      Signature   : RSA/SHA1, Thu 21 Jan 2016 09:36:21 PM IST, Key ID 0946fca2c105b9de
      Packager    : CentOS BuildSystem <http://bugs.centos.org>
      URL         : http://openjdk.java.net/
      Summary     : OpenJDK Runtime Environment
      Description :
      The OpenJDK runtime environment without audio and video support.
      

      更新2:

      我设法在Oracle的JRE版本8更新77下重现崩溃,但错误发生在不同的方法:

      崩溃错误1:

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007fc00c86f514, pid=13705, tid=140459499214592
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_77-b03) (build 1.8.0_77-b03)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.77-b03 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # V  [libjvm.so+0xa28514]  ObjectSynchronizer::inflate(Thread*, oopDesc*)+0x384
      
      Stack: [0x00007fbf46865000,0x00007fbf468e6000],  sp=0x00007fbf468e0ab0,  free space=494k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      V  [libjvm.so+0xa28514]  ObjectSynchronizer::inflate(Thread*, oopDesc*)+0x384
      V  [libjvm.so+0xa299c4]  ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x74
      V  [libjvm.so+0x70ffa8]  JVM_IHashCode+0xb8
      J 605  java.lang.System.identityHashCode(Ljava/lang/Object;)I (0 bytes) @ 0x00007fbffa475eff [0x00007fbffa475e40+0xbf]
      J 241498 C2 java.util.IdentityHashMap.resize(I)Z (153 bytes) @ 0x00007fbffb1bbec4 [0x00007fbffb1bbd60+0x164]
      J 20915 C2 org.hibernate.engine.internal.EntityEntryContext.addEntityEntry(Ljava/lang/Object;Lorg/hibernate/engine/spi/EntityEntry;)V (212 bytes) @ 0x00007fbffbe4691c [0x00007fbffbe45c00+0xd1c]
      

      崩溃错误2:

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007f5d3a49a3a5, pid=30526, tid=140038358337280
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_77-b03) (build 1.8.0_77-b03)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.77-b03 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # V  [libjvm.so+0x5c13a5]  G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x45
      #
      
      Stack: [0x00007f5d388a8000,0x00007f5d389a9000],  sp=0x00007f5d389a72f0,  free space=1020k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      V  [libjvm.so+0x5c13a5]  G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x45
      V  [libjvm.so+0x5c1f5b]  G1ParScanThreadState::trim_queue()+0x4ab
      V  [libjvm.so+0x59ca28]  G1ParEvacuateFollowersClosure::do_void()+0x208
      V  [libjvm.so+0x5a8373]  G1ParTask::work(unsigned int)+0x463
      V  [libjvm.so+0xadac8f]  GangWorker::loop()+0xcf
      V  [libjvm.so+0x91d9d8]  java_start(Thread*)+0x108
      

      该问题已报告给甲骨文团队。

2 个答案:

答案 0 :(得分:1)

你需要澄清你从哪里得到OpenJDK;有很多来源,其中一些不一定与源树中可用的显式标签相同。您可能想尝试像Azul Zulu这样的开源版本,它已经过TCK测试套件。

您可以通过比较源代码来了解失败的原因;这是HotSpot verify_region函数的GitHub mirror;应该有一些输出也可以解释更多的tty

如果您在gdb下运行该进程并在CardTableModRefBS::verify_region函数的最后(就在断言之前)放置一个断点,那么您可以进一步调试失败。

您还可以尝试使用不同的卡片标记,例如-XX:+UseCondCardMark-XX:-UseCondCardMark,以查看切换这些标记是否有所不同。

一般而言,就虚拟机和GC而言,Oracle JDK和OpenJDK之间没有区别;但是,您安装的OpenJDK版本实际上可能不是来自测试或提供的源。因此,在进一步学习之前,您应该首先尝试使用经过认证的OpenJDK版本。

答案 1 :(得分:-1)

事故发生了 V [libjvm.so + 0x5c13a5] G1ParScanThreadState :: copy_to_survivor_space(InCSetState,oopDesc *,markOopDesc *)+ 0x45 这可能是由任何破坏堆内存的错误引起的。这可能是GC的一个问题,编译器带有错误的本机代码,如果你有强大的再现报告给Oracle,可重复的测试和重现步骤。您的问题将得到解决。