我正在尝试在Intel Xeon E5-2620v4上的Debian 8.5上安装扭矩6.0.2。但是,当我尝试启动pbs_server时,我返回了一个段错误,使用gdb:
#1 0x0000000000440ab6 in container::item_container<pbsnode*>::unlock (this=0xb5d900 <allnodes>) at ../../src/include/container.hpp:537
#2 0x00000000004b787f in mom_hierarchy_handler::nextNode (this=0x4e610c0 <hierarchy_handler>, iter=0x7fffffff98b8) at mom_hierarchy_handler.cpp:122
#3 0x00000000004b7a7d in mom_hierarchy_handler::make_default_hierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:149
#4 0x00000000004b898d in mom_hierarchy_handler::loadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:433
#5 0x00000000004b8ae8 in mom_hierarchy_handler::initialLoadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:472
#6 0x000000000045262a in pbsd_init (type=1) at pbsd_init.c:2299
#7 0x00000000004591ff in main (argc=2, argv=0x7fffffffdec8) at pbsd_main.c:1883
dmesg的:
traps: pbs_server[22249] general protection ip:7f9c08a7a2c8 sp:7ffe520b5238 error:0 in libpthread-2.19.so[7f9c08a69000+18000]
的valgrind:
==22381== Memcheck, a memory error detector
==22381== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==22381== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==22381== Command: pbs_server
==22381==
==22381==
==22381== HEAP SUMMARY:
==22381== in use at exit: 18,051 bytes in 53 blocks
==22381== total heap usage: 169 allocs, 116 frees, 42,410 bytes allocated
==22381==
==22382==
==22382== HEAP SUMMARY:
==22382== in use at exit: 19,755 bytes in 56 blocks
==22382== total heap usage: 172 allocs, 116 frees, 44,114 bytes allocated
==22382==
==22381== LEAK SUMMARY:
==22381== definitely lost: 0 bytes in 0 blocks
==22381== indirectly lost: 0 bytes in 0 blocks
==22381== possibly lost: 0 bytes in 0 blocks
==22381== still reachable: 18,051 bytes in 53 blocks
==22381== suppressed: 0 bytes in 0 blocks
==22381== Rerun with --leak-check=full to see details of leaked memory
==22381==
==22381== For counts of detected and suppressed errors, rerun with: -v
==22381== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==22383==
==22383== Process terminating with default action of signal 11 (SIGSEGV)
==22383== General Protection Fault
==22383== at 0x72192CB: __lll_unlock_elision (elision-unlock.c:33)
==22383== by 0x4E7E1A: unlock_node(pbsnode*, char const*, char const*, int) (u_lock_ctl.c:268)
==22383== by 0x4B7A66: mom_hierarchy_handler::make_default_hierarchy() (mom_hierarchy_handler.cpp:164)
==22383== by 0x4B898C: mom_hierarchy_handler::loadHierarchy() (mom_hierarchy_handler.cpp:433)
==22383== by 0x4B8AE7: mom_hierarchy_handler::initialLoadHierarchy() (mom_hierarchy_handler.cpp:472)
==22383== by 0x452629: pbsd_init(int) (pbsd_init.c:2299)
==22383== by 0x4591FE: main (pbsd_main.c:1883)
==22382== LEAK SUMMARY:
==22382== definitely lost: 0 bytes in 0 blocks
==22382== indirectly lost: 0 bytes in 0 blocks
==22382== possibly lost: 0 bytes in 0 blocks
==22382== still reachable: 19,755 bytes in 56 blocks
==22382== suppressed: 0 bytes in 0 blocks
==22382== Rerun with --leak-check=full to see details of leaked memory
==22382==
==22382== For counts of detected and suppressed errors, rerun with: -v
==22382== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==22383==
==22383== HEAP SUMMARY:
==22383== in use at exit: 325,348 bytes in 186 blocks
==22383== total heap usage: 297 allocs, 111 frees, 442,971 bytes allocated
==22383==
==22383== LEAK SUMMARY:
==22383== definitely lost: 134 bytes in 6 blocks
==22383== indirectly lost: 28 bytes in 3 blocks
==22383== possibly lost: 524 bytes in 17 blocks
==22383== still reachable: 324,662 bytes in 160 blocks
==22383== suppressed: 0 bytes in 0 blocks
==22383== Rerun with --leak-check=full to see details of leaked memory
==22383==
==22383== For counts of detected and suppressed errors, rerun with: -v
==22383== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
~
没有其他软件有这种行为,我在没有问题的情况下满负荷测试了2天。已经尝试更新处理器微码。请问,任何人都有扭矩6.0.2或其他情况的这种行为吗?
最好的问候。
答案 0 :(得分:1)
这不是微码故障。无论你运行什么软件(而且glibc / libpthreads中的不是),这都是一个彻头彻尾的锁定平衡问题。
请勿尝试解锁已解锁的锁。这是禁止的行为,以及陷阱的原因。
出于性能原因,glibc并不打算测试它和段错误,所以很多破解的代码很长时间都没有用它。锁定设备OTOH的硬件实现确实会引发陷阱(英特尔TSX,IBM Power 8,S390 / X ......),因此这种破坏将在任何地方变得明显,非常快。