pbs_server, E5-2620v4 and general protection fault

Date: 2016-08-11 20:53:06

Tags: c++ memory-leaks

I am trying to install Torque 6.0.2 on Debian 8.5 on an Intel Xeon E5-2620v4. However, when I try to start pbs_server, it crashes with a segfault. Backtrace from gdb:

#1  0x0000000000440ab6 in container::item_container<pbsnode*>::unlock (this=0xb5d900 <allnodes>) at ../../src/include/container.hpp:537
#2  0x00000000004b787f in mom_hierarchy_handler::nextNode (this=0x4e610c0 <hierarchy_handler>, iter=0x7fffffff98b8) at mom_hierarchy_handler.cpp:122
#3  0x00000000004b7a7d in mom_hierarchy_handler::make_default_hierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:149
#4  0x00000000004b898d in mom_hierarchy_handler::loadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:433
#5  0x00000000004b8ae8 in mom_hierarchy_handler::initialLoadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:472
#6  0x000000000045262a in pbsd_init (type=1) at pbsd_init.c:2299
#7  0x00000000004591ff in main (argc=2, argv=0x7fffffffdec8) at pbsd_main.c:1883

dmesg:

traps: pbs_server[22249] general protection ip:7f9c08a7a2c8 sp:7ffe520b5238 error:0 in libpthread-2.19.so[7f9c08a69000+18000]

valgrind:

==22381== Memcheck, a memory error detector
==22381== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==22381== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==22381== Command: pbs_server
==22381==
==22381==
==22381== HEAP SUMMARY:
==22381==     in use at exit: 18,051 bytes in 53 blocks
==22381==   total heap usage: 169 allocs, 116 frees, 42,410 bytes allocated
==22381==
==22382==
==22382== HEAP SUMMARY:
==22382==     in use at exit: 19,755 bytes in 56 blocks
==22382==   total heap usage: 172 allocs, 116 frees, 44,114 bytes allocated
==22382==
==22381== LEAK SUMMARY:
==22381==    definitely lost: 0 bytes in 0 blocks
==22381==    indirectly lost: 0 bytes in 0 blocks
==22381==      possibly lost: 0 bytes in 0 blocks
==22381==    still reachable: 18,051 bytes in 53 blocks
==22381==         suppressed: 0 bytes in 0 blocks
==22381== Rerun with --leak-check=full to see details of leaked memory
==22381==
==22381== For counts of detected and suppressed errors, rerun with: -v
==22381== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==22383==
==22383== Process terminating with default action of signal 11 (SIGSEGV)
==22383==  General Protection Fault
==22383==    at 0x72192CB: __lll_unlock_elision (elision-unlock.c:33)
==22383==    by 0x4E7E1A: unlock_node(pbsnode*, char const*, char const*, int) (u_lock_ctl.c:268)
==22383==    by 0x4B7A66: mom_hierarchy_handler::make_default_hierarchy() (mom_hierarchy_handler.cpp:164)
==22383==    by 0x4B898C: mom_hierarchy_handler::loadHierarchy() (mom_hierarchy_handler.cpp:433)
==22383==    by 0x4B8AE7: mom_hierarchy_handler::initialLoadHierarchy() (mom_hierarchy_handler.cpp:472)
==22383==    by 0x452629: pbsd_init(int) (pbsd_init.c:2299)
==22383==    by 0x4591FE: main (pbsd_main.c:1883)
==22382== LEAK SUMMARY:
==22382==    definitely lost: 0 bytes in 0 blocks
==22382==    indirectly lost: 0 bytes in 0 blocks
==22382==      possibly lost: 0 bytes in 0 blocks
==22382==    still reachable: 19,755 bytes in 56 blocks
==22382==         suppressed: 0 bytes in 0 blocks
==22382== Rerun with --leak-check=full to see details of leaked memory
==22382==
==22382== For counts of detected and suppressed errors, rerun with: -v
==22382== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==22383==
==22383== HEAP SUMMARY:
==22383==     in use at exit: 325,348 bytes in 186 blocks
==22383==   total heap usage: 297 allocs, 111 frees, 442,971 bytes allocated
==22383==
==22383== LEAK SUMMARY:
==22383==    definitely lost: 134 bytes in 6 blocks
==22383==    indirectly lost: 28 bytes in 3 blocks
==22383==      possibly lost: 524 bytes in 17 blocks
==22383==    still reachable: 324,662 bytes in 160 blocks
==22383==         suppressed: 0 bytes in 0 blocks
==22383== Rerun with --leak-check=full to see details of leaked memory
==22383==
==22383== For counts of detected and suppressed errors, rerun with: -v
==22383== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

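No other software shows this behavior, and I ran the machine under full load for 2 days without any problems. I have already tried updating the processor microcode. Has anyone seen this behavior with Torque 6.0.2, or in any other situation?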

Best regards.

1 Answer:

Answer 0 (score: 1)

This is not a microcode bug. It is a plain lock-balancing problem in whatever software you are running (and the problem is not in glibc/libpthread).

Do not try to unlock a lock that is already unlocked. That is forbidden behavior, and it is what causes the trap.

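For performance reasons, glibc does not bother to check for this and crash, so a lot of broken code has gone unnoticed for a long time. Hardware implementations of lock elision, on the other hand, do trap (Intel TSX, IBM POWER8, S390/X...), so this kind of breakage will become visible everywhere, very quickly.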