mpiexec segmentation fault: address not mapped

Posted: 2021-02-02 06:26:34

Tags: shared-libraries mpi openmpi

I compiled some open-source CFD code and am now trying to run it.

But when I run the command 'mpiexec ./data -tis 0 -tie 9700 -ts 100', it returns this error message:

[songyi719-thinkpad-x1-extreme-2nd:13862] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:13862] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:13862] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:13862] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:13862] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f1206c313c0]
[songyi719-thinkpad-x1-extreme-2nd:13862] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7f1206e1a71b]
[songyi719-thinkpad-x1-extreme-2nd:13862] [ 2] ./data(+0x3a432)[0x55a320d75432]
[songyi719-thinkpad-x1-extreme-2nd:13862] [ 3] ./data(+0x98d9)[0x55a320d448d9]
[songyi719-thinkpad-x1-extreme-2nd:13862] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f1206a510b3]
[songyi719-thinkpad-x1-extreme-2nd:13862] [ 5] ./data(+0xa33e)[0x55a320d4533e]
[songyi719-thinkpad-x1-extreme-2nd:13862] *** End of error message ***

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 13862 RUNNING AT songyi719-thinkpad-x1-extreme-2nd
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

What does this mean? I am a complete beginner with MPI.
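One possible reading of the failing address (a hypothesis, not confirmed anywhere in this thread): 0x440000e8 is 0x44000000 + 0xe8, and 0x44000000 is the integer value that MPICH's mpi.h assigns to MPI_COMM_WORLD. The backtrace, however, runs through Open MPI's libmpi.so.40, where MPI_Comm is a pointer to a struct. If PETSc or the application was compiled against MPICH's headers but linked against Open MPI, Open MPI would treat that integer as a pointer and read a field at a small offset from it, which fits an "Address not mapped" fault inside MPI_Comm_rank. The "BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES" banner also appears to be the format of MPICH's Hydra mpiexec. The sketch only does the arithmetic check; the function names and the one-page offset window are assumptions made for illustration.

```python
# Integer value of MPI_COMM_WORLD in MPICH's mpi.h. In Open MPI, MPI_Comm is
# a pointer instead, so code built for one cannot safely run against the other.
MPICH_COMM_WORLD = 0x44000000

# Assumption for illustration: a struct field offset is smaller than one page.
MAX_FIELD_OFFSET = 0x1000


def offset_from_mpich_comm_world(failing_addr):
    """Return how far failing_addr lies above MPICH's MPI_COMM_WORLD, or None."""
    offset = failing_addr - MPICH_COMM_WORLD
    if 0 <= offset < MAX_FIELD_OFFSET:
        return offset
    return None


def describe(failing_addr):
    """Human-readable verdict for one faulting address."""
    offset = offset_from_mpich_comm_world(failing_addr)
    if offset is None:
        return f"{failing_addr:#x}: no match with MPICH's MPI_COMM_WORLD"
    return (f"{failing_addr:#x} = MPI_COMM_WORLD (MPICH, {MPICH_COMM_WORLD:#x}) + {offset:#x}: "
            "consistent with an integer handle being dereferenced as a pointer")


# Address taken from the segfault and valgrind reports in this question.
print(describe(0x440000e8))
```

A match is only circumstantial evidence; the checks further down (which mpi.h was used at compile time versus which libmpi is loaded at run time) are what would actually confirm or rule out a mixed MPICH/Open MPI build.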

I also tried using valgrind to find where the memory leak is:

==12707== Memcheck, a memory error detector
==12707== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12707== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12707== Command: ./data -tis 0 -tie 9700 -ts 100
==12707== 
==12708== Memcheck, a memory error detector
==12708== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12708== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12708== Command: /usr/local/bin/orted --hnp --set-sid --report-uri 19 --singleton-died-pipe 20 -mca state_novm_select 1 -mca ess hnp -mca pmix ^s1,s2,cray,isolated
==12708== 
==12707== Conditional jump or move depends on uninitialised value(s)
==12707==    at 0x6782BEA: PMIx_Get_nb (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x67836BD: PMIx_Get (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x677A9F5: PMIx_Init (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x6742C5A: pmix3x_client_init (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x5F17E4D: rte_init.part.0 (in /usr/local/lib/openmpi/mca_ess_singleton.so)
==12707==    by 0x595EBCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5239782: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x1423C0: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== Invalid read of size 1
==12707==    at 0x51CA71B: PMPI_Comm_rank (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x142431: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==  Address 0x440000e8 is not stack'd, malloc'd or (recently) free'd
==12707== 
[songyi719-thinkpad-x1-extreme-2nd:12707] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:12707] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:12707] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:12707] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:12707] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x54093c0]
[songyi719-thinkpad-x1-extreme-2nd:12707] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x51ca71b]
[songyi719-thinkpad-x1-extreme-2nd:12707] [ 2] ./data(+0x3a432)[0x142432]
[songyi719-thinkpad-x1-extreme-2nd:12707] [ 3] ./data(+0x98d9)[0x1118d9]
[songyi719-thinkpad-x1-extreme-2nd:12707] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x543e0b3]
[songyi719-thinkpad-x1-extreme-2nd:12707] [ 5] ./data(+0xa33e)[0x11233e]
[songyi719-thinkpad-x1-extreme-2nd:12707] *** End of error message ***
==12707== 
==12707== Process terminating with default action of signal 11 (SIGSEGV)
==12707==    at 0x5409229: raise (raise.c:46)
==12707==    by 0x5A066AB: show_stackframe (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x54093BF: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==12707==    by 0x51CA71A: PMPI_Comm_rank (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x142431: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== HEAP SUMMARY:
==12707==     in use at exit: 2,952,075 bytes in 10,345 blocks
==12707==   total heap usage: 22,923 allocs, 12,578 frees, 4,968,761 bytes allocated
==12707== 
==12707== 1 bytes in 1 blocks are definitely lost in loss record 2 of 8,075
==12707==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12707==    by 0x54B950E: strdup (strdup.c:42)
==12707==    by 0x7DD6536: ???
==12707==    by 0x7DB6373: ???
==12707==    by 0x59F33CB: mca_base_framework_components_register (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x59F3765: mca_base_framework_register (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x59F37C3: mca_base_framework_open (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x5239B94: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x1423C0: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== 37 bytes in 1 blocks are definitely lost in loss record 5,335 of 8,075
==12707==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12707==    by 0x54A5DC7: __vasprintf_internal (vasprintf.c:71)
==12707==    by 0x5549742: __asprintf_chk (asprintf_chk.c:34)
==12707==    by 0x5A20A6D: opal_hwloc_base_get_locality_string (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x5928302: orte_ess_base_proc_binding (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5F18EF2: rte_init.part.0 (in /usr/local/lib/openmpi/mca_ess_singleton.so)
==12707==    by 0x595EBCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5239782: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x1423C0: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely lost in loss record 6,379 of 8,075
==12707==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12707==    by 0x791414E: mca_mpool_hugepage_open (in /usr/local/lib/openmpi/mca_mpool_hugepage.so)
==12707==    by 0x59E912C: mca_base_framework_components_open (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x5A57C05: mca_mpool_base_open (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x59F3838: mca_base_framework_open (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x5239B35: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x1423C0: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== 88 (24 direct, 64 indirect) bytes in 1 blocks are definitely lost in loss record 6,520 of 8,075
==12707==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12707==    by 0x5A24ADF: opal_hwloc201_hwloc_bitmap_alloc (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x592879B: orte_ess_base_proc_binding (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5F18EF2: rte_init.part.0 (in /usr/local/lib/openmpi/mca_ess_singleton.so)
==12707==    by 0x595EBCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5239782: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x1423C0: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== 304 bytes in 1 blocks are possibly lost in loss record 7,820 of 8,075
==12707==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12707==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==12707==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==12707==    by 0x53FE322: allocate_stack (allocatestack.c:622)
==12707==    by 0x53FE322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==12707==    by 0x59CD869: opal_thread_start (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x59CD1FE: opal_progress_thread_init (in /usr/local/lib/libopen-pal.so.40.30.0)
==12707==    by 0x5F17DB4: rte_init.part.0 (in /usr/local/lib/openmpi/mca_ess_singleton.so)
==12707==    by 0x595EBCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5239782: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x1423C0: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== 304 bytes in 1 blocks are possibly lost in loss record 7,821 of 8,075
==12707==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12707==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==12707==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==12707==    by 0x53FE322: allocate_stack (allocatestack.c:622)
==12707==    by 0x53FE322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==12707==    by 0x6752AA9: pmix_thread_start (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x67BA808: pmix_progress_thread_start (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x67B9A27: pmix_rte_init (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x6779B40: PMIx_Init (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x6742C5A: pmix3x_client_init (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x5F17E4D: rte_init.part.0 (in /usr/local/lib/openmpi/mca_ess_singleton.so)
==12707==    by 0x595EBCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5239782: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707== 
==12707== LEAK SUMMARY:
==12707==    definitely lost: 126 bytes in 4 blocks
==12707==    indirectly lost: 79 bytes in 2 blocks
==12707==      possibly lost: 608 bytes in 2 blocks
==12707==    still reachable: 2,951,262 bytes in 10,337 blocks
==12707==         suppressed: 0 bytes in 0 blocks
==12707== Reachable blocks (those to which a pointer was found) are not shown.
==12707== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12707== 
==12707== Use --track-origins=yes to see where uninitialised values come from
==12707== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)
==12707== 
==12707== 1 errors in context 1 of 8:
==12707== Invalid read of size 1
==12707==    at 0x51CA71B: PMPI_Comm_rank (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x142431: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==  Address 0x440000e8 is not stack'd, malloc'd or (recently) free'd
==12707== 
==12707== 
==12707== 1 errors in context 2 of 8:
==12707== Conditional jump or move depends on uninitialised value(s)
==12707==    at 0x6782BEA: PMIx_Get_nb (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x67836BD: PMIx_Get (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x677A9F5: PMIx_Init (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x6742C5A: pmix3x_client_init (in /usr/local/lib/openmpi/mca_pmix_pmix3x.so)
==12707==    by 0x5F17E4D: rte_init.part.0 (in /usr/local/lib/openmpi/mca_ess_singleton.so)
==12707==    by 0x595EBCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12707==    by 0x5239782: ompi_mpi_init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x51DBB62: PMPI_Init (in /usr/local/lib/libmpi.so.40.30.0)
==12707==    by 0x1423C0: PetscInitialize (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707==    by 0x1118D8: main (in /home/songyi719/Desktop/Research/SAFL-CFD-Lab-VFS-Wind-041aae8/Instructional_Cases/Test_01_3D_Sloshing/data)
==12707== 
==12707== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)
==12708== 
==12708== HEAP SUMMARY:
==12708==     in use at exit: 23,320 bytes in 288 blocks
==12708==   total heap usage: 20,484 allocs, 20,196 frees, 4,292,387 bytes allocated
==12708== 
==12708== 5 bytes in 1 blocks are definitely lost in loss record 8 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x49E550E: strdup (strdup.c:42)
==12708==    by 0x527EEEE: ???
==12708==    by 0x522F128: ???
==12708==    by 0x48A6CBC: pmix_server_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x51CA9E5: ???
==12708==    by 0x48F0BCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489D60C: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 38 bytes in 1 blocks are definitely lost in loss record 28 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x49E550E: strdup (strdup.c:42)
==12708==    by 0x51CAFC3: ???
==12708==    by 0x48F0BCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489D60C: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 38 bytes in 1 blocks are definitely lost in loss record 29 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x49E550E: strdup (strdup.c:42)
==12708==    by 0x51CB099: ???
==12708==    by 0x48F0BCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489D60C: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 136 bytes in 1 blocks are definitely lost in loss record 56 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x48E2609: orte_rmaps_base_print_mapping (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x48A9556: orte_pmix_server_register_nspace (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489E644: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 152 bytes in 1 blocks are definitely lost in loss record 58 of 81
==12708==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x4BB87CC: opal_libevent2022_event_base_once (event.c:1707)
==12708==    by 0x529FB62: ???
==12708==    by 0x527E00E: ???
==12708==    by 0x522FD45: ???
==12708==    by 0x48A716F: pmix_server_finalize (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x51C9E33: ???
==12708==    by 0x487E7D4: orte_finalize (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489DD96: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 668 (96 direct, 572 indirect) bytes in 2 blocks are definitely lost in loss record 70 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x528578F: ???
==12708==    by 0x528BCA4: ???
==12708==    by 0x5280F2D: ???
==12708==    by 0x5282035: ???
==12708==    by 0x52E1F31: ???
==12708==    by 0x4BB72C2: event_process_active_single_queue (event.c:1370)
==12708==    by 0x4BB72C2: event_process_active (event.c:1440)
==12708==    by 0x4BB72C2: opal_libevent2022_event_base_loop (event.c:1644)
==12708==    by 0x529F385: ???
==12708==    by 0x4929608: start_thread (pthread_create.c:477)
==12708==    by 0x4A65292: clone (clone.S:95)
==12708== 
==12708== 1,212 (640 direct, 572 indirect) bytes in 1 blocks are definitely lost in loss record 75 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x527E5BF: ???
==12708==    by 0x522F128: ???
==12708==    by 0x48A6CBC: pmix_server_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x51CA9E5: ???
==12708==    by 0x48F0BCB: orte_init (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489D60C: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 2,048 bytes in 1 blocks are definitely lost in loss record 77 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x4B72DA3: opal_dss_buffer_extend (in /usr/local/lib/libopen-pal.so.40.30.0)
==12708==    by 0x4B74EF5: opal_dss_pack_int32 (in /usr/local/lib/libopen-pal.so.40.30.0)
==12708==    by 0x4B74F79: opal_dss_pack (in /usr/local/lib/libopen-pal.so.40.30.0)
==12708==    by 0x48AA143: orte_pmix_server_register_nspace (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489E644: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 2,048 bytes in 1 blocks are definitely lost in loss record 78 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x4B72DA3: opal_dss_buffer_extend (in /usr/local/lib/libopen-pal.so.40.30.0)
==12708==    by 0x4B74EF5: opal_dss_pack_int32 (in /usr/local/lib/libopen-pal.so.40.30.0)
==12708==    by 0x4B74F79: opal_dss_pack (in /usr/local/lib/libopen-pal.so.40.30.0)
==12708==    by 0x48AA23F: orte_pmix_server_register_nspace (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x489E644: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== 5,614 (240 direct, 5,374 indirect) bytes in 1 blocks are definitely lost in loss record 81 of 81
==12708==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==12708==    by 0x489D4A8: orte_daemon (in /usr/local/lib/libopen-rte.so.40.30.0)
==12708==    by 0x10916D: main (in /usr/local/bin/orted)
==12708== 
==12708== LEAK SUMMARY:
==12708==    definitely lost: 5,441 bytes in 11 blocks
==12708==    indirectly lost: 6,518 bytes in 135 blocks
==12708==      possibly lost: 0 bytes in 0 blocks
==12708==    still reachable: 11,361 bytes in 142 blocks
==12708==         suppressed: 0 bytes in 0 blocks
==12708== Reachable blocks (those to which a pointer was found) are not shown.
==12708== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12708== 
==12708== ERROR SUMMARY: 10 errors from 10 contexts (suppressed: 0 from 0)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 12707 RUNNING AT songyi719-thinkpad-x1-extreme-2nd
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

I also tried to open this 'data' file, but unfortunately it is an x-sharedlib binary, not C++ source code, so I cannot track down the problem by reading the code.
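The `data` file is the compiled executable (a position-independent executable, which file managers often label application/x-sharedlib), so opening it will not reveal any source. What can be checked without the source is which MPI implementation is involved at each stage. The sketch below, under stated assumptions, classifies text as MPICH or Open MPI using macros that each implementation's mpi.h defines (`MPICH_VERSION`, `OMPI_MAJOR_VERSION`), and queries a shared library through `MPI_Get_library_version`, which MPI-3 allows before `MPI_Init`. The script name and paths are placeholders; the include directory a wrapper compiler uses can be printed with `mpicc -show` (MPICH) or `mpicc --showme` (Open MPI).

```python
import ctypes
import re

# MPICH's mpi.h defines MPICH_VERSION; Open MPI's defines OMPI_MAJOR_VERSION.
# Library version strings typically start with "MPICH Version:" or "Open MPI v".
# Open MPI is checked first so a stray mention of MPICH in its text does not win.
_OPEN_MPI = re.compile(r"\bOMPI_MAJOR_VERSION\b|Open MPI")
_MPICH = re.compile(r"\bMPICH_VERSION\b|MPICH")

# Large enough for MPI_MAX_LIBRARY_VERSION_STRING in MPICH (8192) and Open MPI (256).
_VERSION_BUF_LEN = 8192


def classify_mpi_text(text):
    """Guess the MPI implementation from mpi.h contents or a library version string."""
    if _OPEN_MPI.search(text):
        return "Open MPI"
    if _MPICH.search(text):
        return "MPICH"
    return None


def mpi_library_version(lib_path):
    """Call MPI_Get_library_version from lib_path; return the string, or None on failure."""
    try:
        lib = ctypes.CDLL(lib_path)
        func = lib.MPI_Get_library_version
    except (OSError, AttributeError):
        return None
    func.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int
    buf = ctypes.create_string_buffer(_VERSION_BUF_LEN)
    length = ctypes.c_int(0)
    if func(buf, ctypes.byref(length)) != 0:  # 0 == MPI_SUCCESS
        return None
    return buf.value.decode(errors="replace")


if __name__ == "__main__":
    import sys
    # Usage (paths are placeholders): python which_mpi.py /path/to/mpi.h /usr/local/lib/libmpi.so.40
    if len(sys.argv) == 3:
        with open(sys.argv[1], errors="replace") as fh:
            print("header :", classify_mpi_text(fh.read()))
        version = mpi_library_version(sys.argv[2])
        print("library:", classify_mpi_text(version) if version else "could not query")
```

If the header that PETSc was configured with and the libmpi that `data` loads (as listed by `ldd ./data`) report different implementations, that mismatch would be consistent with the crash above; MPICH-derived libraries such as Intel MPI may not match either pattern.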


Update:

I have recompiled PETSc and my program as suggested in the comments.

It seems to have gotten worse...

[songyi719-thinkpad-x1-extreme-2nd:400137] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:400137] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:400137] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:400137] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:400138] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:400138] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:400138] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:400138] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:400139] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:400139] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:400139] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:400139] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:400136] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:400136] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:400136] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:400136] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:400137] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fd8390913c0]
[songyi719-thinkpad-x1-extreme-2nd:400137] [ 1] [songyi719-thinkpad-x1-extreme-2nd:400138] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fdcdd9e43c0]
[songyi719-thinkpad-x1-extreme-2nd:400138] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7fdcddbcd71b]
[songyi719-thinkpad-x1-extreme-2nd:400138] [ 2] ./data(+0x3a432)[0x561338ba4432]
[songyi719-thinkpad-x1-extreme-2nd:400139] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f52593b23c0]
[songyi719-thinkpad-x1-extreme-2nd:400139] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7f525959b71b]
[songyi719-thinkpad-x1-extreme-2nd:400139] [ 2] ./data(+0x3a432)[0x5621d8ba1432]
[songyi719-thinkpad-x1-extreme-2nd:400139] [ 3] ./data(+0x98d9)[0x5621d8b708d9]
[songyi719-thinkpad-x1-extreme-2nd:400139] [songyi719-thinkpad-x1-extreme-2nd:400138] [ 3] ./data(+0x98d9)[0x561338b738d9]
[songyi719-thinkpad-x1-extreme-2nd:400138] [ 4] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7fd83927a71b]
[songyi719-thinkpad-x1-extreme-2nd:400137] [ 2] ./data(+0x3a432)[0x56089fc48432]
[songyi719-thinkpad-x1-extreme-2nd:400137] [ 3] ./data(+0x98d9)[0x56089fc178d9]
[songyi719-thinkpad-x1-extreme-2nd:400137] [ 4] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f52591d20b3]
[songyi719-thinkpad-x1-extreme-2nd:400139] [ 5] ./data(+0xa33e)[0x5621d8b7133e]
[songyi719-thinkpad-x1-extreme-2nd:400139] *** End of error message ***
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fd838eb10b3]
[songyi719-thinkpad-x1-extreme-2nd:400137] [ 5] ./data(+0xa33e)[0x56089fc1833e]
[songyi719-thinkpad-x1-extreme-2nd:400137] *** End of error message ***
[songyi719-thinkpad-x1-extreme-2nd:400136] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fd3fb5143c0]
[songyi719-thinkpad-x1-extreme-2nd:400136] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7fd3fb6fd71b]
[songyi719-thinkpad-x1-extreme-2nd:400136] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fdcdd8040b3]
[songyi719-thinkpad-x1-extreme-2nd:400138] [ 5] ./data(+0xa33e)[0x561338b7433e]
[songyi719-thinkpad-x1-extreme-2nd:400138] *** End of error message ***
./data(+0x3a432)[0x555663876432]
[songyi719-thinkpad-x1-extreme-2nd:400136] [ 3] ./data(+0x98d9)[0x5556638458d9]
[songyi719-thinkpad-x1-extreme-2nd:400136] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fd3fb3340b3]
[songyi719-thinkpad-x1-extreme-2nd:400136] [ 5] ./data(+0xa33e)[0x55566384633e]
[songyi719-thinkpad-x1-extreme-2nd:400136] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node songyi719-thinkpad-x1-extreme-2nd exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
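Much of why this output looks worse is that four processes (PIDs 400136 to 400139) print the same backtrace at the same time, so their lines interleave. Grouping the "Failing at address" lines by PID shows whether every process hit the same fault; here each one reports 0x440000e8, the same address as before the rebuild. A minimal sketch, assuming the Open MPI signal-handler line format shown above:

```python
import re
from collections import defaultdict

# Matches lines such as "[host:400137] Failing at address: 0x440000e8".
FAIL = re.compile(r"\[[^:\]]+:(\d+)\] Failing at address: (0x[0-9a-fA-F]+)")


def failing_addresses_by_pid(log_text):
    """Map each PID to the set of addresses it reported in 'Failing at address' lines."""
    by_pid = defaultdict(set)
    for pid, addr in FAIL.findall(log_text):
        by_pid[int(pid)].add(addr)
    return dict(by_pid)


def all_same_address(by_pid):
    """True when every process failed at one and the same address."""
    addresses = set().union(*by_pid.values()) if by_pid else set()
    return len(addresses) == 1
```

Feeding the update's output into `failing_addresses_by_pid` and then `all_same_address` would indicate that the rebuild did not change the faulting address, which suggests the underlying cause was not affected by it.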
