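I have a Python script (2.7) that uses some C-backed libraries (NumPy and sklearn). In it I build an array with NumPy and then feed the array into sklearn. When the array is built from a tab-delimited file with 29,000 identifiers and 20 representative numbers per identifier, the program runs without problems.
When I run it on something larger (in this case, 89,000 identifiers), the program produces a segmentation fault (core dumped).
The error occurs after NumPy has finished building the array, while the array is being fed into sklearn.
Update
I ran GDB on it (with bt, as suggested). This is everything it printed after the array was built: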
Program received signal SIGSEGV, Segmentation fault
0x00007ffff5e755c8 in ATL_dgezero () from /usr/lib/libblas.so.3
bt
#0 0x00007ffff5e755c8 in ATL_dgezero () from /usr/lib/libblas.so.3
#1 0x00007ffff5e9b09f in ATL_dprk_kmm () from /usr/lib/libblas.so.3
#2 0x00007ffff5ea71a5 in ATL_dsprk_rK () from /usr/lib/libblas.so.3
#3 0x00007ffff5eaaf43 in ATL_dsyrk () from /usr/lib/libblas.so.3
#4 0x00007ffff62fa063 in syrk (typenum=typenum@entry=12, trans=trans@entry=CblasNoTrans, n=84936, k=<optimized out>, A=A@entry=0x7fffd3371c10, lda=<optimized out>, R=0x7fffd3371a80, order=CblasRowMajor) at numpy/core/src/multiarray/cblasfuncs.c:131
#5 0x00007ffff63db162 in cblas_matrixproduct (typenum=typenum@entry=12, ap1=ap1@entry=0x7fffd3371c10, ap2=ap2@entry=0x7fffd3371940, out=out@entry=0x0)
at numpy/core/src/multiarray/cblasfuncs.c:729
#6 0x00007ffff63b01f4 in PyArray_MatrixProduct2 (op1=<optimized out>, op2=0x7fffd3371940, out=0x0) at numpy/core/src/multiarray/multiarraymodule.c:938
#7 0x00007ffff63b0caa in array_matrixproduct (__NPY_UNUSED_TAGGEDdummy= <optimized out>, args=0x7fffd3376d40, kwds=0x0)
at numpy/core/src/multiarray/multiarraymodule.c:2186
#8 0x000000000052c6d5 in PyEval_EvalFrameEx ()
#9 0x000000000055c594 in PyEval_EvalCodeEx ()
#10 0x000000000052ca8d in PyEval_EvalFrameEx ()
#11 0x000000000055c594 in PyEval_EvalCodeEx ()
#12 0x000000000052ca8d in PyEval_EvalFrameEx ()
#13 0x000000000055c594 in PyEval_EvalCodeEx ()
#14 0x000000000052ca8d in PyEval_EvalFrameEx ()
#15 0x000000000055c594 in PyEval_EvalCodeEx ()
#16 0x000000000052ca8d in PyEval_EvalFrameEx ()
#17 0x000000000055c594 in PyEval_EvalCodeEx ()
#18 0x000000000052ca8d in PyEval_EvalFrameEx ()
#19 0x000000000055c594 in PyEval_EvalCodeEx ()
#20 0x00000000005b7392 in PyEval_EvalCode ()
#21 0x0000000000469663 in ?? ()
#22 0x00000000004699e3 in PyRun_FileExFlags ()
#23 0x0000000000469f1c in PyRun_SimpleFileExFlags ()
#24 0x000000000046ab81 in Py_Main ()
#25 0x00007ffff7818ec5 in __libc_start_main (main=0x46ac3f <main>, argc=16, argv=0x7fffffffe298, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fffffffe288) at libc-start.c:287
#26 0x000000000057497e in _start ()
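I can't figure out whether this is a memory problem (since it works fine with small input files) or whether something in the code refuses to handle large data sets.
Any suggestions for another way to approach the problem, or for why this happens?
Update
I ran the script under strace until it failed. This is the output at the point of failure: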
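For context on the "another way" part: the syrk frame in the backtrace (n=84936) is consistent with np.dot being applied to the array and its own transpose, which would produce an 84936 x 84936 result. The following is only a minimal sketch under that assumption; it is not taken from the script or from sklearn's source, and gram_in_blocks and block_rows are hypothetical names. It computes the same product in row blocks, so no single BLAS call has to produce the whole matrix.

```python
import numpy as np


def gram_in_blocks(X, block_rows=1024):
    """Yield (start, stop, block) with block == np.dot(X[start:stop], X.T).

    Hypothetical helper: each block has shape (stop - start, n), so only
    one horizontal slice of the full n x n product exists at a time.
    """
    n = X.shape[0]
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        yield start, stop, np.dot(X[start:stop], X.T)


# Small stand-in for the real (84936, 14) array.
rng = np.random.RandomState(0)
X = rng.rand(500, 14)

# The single-call version: the result is n x n, not n x 14.
full = np.dot(X, X.T)
print(full.shape)

# The blocked version gives the same values, slice by slice.
for start, stop, block in gram_in_blocks(X, block_rows=128):
    print(start, stop, np.allclose(block, full[start:stop]))
```

Whether this helps depends on whether the downstream step can consume the product one slice at a time; if it needs the full matrix in memory, blocking only changes how the matrix is filled in.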
.......
brk(0x7a71000) = 0x7a71000
read(4, "TAC\nACTCAACCTTTGGGCGGAAAAGGTTAGC"..., 4096) = 4096
brk(0x8384000) = 0x8384000
read(4, "GTCTTCTAATGAACTAAACT\nTATCGATGATA"..., 4096) = 3675
read(4, "", 4096) = 0
brk(0x715f000) = 0x715f000
close(4) = 0
munmap(0x7ff031ad5000, 4096) = 0
write(1, "(84936, 14)\n", 12(84936, 14)
) = 12
munmap(0x7ff01458b000, 12587008) = 0
write(1, "\n \n \n -----"..., 203
-------------------------------------------------------
---array finished---
-------------------------------------------------------
) = 203
write(1, " \n", 9
) = 9
mmap(NULL, 57712996352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe29cce2000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fde9df831c8} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
ulimit -d
unlimited
Update
Running free -h to check memory gave this:
             total       used       free     shared    buffers     cached
Mem:          1.0T       973G        34G        17M       257M       743G
-/+ buffers/cache:       230G       777G
Swap:           0B         0B         0B
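As a sanity check on the numbers above, the arithmetic below compares the size of the mmap request in the strace output with the size of an n x n array of doubles, where n=84936 comes from the syrk frame and typenum=12 is NumPy's type number for double (8 bytes per element). The variable names are mine, and the conclusion is only that the sizes line up, not a diagnosis of the crash.

```python
n = 84936                  # n reported in the syrk frame of the backtrace
itemsize = 8               # bytes per double (typenum=12 is NPY_DOUBLE)
mmap_bytes = 57712996352   # length passed to mmap in the strace output

result_bytes = n * n * itemsize
print(result_bytes)                # 57712992768
print(mmap_bytes - result_bytes)   # 3584, less than one 4096-byte page
print(result_bytes / 2.0 ** 30)    # size in GiB, roughly 53.7

# The element count alone is larger than a signed 32-bit integer can hold.
print(n * n > 2 ** 31 - 1)         # True
```

So the failing allocation matches an 84936 x 84936 double matrix to within a page, and at about 54 GiB it is well below the 777G that free reports as available. The mmap call itself also succeeded (it returned an address), so the fault happens later, when ATL_dgezero writes into memory. Whether the 32-bit limit is relevant depends on how this BLAS build indexes arrays, which I have not verified.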