This backtrace is from a deadlock situation in a multithreaded application. The other deadlocked threads are stuck inside calls to malloc() and appear to be waiting on this thread.
I don't understand what created this thread, because it deadlocks before calling any function in my application:
Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2 0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3 0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4 0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
clone() is used to implement fork(), pthread_create(), and other functions. See here and here.
How can I tell whether this trace comes from fork(), pthread_create(), a signal handler, or something else? Do I just have to dig through the glibc code, or is there a way to find out with gdb or some other tool? Why does this thread need a glibc-internal lock? Knowing that would be useful in determining the cause of the deadlock.
Additional information and research:
malloc() is thread-safe, but not reentrant (recursion-safe) (see this and this), so malloc() is also not async-signal-safe. We don't define signal handlers for this process, so I know we don't call malloc() from a signal handler. The deadlocked threads don't call recursive functions, and the callbacks are handled in new threads, so I don't think we should have to worry about reentrancy here. (Maybe I'm wrong?)
This deadlock occurs when many callbacks are spawned that signal (and eventually kill) a different process. The callbacks are spawned in their own threads.
Could we be using malloc() in an unsafe way?
Possibly related:
Malloc inside of signal handler causes deadlock.
How are signal handlers delivered in a multi-threaded application?
A glibc fork/malloc deadlock bug that was fixed in glibc-2.17-162.el7. This looks similar, but it is not my bug: I am using a version of glibc with that fix.
(I have not succeeded in creating a minimal, complete, verifiable example. Unfortunately, the only way to reproduce this is with the application (Slurm), and it has been hard to reproduce.)
EDIT: Here are the backtraces of all the threads. Thread 6 is the trace I originally posted. Thread 1 is waiting on pthread_join(). Threads 2-5 are deadlocked after calling malloc(). Thread 7 is listening for messages and spawns the callbacks (threads 2-5) in new threads. Those would be the callbacks that eventually signal the other processes.
Thread 7 (Thread 0x7ff69e672700 (LWP 12650)):
#0 0x00007ff6a291aa3d in poll () from /usr/lib64/libc.so.6
#1 0x00007ff6a3c09064 in _poll_internal (shutdown_time=<optimized out>, nfds=2,
pfds=0x7ff6980009f0) at ../../../../slurm/src/common/eio.c:364
#2 eio_handle_mainloop (eio=0xf1a970) at ../../../../slurm/src/common/eio.c:328
#3 0x000000000041ce78 in _msg_thr_internal (job_arg=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:245
#4 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2 0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3 0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4 0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 5 (Thread 0x7ff69e773700 (LWP 22471)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7ff6a4086700 (LWP 5633)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7ff69d53b700 (LWP 12963)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 2 (Thread 0x7ff69f182700 (LWP 19734)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7ff6a4088880 (LWP 12616)):
#0 0x00007ff6a3876f57 in pthread_join () from /usr/lib64/libpthread.so.0
#1 0x000000000041084a in _wait_for_io (job=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:2219
#2 job_manager (job=job@entry=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:1397
#3 0x000000000040ca07 in main (argc=1, argv=0x7fffacab93d8)
at ../../../../../slurm/src/slurmd/slurmstepd/slurmstepd.c:172
Answer (score: 3)
The presence of start_thread() in the backtrace indicates that this is a thread created with pthread_create().
__libc_thread_freeres() is a function that glibc calls when a thread exits; it invokes a set of callbacks that free internal per-thread state. This indicates that the thread you highlighted was in the process of exiting.
arena_thread_freeres() is one of those callbacks. It belongs to the malloc arena allocator: it moves the free list from the exiting thread's private arena onto the global free list. To do this, it must take the lock that protects the global free list (this is list_lock in arena.c).
It appears to be this lock that the thread you highlighted (Thread 6) is blocked on.
The arena allocator installs pthread_atfork() handlers that lock list_lock at the start of fork() processing and unlock it at the end. This means that while other pthread_atfork() handlers are running, every other thread will block on this lock.
Are you installing your own pthread_atfork() handlers? It seems likely that one of them could be causing your deadlock.