Should the LWP still be running after Thread.join() returns?

Asked: 2019-07-10 13:33:27

Tags: cpython

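I have a Python (CPython 3.6) process that hangs. From gdb backtraces, a forked child process is waiting indefinitely on a mutex owned by an LWP/Thread that no longer exists. After some debugging, I can see that after join() returns for the offending Thread, its LWP is still alive and is unwinding its stack, with a backtrace like the following: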

#0  dl_open_worker (a=a@entry=0x7fffefb0eb90) at dl-open.c:515
#1  0x00007ffff7b4b2df in __GI__dl_catch_exception (exception=0x7fffefb0eb70, operate=0x7ffff7de9dc0 <dl_open_worker>, args=0x7fffefb0eb90) at dl-error-skeleton.c:196
#2  0x00007ffff7de97ca in _dl_open (file=0x7ffff77d9bc0 "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7ffff77d7deb <pthread_cancel_init+43>, nsid=<optimised out>, argc=9, argv=<optimised out>, env=0x7fffffffe318) at dl-open.c:605
#3  0x00007ffff7b4a3ad in do_dlopen (ptr=ptr@entry=0x7fffefb0edc0) at dl-libc.c:96
#4  0x00007ffff7b4b2df in __GI__dl_catch_exception (exception=exception@entry=0x7fffefb0ed60, operate=operate@entry=0x7ffff7b4a370 <do_dlopen>, args=args@entry=0x7fffefb0edc0) at dl-error-skeleton.c:196
#5  0x00007ffff7b4b36f in __GI__dl_catch_error (objname=objname@entry=0x7fffefb0edb0, errstring=errstring@entry=0x7fffefb0edb8, mallocedp=mallocedp@entry=0x7fffefb0edaf, operate=operate@entry=0x7ffff7b4a370 <do_dlopen>, args=args@entry=0x7fffefb0edc0) at dl-error-skeleton.c:215
#6  0x00007ffff7b4a4d9 in dlerror_run (args=0x7fffefb0edc0, operate=0x7ffff7b4a370 <do_dlopen>) at dl-libc.c:46
#7  __GI___libc_dlopen_mode (name=name@entry=0x7ffff77d9bc0 "libgcc_s.so.1", mode=mode@entry=-2147483646) at dl-libc.c:195
#8  0x00007ffff77d7deb in pthread_cancel_init () at ../sysdeps/nptl/unwind-forcedunwind.c:52
#9  0x00007ffff77d7fd4 in _Unwind_ForcedUnwind (exc=0x7fffefb0fd70, stop=stop@entry=0x7ffff77d5d80 <unwind_stop>, stop_argument=0x7fffefb0ef10) at ../sysdeps/nptl/unwind-forcedunwind.c:126
#10 0x00007ffff77d5f10 in __GI___pthread_unwind (buf=<optimised out>) at unwind.c:121
#11 0x00007ffff77cdae5 in __do_cancel () at pthreadP.h:297
#12 __pthread_exit (value=<optimised out>) at pthread_exit.c:28
#13 0x00007ffff7b14504 in __pthread_exit (retval=<optimised out>) at forward.c:173
#14 0x00000000006383c5 in PyThread_exit_thread () at ../Python/thread_pthread.h:300
#15 0x00000000005e5f0f in t_bootstrap () at ../Modules/_threadmodule.c:1030
#16 0x0000000000638084 in pythread_wrapper (arg=<optimised out>) at ../Python/thread_pthread.h:205
#17 0x00007ffff77cc6db in start_thread (arg=0x7fffefb0f700) at pthread_create.c:463
#18 0x00007ffff7b0588f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Is it expected for this LWP to still exist, or should it have terminated before join() returns? If the latter, am I looking at a CPython bug?

1 answer:

Answer 0: (score: 0)

From inspecting the source code, this appears to be expected. In https://github.com/python/cpython/blob/master/Modules/_threadmodule.c#L1028, when the thread finishes its work, the following two calls are made:

PyThreadState_DeleteCurrent();
PyThread_exit_thread();

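The first call ends by releasing the lock that allows join() to return; the second causes the LWP to unwind its stack. Because the lock is released before the LWP exits, join() can return while the LWP is still alive and unwinding.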
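One rough way to look at this from Python is to list the process's LWPs right after join() returns. This is only a sketch under assumptions: it relies on Linux exposing one entry per LWP under /proc/self/task, the helper name lwp_ids is hypothetical, and since the window is a race the worker's LWP may or may not still be listed on any given run.

```python
import os
import threading


def lwp_ids():
    """Return the kernel thread IDs (LWPs) of this process.

    Assumption: Linux with /proc mounted; returns [] elsewhere.
    """
    task_dir = "/proc/self/task"
    if not os.path.isdir(task_dir):
        return []
    return sorted(int(name) for name in os.listdir(task_dir))


def worker():
    # Trivial thread body; the interesting part is what happens at exit.
    pass


before = lwp_ids()
t = threading.Thread(target=worker)
t.start()
t.join()
# join() has returned, but the worker's LWP may still be exiting here.
after_join = lwp_ids()

print("LWPs before start:      ", before)
print("LWPs right after join():", after_join)
print("is_alive():", t.is_alive())
```

If the second list occasionally contains one more ID than the first, that would be consistent with the LWP outliving join(), although an empty result on a single run proves nothing either way.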

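Debugging a trivial test Python script that starts a Thread, join()s it, and then sends a signal to itself (which provides a convenient breakpoint that gdb understands) shows a backtrace similar to the one in the original post, which further supports this conclusion.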
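A script along those lines might look like the sketch below. It is not the exact script used for the answer: the choice of SIGUSR1 and the no-op handler (so the script also survives when run outside gdb) are assumptions made for illustration.

```python
import os
import signal
import threading


def worker():
    # Trivial thread body; it returns immediately.
    pass


# Assumption: install a no-op handler so the signal does not terminate
# the process when the script runs outside a debugger.
signal.signal(signal.SIGUSR1, lambda signum, frame: None)

t = threading.Thread(target=worker)
t.start()
t.join()

# Send a signal to ourselves right after join() returns. Under gdb this
# is the point at which the process is expected to stop.
os.kill(os.getpid(), signal.SIGUSR1)

print("joined; is_alive() =", t.is_alive())
```

Run under gdb, the process should stop when the signal arrives (gdb stops on SIGUSR1 with its default signal settings), and `thread apply all bt` at that point lists whichever LWPs still exist. Whether the worker's LWP is caught mid-exit depends on timing, so several runs may be needed.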