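An obscure lockup between two threads seems to be related to the Global Interpreter Lock or some other "behind the scenes" lock, and I am at a loss as to how to continue troubleshooting. Any hints on how to eliminate the lockup would be much appreciated.
The problem reproduces in a larger body of code (irregularly and somewhat randomly). The code is strictly Python. The Python version is 2.6.5 (on Linux). When the lockup occurs, troubleshooting has narrowed the problem down to the following:
The offending call in #5 is the function unicode.encode, which should be non-blocking. The following code, placed at the point where thread 1 locks up, prints 'A' and 'B' (as expected):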
print('A')
print('B')
However, the following prints only 'A' and then blocks the thread:
print('A')
u'hello'.encode('utf8') # This dummy (non-blocking) call locks up Thread 1
print('B')
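This makes no sense to me. There is no logical deadlock condition between the two threads. Thread 1 is blocked in a non-blocking library call that does not interfere with thread 2 in any way, and thread 2 is just quietly waiting to acquire an RLock. The only reason I can think of for thread 1 to be blocked is that it is waiting for the GIL.
Any ideas on how to troubleshoot this further, or any mechanism to somehow control or manipulate GIL operation as a workaround?
EDIT: Some additional information in response to samplebias (and thanks for your reply). I had trouble capturing a trace, because the problem appears to be very sensitive to anything that might affect the timing between the two threads. However, running strace with only the -f option, I obtained a trace after a few iterations.
Thread 1 contains these three debug statements, which should print two lines, 'CHECK IN' and 'CHECK TEST', to the console: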
print('CHECK IN')#DEBUG
u'hello'.encode('utf8')
print('CHECK TEST')#DEBUG
Here is the last page of the strace:
8605 mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb753d000
8605 mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb6d3c000
8605 mprotect(0xb6d3c000, 4096, PROT_NONE) = 0
8605 clone(child_stack=0xb753c494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb753cbd8, {entry_number:6, base_addr:0xb753cb70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb753cbd8) = 8606
8606 set_robust_list(0xb753cbe0, 0xc <unfinished ...>
8605 futex(0xa239138, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
8606 <... set_robust_list resumed> ) = 0
8606 futex(0xa239138, FUTEX_WAKE_PRIVATE, 1) = 1
8605 <... futex resumed> ) = 0
8606 gettimeofday( <unfinished ...>
8605 futex(0xa272398, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
8606 <... gettimeofday resumed> {1301528807, 326496}, NULL) = 0
8606 futex(0xa272398, FUTEX_WAKE_PRIVATE, 1) = 1
8605 <... futex resumed> ) = 0
8606 futex(0xa272398, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
8605 gettimeofday( <unfinished ...>
8606 <... futex resumed> ) = 0
8605 <... gettimeofday resumed> {1301528807, 326821}, NULL) = 0
8606 futex(0xa272398, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
8605 futex(0xa272398, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
8606 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
8605 <... futex resumed> ) = 0
8606 gettimeofday( <unfinished ...>
8605 futex(0xa272398, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
8606 <... gettimeofday resumed> {1301528807, 326908}, NULL) = 0
8606 futex(0xa272398, FUTEX_WAKE_PRIVATE, 1) = 1
8605 <... futex resumed> ) = 0
8606 futex(0xa272398, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
8605 futex(0xa1b0d70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
8606 <... futex resumed> ) = 0
8606 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2225, ...}) = 0
8606 fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
8606 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6d3b000
8606 write(1, "CHECK IN\n", 9) = 9
8606 futex(0xa115270, FUTEX_WAIT_PRIVATE, 0, NULL
The only output from the three lines of code before the program locks up is the following:
CHECK IN
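So the strace shows thread 1 (#8606) writing the 'CHECK IN' string and then, on reaching unicode.encode, entering a wait state that never returns.
By the way, I use a few __future__ imports in all modules to keep to some of the newer Python conventions...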
from __future__ import print_function, unicode_literals
...but I don't see how they should make any difference, especially since the u'hello' string is explicitly marked as a unicode string.
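Answer 0 (score: 3)
I can't find anything in the Python source that would cause unicode.encode() to block, and a dummy program I wrote to try to reproduce it runs as expected. You mention that thread 1 has acquired more than one lock. Have you ruled those locks out as the source of the lockup?
Does the test case below show the same lockup in your environment?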
import time
import threading

def worker(tid):
    _lock.acquire()
    if not tid:
        # wait for rest of threads to enter acquire
        time.sleep(0.5)
    print('%d: A' % tid)
    u'hello'.encode('utf-8')
    print('%d: B' % tid)
    _lock.release()

def start(tid):
    th = threading.Thread(target=worker, args=(tid,))
    th.start()
    return th

_num = 2
_lock = threading.RLock()
workers = [start(n) for n in range(_num)]
# is_alive() is available from Python 2.6 onwards; isAlive() no longer exists in recent Python 3
while all(w.is_alive() for w in workers):
    time.sleep(1)
Output:
0: A
0: B
1: A
1: B
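To help rule the other locks in or out, one option is to wrap them so that long waits are reported. The following is only a minimal sketch, not something from your code: `TracedLock`, its `poll` and `warn_after` parameters, and the message format are all hypothetical names chosen for illustration. It polls with non-blocking acquires, which Python 2.6 supports, because `acquire()` has no timeout argument in that version.

```python
import threading
import time


class TracedLock(object):
    """Hypothetical wrapper that reports which thread waits for or holds a lock."""

    def __init__(self, name, lock=None, poll=0.1, warn_after=2.0):
        self.name = name
        self._lock = lock if lock is not None else threading.RLock()
        self._poll = poll              # seconds between non-blocking attempts
        self._warn_after = warn_after  # seconds of waiting before one warning
        self._depth = 0                # recursion depth, changed only by the holder
        self.holder = None             # name of the thread currently holding the lock

    def acquire(self):
        me = threading.current_thread().name
        start = time.time()
        warned = False
        # Poll with non-blocking acquires so a long wait can be reported.
        while not self._lock.acquire(False):
            if not warned and time.time() - start >= self._warn_after:
                print('%s: still waiting for %s (held by %s)'
                      % (me, self.name, self.holder))
                warned = True
            time.sleep(self._poll)
        self.holder = me
        self._depth += 1

    def release(self):
        # Clear the holder only when the outermost acquire is released.
        self._depth -= 1
        if self._depth == 0:
            self.holder = None
        self._lock.release()
```

In the test case above, replacing `_lock = threading.RLock()` with `_lock = TracedLock('main')` should leave the output unchanged, since the worker only calls `acquire()` and `release()`. The polling changes timing slightly, so a timing-sensitive lockup may become harder or easier to hit.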
You can also run strace on your program to determine where the process is blocking. For example, with the script above:
% strace -fTr -o trace.out python lockup.py
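The -o trace.out flag tells strace to write its output to a file. You can omit it, and strace will print to stderr.
The contents of trace.out should show every system call the program makes, with each line prefixed by the thread ID and the relative time between system calls. The end of each line shows the time spent in that system call. I have annotated the last few system calls with the corresponding Python code: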
# thread 0 time.sleep(0.5) completes
24778 0.500124 <... select resumed> ) = 0 (Timeout) <0.500599>
# preparing to print()
24778 0.000071 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000017>
24778 0.000058 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fe90a6000 <0.000018>
# print("0: A\n")..
24778 0.000079 write(1, "0: A\n", 5) = 5 <0.000023>
24778 0.000106 write(1, "0: B\n", 5) = 5 <0.000056>
# thread 0 _lock.release()
24778 0.000114 futex(0xe0f3c0, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000024>
24778 0.000108 madvise(0x7f8fe7266000, 8368128, MADV_DONTNEED) = 0 <0.000030>
# thread 0 exit
24778 0.000072 _exit(0) = ?
# thread 1 _lock.acquire()
24779 0.000050 <... futex resumed> ) = 0 <0.500774>
# thread 1 print("1: A\n") and so on..
24779 0.000052 write(1, "1: A\n", 5) = 5 <0.000026>
24779 0.000086 write(1, "1: B\n", 5) = 5 <0.000026>
24779 0.000099 madvise(0x7f8fe6a65000, 8368128, MADV_DONTNEED) = 0 <0.000024>
24779 0.000064 _exit(0) = ?
24777 0.499956 <... select resumed> ) = 0 (Timeout) <1.001138>
24777 0.000132 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f8fe8c7c8f0}, {0x4d9a90, [], SA_RESTORER, 0x7f8fe8c7c8f0}, 8) = 0 <0.000025>
# main thread process exit
24777 0.002349 exit_group(0) = ?
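strace only shows system calls, so it cannot say which Python frame a thread is stopped in. As a complement, here is a rough sketch that dumps the Python-level stack of every thread. `format_thread_stacks` and `start_watchdog` are hypothetical helper names, not part of any library; they rely on `sys._current_frames()` and `traceback.format_stack()`, both of which exist in Python 2.6. The sketch assumes the blocked threads have released the GIL while waiting, so that the watchdog thread still gets to run.

```python
import sys
import threading
import time
import traceback


def format_thread_stacks():
    """Return a text report of the current Python stack of every thread."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    report = []
    for ident, frame in sys._current_frames().items():
        report.append('Thread %s (%s):' % (names.get(ident, '?'), ident))
        report.extend(line.rstrip() for line in traceback.format_stack(frame))
    return '\n'.join(report)


def start_watchdog(interval=30.0):
    """Hypothetical helper: write all thread stacks to stderr every `interval` seconds."""
    def run():
        while True:
            time.sleep(interval)
            sys.stderr.write(format_thread_stacks() + '\n')

    th = threading.Thread(target=run, name='stack-watchdog')
    th.daemon = True  # do not keep the process alive just for the watchdog
    th.start()
    return th
```

Calling `start_watchdog(5.0)` early in the program and waiting for the lockup should print, for thread 1, the frames it is in below the `u'hello'.encode('utf8')` line, if any Python-level frames are involved.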