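This is my first attempt at using threads in Python... and it failed :) I wanted to implement a basic critical-section problem, and found that this code doesn't actually exhibit the problem.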
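Question: why don't I have problems with my counter increment? Shouldn't the counter end up with a random value after a run? I can only explain this if the increment is already executed atomically, or if the threads aren't actually running concurrently...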
import threading
import time

turnstile_names = ["N", "E", "S", "W"]
count = 0

class Counter(threading.Thread):
    def __init__(self, id, sem):
        threading.Thread.__init__(self)
        self.id = id
        self.sem = sem  # shared semaphore, so the commented-out lines can be re-enabled

    def run(self):
        global count
        for i in range(20):
            #self.sem.acquire()
            count = count + 1
            #self.sem.release()

def main():
    sem = threading.Semaphore(1)
    counters = [Counter(name, sem) for name in turnstile_names]
    for counter in counters:
        counter.start()
    # We're running!
    for counter in counters:
        counter.join()
    print(count)
    return 0

if __name__ == '__main__':
    main()
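Note: I left the acquire() and release() calls commented out so I could check the difference. I tried adding small sleeps after the increment to change the pace of the threads; it made no difference.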
Solution/test: Thanks, Kevin (see the accepted answer below). I tested changing the loop count and got this:
Loops per thread   Result (4 threads, so 4 × loops if nothing is lost)
20                 99% of the time 80. Sometimes 60.
200                99% of the time 800. Sometimes 600.
2000               Maybe 10% of the time a different value.
20000              Finally... random numbers! I have yet to see 80000 or 60000.
                   All numbers are now random, as originally expected.
I suspect this means the thread overhead is on the order of 10^4 increment operations.
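A rough way to check that estimate is to time a thread's start and join against a run of increments on a global, like the question's loop. This is a sketch assuming Python 3 (`time.perf_counter`, f-strings); the helper names are made up for illustration, and the ratio will vary between machines and interpreter versions.

```python
import threading
import time

count = 0

def increment(iterations):
    # Same read-modify-write on a global as the question's run() loop.
    global count
    for _ in range(iterations):
        count = count + 1

def seconds_per_thread(repeats=200):
    """Average time to create, start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(repeats):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / repeats

def seconds_per_increment(iterations=100_000):
    """Average time of one global increment, measured in a single thread."""
    start = time.perf_counter()
    increment(iterations)
    return (time.perf_counter() - start) / iterations

if __name__ == "__main__":
    ratio = seconds_per_thread() / seconds_per_increment()
    print(f"one thread start+join costs roughly {ratio:.0f} increments")
```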
Another interesting test (in my opinion, at least): I added time.sleep(random.random()/divisor) after the increment and found, with the loop count back at 20:
Divisor     Result
100         Always 4, so the race condition is always there.
1000        95% of the time 4, sometimes 3 or 5 (once 7).
10000       99% of the time NOT 4, varying from 4 to 13.
100000      Basically the same as 10000.
1000000     Varying from 10 to 70.
10000000    Same as previous... (even with time.sleep(0))
Answer 0 (score: 5)
If you increase the number of iterations per thread:
def run(self):
    global count
    for i in range(100000):
        #self.sem.acquire()
        count = count + 1
        #self.sem.release()
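then the race condition does occur. Your script prints, for example, 175165, where 400000 would be expected. This shows that the increment is not atomic.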
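For comparison, here is a sketch of the critical section the question set out to build, assuming Python 3: the same `threading.Semaphore(1)` guards each increment through a `with` block, which calls `acquire()` on entry and `release()` on exit. The module-level `run` and `main` functions are illustrative, not the question's exact class.

```python
import threading

count = 0
sem = threading.Semaphore(1)  # binary semaphore, as in the question's main()

def run(iterations):
    global count
    for _ in range(iterations):
        with sem:  # acquire() on entry, release() on exit, even if an exception occurs
            count = count + 1

def main(iterations=100_000, n_threads=4):
    global count
    count = 0
    threads = [threading.Thread(target=run, args=(iterations,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count

if __name__ == "__main__":
    print(main())  # 400000 with the semaphore in place
```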
More evidence that the increment is not atomic: the behavior of threads in CPython is governed by the Global Interpreter Lock. According to the wiki,
The global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once.
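If the GIL has bytecode-level granularity, then we would expect the increment not to be atomic, because it takes more than one bytecode to execute, as the dis module shows: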
>>> import dis
>>> def f():
...     x = 0
...     x = x + 1
...
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (1)
             12 BINARY_ADD
             13 STORE_FAST               0 (x)
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE
Here, the increment is carried out by bytecodes 6 through 13.
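The output above comes from Python 2 and disassembles a local variable. The question's counter is a global, so a sketch assuming Python 3 (`dis.get_instructions` exists since 3.4) that disassembles a global increment is closer to what the threads actually run. Opcode names differ between versions (`BINARY_ADD` before 3.11, `BINARY_OP` from 3.11 on), but the load, the add and the store remain separate instructions.

```python
import dis

count = 0

def increment():
    global count
    count = count + 1

# Opcode names CPython compiles increment() into; exact names vary by version.
opnames = [ins.opname for ins in dis.get_instructions(increment)]

if __name__ == "__main__":
    dis.dis(increment)
    print(opnames)
```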
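So why didn't the original code exhibit the race condition? It seems to be due to the short lifetime of each thread: looping only 20 times, each thread finishes its work and dies before the next thread starts its own.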
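One way to test this explanation is to record when each thread starts and finishes and check whether any two of those intervals overlap. A sketch assuming Python 3; `run_and_record` and `overlapping` are hypothetical helpers, and the printed results depend on the machine and interpreter.

```python
import threading
import time

def run_and_record(iterations, n_threads=4):
    """Return (start, end) perf_counter timestamps for each worker thread."""
    spans = [None] * n_threads

    def worker(index):
        start = time.perf_counter()
        x = 0
        for _ in range(iterations):
            x = x + 1
        spans[index] = (start, time.perf_counter())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return spans

def overlapping(spans):
    """True if any two (start, end) intervals overlap.

    After sorting by start time, checking neighbouring pairs is enough.
    """
    ordered = sorted(spans)
    return any(later[0] < earlier[1] for earlier, later in zip(ordered, ordered[1:]))

if __name__ == "__main__":
    for n in (20, 100_000):
        print(n, "iterations -> threads overlapped:", overlapping(run_and_record(n)))
```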
Answer 1 (score: 2)
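In CPython, thread safety is determined by atomicity (a single bytecode won't be interrupted), the GIL (Python holds the lock for a single thread for about 100 "ticks") and luck. Disassembling a simpler function: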
>>> import dis
>>> count = 0
>>> def x():
...     count = count + 1
...
>>> dis.dis(x)
  2           0 LOAD_FAST                0 (count)
              3 LOAD_CONST               1 (1)
              6 BINARY_ADD
              7 STORE_FAST               0 (count)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
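We see that the code can be interrupted between the load and the store. This could mean that one thread loads a value, is suspended, and eventually overwrites a larger value with its result.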
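The interleaving described here can be forced deterministically. In this sketch (an illustration assuming Python 3, not how CPython actually schedules threads), a `threading.Barrier` stands in for the scheduler pausing each thread between its load and its store, so every thread reads the same stale value and all the increments collapse into one. `lost_update_demo` is a hypothetical name.

```python
import threading

count = 0

def lost_update_demo(n_threads=2):
    """Force every thread to load count before any thread stores it."""
    global count
    count = 0
    barrier = threading.Barrier(n_threads)

    def worker():
        global count
        loaded = count      # LOAD: read the shared value
        barrier.wait()      # "suspended" here until every thread has loaded
        count = loaded + 1  # ADD + STORE: overwrite with a stale result

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count

if __name__ == "__main__":
    print(lost_update_demo(4))  # 1, although 4 increments ran
```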
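Now luck comes into play. Doing the operation 20 times isn't much. Let's change your code to take the count as an argument and see what happens with larger values: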
import threading
import time
import sys

turnstile_names = ["N", "E", "S", "W"]
count = 0

class Counter(threading.Thread):
    def __init__(self, id, sem):
        threading.Thread.__init__(self)
        self.id = id
        self.sem = sem

    def run(self):
        global count
        for i in range(int(sys.argv[1])):  # iteration count from the command line
            #self.sem.acquire()
            count = count + 1
            #self.sem.release()

def main():
    sem = threading.Semaphore(1)
    counters = [Counter(name, sem) for name in turnstile_names]
    for counter in counters:
        counter.start()
    # We're running!
    for counter in counters:
        counter.join()
    print(count)
    return 0

if __name__ == '__main__':
    main()
On one run I got:
td@timsworld2:~/tmp/so$ python count.py 1
4
td@timsworld2:~/tmp/so$ python count.py 2
8
td@timsworld2:~/tmp/so$ python count.py 20
80
td@timsworld2:~/tmp/so$ python count.py 200
749
td@timsworld2:~/tmp/so$ python count.py 2000
4314
With 2000 iterations and 4 threads, I've lost almost half of the values.
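The "about 100 ticks" figure above is Python 2's check interval (`sys.setcheckinterval`, default 100). Since Python 3.2, CPython uses a time-based switch interval instead, read and changed with `sys.getswitchinterval()` and `sys.setswitchinterval()`. A rough sketch, assuming Python 3, that reruns the experiment with a much shorter interval; `unsynchronized_run` and `run_counters` are hypothetical helpers. Results vary from run to run, and whether lost updates appear at all also depends on where a given CPython version checks for a pending thread switch.

```python
import sys
import threading

count = 0

def unsynchronized_run(iterations):
    global count
    for _ in range(iterations):
        count = count + 1  # same non-atomic read-modify-write as the question

def run_counters(iterations, n_threads=4):
    global count
    count = 0
    threads = [threading.Thread(target=unsynchronized_run, args=(iterations,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count

original = sys.getswitchinterval()   # 0.005 s by default in CPython 3
try:
    sys.setswitchinterval(1e-6)      # request thread switches far more often
    for n in (20, 2000, 20000):
        print(n, "loops:", run_counters(n), "of", 4 * n)
finally:
    sys.setswitchinterval(original)  # restore the previous interval
```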