Threading in Python - what am I missing?

Time: 2015-04-14 15:22:36

Tags: python multithreading

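This is my first attempt at using threads in Python... and it failed miserably :) I wanted to implement a basic critical-section problem, and found that this code does not actually exhibit the problem.

The question: why do I NOT have problems with the counter increment? Shouldn't the counter hold a random value after a run? I can only explain this if the increment is already executed atomically, or if the threads are not actually running concurrently...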

import threading
import time

turnstile_names = ["N", "E", "S", "W"]
count = 0

class Counter(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.id = id

    def run(self):
        global count
        for i in range(20):
            #self.sem.acquire()
            count = count + 1
            #self.sem.release()

def main():
    sem = threading.Semaphore(1)

    counters = [Counter(name) for name in turnstile_names]

    for counter in counters:
        counter.start()

    # We're running!

    for counter in counters:
        counter.join()

    print(count)
    return 0

if __name__ == '__main__':
    main()

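Note: I left the acquire() and release() calls commented out so I could check the difference. I also tried pacing the threads by adding small sleeps after the increment, with no difference.

Solution/tests: Thanks Kevin (see the accepted answer below). I just tested changing the loop count and got this: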
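For comparison, here is a minimal sketch of what the protected version could look like. Note that in the code above `sem` is a local variable of `main()`, so uncommenting `self.sem.acquire()` would fail with an AttributeError. The sketch assumes Python 3 and swaps the semaphore for a module-level `threading.Lock`; the names `count_lock` and `run_counters` are made up for illustration and are not part of the original script.

```python
import threading

count = 0
count_lock = threading.Lock()  # hypothetical name; one lock shared by all threads


class Counter(threading.Thread):
    def __init__(self, name, iterations):
        super().__init__(name=name)
        self.iterations = iterations

    def run(self):
        global count
        for _ in range(self.iterations):
            # The with-block acquires the lock before the read-modify-write
            # and releases it afterwards, even if an exception is raised.
            with count_lock:
                count = count + 1


def run_counters(names, iterations):
    """Start one Counter per name, wait for all of them, return the total."""
    global count
    count = 0
    counters = [Counter(name, iterations) for name in names]
    for counter in counters:
        counter.start()
    for counter in counters:
        counter.join()
    return count


if __name__ == "__main__":
    print(run_counters(["N", "E", "S", "W"], 20000))
```

With the lock in place, the total should equal the number of threads times the number of iterations, regardless of how the threads are scheduled.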

Loops    Result
20       99% of the time 80. Sometimes 60.
200      99% of the time 800. Sometimes 600.
2000     Maybe 10% of the time different value
20000    Finally... random numbers! I've yet to see 80000 or 60000.
         All numbers are now random, as originally expected.

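I suspect this means that the thread start-up overhead is on the order of 10^4 increment operations.

Another interesting test (in my opinion, at least):

I added time.sleep(random.random()/divisor) after the increment and, with the loop count back at 20, found: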

divisor     result
100         always 4, so the race condition is always there.
1000        95% of the time 4, sometimes 3 or 5 (once 7)
10000       99% of the time NOT 4, varying from 4 to 13
100000      basically same as 10000
1000000     varying from 10 to 70
10000000... same as previous... (even with time.sleep(0))

2 answers:

Answer 0 (score: 5)

If you increase the number of iterations per thread:

def run(self):
    global count
    for i in range(100000):
        #self.sem.acquire()
        count = count + 1
        #self.sem.release()

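then a race condition does occur. Your script prints, for example, 175165 where 400000 would be expected. This demonstrates that incrementing is not atomic.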


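Additional evidence that incrementing is not atomic: the behavior of threads in CPython is governed by the Global Interpreter Lock. According to the wiki,

The Global Interpreter Lock (GIL) is a mutex that prevents multiple native threads from executing Python bytecodes at once.

If the GIL has bytecode-level granularity, then we expect incrementing not to be atomic, because it takes multiple bytecodes to execute, as the dis module shows: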

>>> import dis
>>> def f():
...     x = 0
...     x = x + 1
...
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (1)
             12 BINARY_ADD
             13 STORE_FAST               0 (x)
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE

Here, the increment is performed by the bytecodes at offsets 6 through 13.
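The listing above comes from Python 2, and opcode names and offsets differ between interpreter versions. As a rough sketch, assuming Python 3, the same check can be made without relying on exact offsets by listing the opcode names of a function that increments a global, as the question's code does. The helper names `increment` and `opnames` are invented for this example.

```python
import dis

count = 0


def increment():
    # Same statement as in the question, acting on the module-level global.
    global count
    count = count + 1


def opnames(func):
    """Return the opcode names of func in execution order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    names = opnames(increment)
    print(names)
    # The read and the write of the global are separate instructions.
    print(names.index("LOAD_GLOBAL") < names.index("STORE_GLOBAL"))
```

Whatever the exact instruction names in between, the load of `count` and the store to `count` show up as distinct instructions, which is the point the answer is making.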


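So why did the original code not exhibit the race condition? This seems to be due to the short life expectancy of each thread: by looping only 20 times, each thread completes its work and dies before the next thread starts its own work.

Answer 1 (score: 2)

In CPython, thread safety is determined by atomicity (a single bytecode is not interrupted), the GIL (Python locks in a single thread for about 100 "ticks") and luck. Disassembling a simpler function,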
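One way to get a feel for this is to compare how long 20 increments take with how long the interpreter lets a thread run before asking it to give up the GIL. The sketch below assumes CPython 3.2 or later on a standard GIL build, where that period is time-based and exposed through `sys.getswitchinterval()` (5 ms by default). The helper names `increment_n` and `time_increments` are hypothetical, and the timing is only a rough single-threaded measurement.

```python
import sys
import time

count = 0


def increment_n(n):
    """Increment the global counter n times and return its new value."""
    global count
    for _ in range(n):
        count = count + 1
    return count


def time_increments(n):
    """Return the wall-clock seconds spent on n increments in this thread."""
    start = time.perf_counter()
    increment_n(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    interval = sys.getswitchinterval()
    elapsed = time_increments(20)
    print("switch interval: %.6f s" % interval)
    print("20 increments:   %.6f s" % elapsed)
```

If the 20 increments take far less time than the switch interval, a thread is likely to finish its whole loop before another thread gets a turn, which would match the behavior described above.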

>>> import dis
>>> count = 0
>>> def x():
...     count = count + 1
... 
>>> dis.dis(x)
  2           0 LOAD_FAST                0 (count)
              3 LOAD_CONST               1 (1)
              6 BINARY_ADD          
              7 STORE_FAST               0 (count)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

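We see that the code can be interrupted between the load and the store. This can mean that one thread loads a value, is suspended, and eventually overwrites a larger value with its own result.

Now luck comes into play. Doing the operation 20 times isn't much. Let's change your code to take the count as a parameter and see what happens with larger values: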
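The lost update can be made deterministic by forcing that interleaving. The following is a sketch, assuming Python 3, that splits the increment into an explicit load and store and uses a `threading.Barrier` so that both threads load before either one stores. The names `both_loaded`, `racy_increment` and `run_two_increments` are made up for this illustration.

```python
import threading

count = 0
both_loaded = threading.Barrier(2)  # hypothetical helper to force the interleaving


def racy_increment():
    global count
    loaded = count        # step 1: load the shared value
    both_loaded.wait()    # pause until the other thread has loaded too
    count = loaded + 1    # step 2: store a result based on the stale value


def run_two_increments():
    """Run two forced-interleaving increments and return the final count."""
    global count
    count = 0
    threads = [threading.Thread(target=racy_increment) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return count


if __name__ == "__main__":
    print(run_two_increments())
```

Both threads read 0 before either writes, so two increments leave the counter at 1 instead of 2. In the real program the scheduler produces this interleaving only occasionally, which is why the loss grows with the number of iterations.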

import threading
import time
import sys

turnstile_names = ["N", "E", "S", "W"]
count = 0

class Counter(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.id = id

    def run(self):
        global count
        for i in range(int(sys.argv[1])):
            #self.sem.acquire()
            count = count + 1
            #self.sem.release()

def main():
    sem = threading.Semaphore(1)

    counters = [Counter(name) for name in turnstile_names]

    for counter in counters:
        counter.start()

    # We're running!

    for counter in counters:
        counter.join()

    print(count)
    return 0

if __name__ == '__main__':
    main()

On one run I got:

td@timsworld2:~/tmp/so$ python count.py 1
4
td@timsworld2:~/tmp/so$ python count.py 2
8
td@timsworld2:~/tmp/so$ python count.py 20
80
td@timsworld2:~/tmp/so$ python count.py 200
749
td@timsworld2:~/tmp/so$ python count.py 2000
4314

With 2000 iterations in each of 4 threads (8000 expected), I've already lost nearly half of the values.