Python: closing multiple threads

Time: 2012-01-11 18:29:11

Tags: python multithreading while-loop

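I'm using the following example from the IBM website. I noticed that, because of the while loops, the DatamineThread() and ThreadUrl() threads stay open.

I'm trying to terminate these threads and print some text telling me they have finished. I'm not sure whether I'm going about this the right way, or even whether threads need to be terminated like this at all. The problem is that when I set run = False in main(), the while loops are still reading run = True.

Any help would be great... thanks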

import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
        "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()
run = True

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        global run
        while run:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()

            #place chunk into out queue
            self.out_queue.put(chunk)

            #signals to queue job is done
            self.queue.task_done()

        print 'ThreadUrl finished...'


class DatamineThread(threading.Thread):
    """Threaded chunk parsing"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        global run
        while run:
            #grabs chunk from out_queue
            chunk = self.out_queue.get()

            #parse the chunk
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            #signals to queue job is done
            self.out_queue.task_done()

        print 'DatamineThread finished...'

start = time.time()
def main():
    global run
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()


    #wait on the queue until everything has been processed
    queue.join()
    out_queue.join()

    # try and break while-loops in threads
    run = False

    time.sleep(5)


main()
print "Elapsed Time: %s" % (time.time() - start)

1 Answer:

Answer 0 (score: 4)

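Personally, I'm not a big fan of global variables as thread conditions, largely because I've seen exactly what you're running into before. The reason comes from the Python documentation for Queue.get:

    "If optional args block is true and timeout is None (the default), block if necessary until an item is available."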

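Basically, once the queue has been emptied, out_queue.get() blocks indefinitely, so the thread never gets back to the while run: check and never sees run = False. The same applies to self.queue.get() in ThreadUrl.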
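A minimal sketch of this behavior, written for Python 3 (where the module is queue rather than Queue). The worker function and the waiting event are illustrative additions for the demo, not part of the question's code: the event only guarantees the worker has already passed its while run: check before the flag is cleared.

```python
import queue
import threading

q = queue.Queue()
run = True                    # global stop flag, as in the question
waiting = threading.Event()   # demo helper: set once the worker has passed "while run"

def worker():
    while run:
        waiting.set()         # the flag was True when checked; now block in get()
        q.get()               # blocks indefinitely while the queue is empty
        q.task_done()
    print("worker finished...")

t = threading.Thread(target=worker, daemon=True)
t.start()
waiting.wait()                # make sure the worker is past the "while run" check

run = False                   # too late: the worker is parked inside q.get()
t.join(timeout=0.5)
still_blocked = t.is_alive()
print("still blocked:", still_blocked)      # still blocked: True

q.put("wake up")              # only a new item unblocks get(); then "while run" sees False
t.join()
alive_after = t.is_alive()
print("alive after wake-up:", alive_after)  # alive after wake-up: False
```

Clearing the flag alone never reaches a thread that is blocked in get(); something has to arrive on the queue, which is what the sentinel approach below relies on.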

A better approach, IMHO, is to either use a sentinel value in the queue, or use get_nowait and catch the exception to break out of the loop. Examples:

Sentinel

class DatamineThread(threading.Thread):
    def run(self):
        while True:
            data = self.out_queue.get()
            if data == "time to quit":
                self.out_queue.task_done()  # lets a later out_queue.join() return
                break
            # non-sentinel processing here.
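A self-contained sketch of the sentinel pattern in Python 3. Two choices here are assumptions of the sketch rather than part of the answer: the sentinel is a unique object() compared with "is" (so it cannot collide with real data the way a string could), and uppercasing the chunk stands in for the BeautifulSoup parsing from the question.

```python
import queue
import threading

SENTINEL = object()  # unique marker; compared with "is", never equal to real data

class DatamineThread(threading.Thread):
    def __init__(self, out_queue, results):
        super().__init__()
        self.out_queue = out_queue
        self.results = results

    def run(self):
        while True:
            data = self.out_queue.get()
            if data is SENTINEL:
                self.out_queue.task_done()      # keep join() accounting balanced
                break
            self.results.append(data.upper())  # stand-in for real processing
            self.out_queue.task_done()
        print(self.name, "finished...")

out_queue = queue.Queue()
results = []
worker = DatamineThread(out_queue, results)
worker.start()

for chunk in ["<title>a</title>", "<title>b</title>"]:
    out_queue.put(chunk)
out_queue.put(SENTINEL)  # queued after the real items, so it is seen last

worker.join()            # returns because the thread actually exits
print(results)           # ['<TITLE>A</TITLE>', '<TITLE>B</TITLE>']
```

Because the thread really exits, there is no need for setDaemon(True) or a trailing time.sleep() to let it wind down.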

Try/except

class DatamineThread(threading.Thread):
    def run(self):
        while True:
            try:
                data = self.out_queue.get_nowait() # also, out_queue.get(False)
            except Queue.Empty: break
            # data processing here.
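A Python 3 sketch of the get_nowait() variant (queue.Empty is the Python 3 spelling of Queue.Empty); the drain function and the len() processing are hypothetical stand-ins. One caveat the sketch works around: a worker quits the first time it finds the queue empty, so the queue is filled before any worker starts. In the question's pipeline the DatamineThreads may start while out_queue is still empty and would exit immediately.

```python
import queue
import threading

def drain(q, results):
    """Process items until the queue is empty, then return."""
    while True:
        try:
            data = q.get_nowait()  # same as q.get(False)
        except queue.Empty:
            break                  # nothing left: leave the loop
        results.append(len(data))  # stand-in for real processing
        q.task_done()

q = queue.Queue()
for chunk in ["abc", "de", "f"]:
    q.put(chunk)                   # fill the queue BEFORE starting the workers

results = []
workers = [threading.Thread(target=drain, args=(q, results)) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(results))  # [1, 2, 3]
```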

There are a couple of ways to make sure all the threads end:

Add a sentinel for each worker

for i in range(numWorkers):  # numWorkers = number of DatamineThreads started
    out_queue.put('time to quit')

out_queue.join()
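Putting the pieces together, a Python 3 sketch with a pool of workers: join() waits for the real work, then one sentinel per worker is queued and join() is called again. It assumes each worker also calls task_done() for its sentinel, otherwise the second join() would never return; NUM_WORKERS plays the role of numWorkers, and the integers stand in for page chunks.

```python
import queue
import threading

NUM_WORKERS = 5
SENTINEL = "time to quit"

out_queue = queue.Queue()
results = []
lock = threading.Lock()

def datamine():
    while True:
        data = out_queue.get()
        if data == SENTINEL:
            out_queue.task_done()  # count the sentinel as done as well
            break                  # each worker consumes exactly one sentinel
        with lock:
            results.append(data)   # stand-in for parsing the chunk
        out_queue.task_done()

workers = [threading.Thread(target=datamine) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for n in range(10):
    out_queue.put(n)
out_queue.join()                   # all real items processed

for _ in range(NUM_WORKERS):       # one sentinel per worker
    out_queue.put(SENTINEL)
out_queue.join()                   # every worker has consumed its sentinel

for w in workers:
    w.join()
print(sorted(results) == list(range(10)), any(w.is_alive() for w in workers))  # True False
```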

Re-queue the sentinel

class DatamineThread(threading.Thread):
    def run(self):
        while True:
            data = self.out_queue.get()
            if data == "time to quit": 
                self.out_queue.put('time to quit')  # put it back for the next worker
                break
            # non-sentinel processing here.
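A Python 3 sketch of the re-queued sentinel: a single sentinel is passed from worker to worker, so the main thread never needs to know how many workers exist. Since the last worker leaves the sentinel in the queue, this sketch waits on the threads with Thread.join() instead of out_queue.join(). The worker count of 4 and the integer work items are arbitrary choices for the example.

```python
import queue
import threading

SENTINEL = "time to quit"
out_queue = queue.Queue()
processed = []
lock = threading.Lock()

def datamine():
    while True:
        data = out_queue.get()
        if data == SENTINEL:
            out_queue.put(SENTINEL)  # hand the sentinel on to the next worker
            out_queue.task_done()
            break
        with lock:
            processed.append(data)   # stand-in for real processing
        out_queue.task_done()

workers = [threading.Thread(target=datamine) for _ in range(4)]
for w in workers:
    w.start()

for n in range(8):
    out_queue.put(n)
out_queue.put(SENTINEL)  # a single sentinel, queued after the real work

for w in workers:
    w.join()             # wait on the threads, not on the queue

print(sorted(processed))         # [0, 1, 2, 3, 4, 5, 6, 7]
leftover = out_queue.get_nowait()
print(leftover)                  # time to quit
```

The last line shows why out_queue.join() is the wrong thing to wait on here: one sentinel is always left behind.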

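Either way should work. Which one is preferable depends on how out_queue gets populated. If the worker threads themselves can add items to it or remove items from it, the first approach is preferable: call join(), then add the sentinels, then call join() again. The second approach is good if you don't want to keep track of how many worker threads you created: it uses just one sentinel value and doesn't clutter the queue. Note that with the second approach the last sentinel stays in the queue, so wait for the workers by calling each thread's join() rather than out_queue.join().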
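To illustrate the case where workers feed their own queue, here is a Python 3 sketch with a purely hypothetical countdown task: each worker may put a follow-up item back on the queue before calling task_done(). The first join() therefore waits for all follow-ups too, and only afterwards are the sentinels added, so no follow-up item can end up queued behind the sentinels with no worker left to take it. Using None as the sentinel is an assumption that holds only because None never appears as real work here.

```python
import queue
import threading

NUM_WORKERS = 3
SENTINEL = None              # assumption: None never occurs as real work here

work = queue.Queue()
done = []
lock = threading.Lock()

def worker():
    while True:
        n = work.get()
        if n is SENTINEL:
            work.task_done()
            break
        with lock:
            done.append(n)
        if n > 0:
            work.put(n - 1)  # the worker adds more work itself, before task_done()
        work.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

work.put(3)
work.join()                  # 1) wait for the seed item and every follow-up
for _ in range(NUM_WORKERS):
    work.put(SENTINEL)       # 2) only now tell each worker to quit
work.join()                  # 3) wait until every sentinel is consumed
for t in threads:
    t.join()

print(sorted(done))  # [0, 1, 2, 3]
```

Putting the follow-up item before calling task_done() matters: it keeps the queue's unfinished-task count above zero, so the first join() cannot return while work is still being generated.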