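I am working from the following example on the IBM website. I noticed that the DatamineThread() and ThreadUrl() threads stay open because of the while loop.
I am trying to terminate these threads and print a message telling me that they finished. I'm not sure whether I'm going about this the right way, or even whether the threads need to be terminated this way. The problem is that after I set run = False in main(), the while loops still read run = True.
Any help would be great... thanks
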
import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup
hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]
queue = Queue.Queue()
out_queue = Queue.Queue()
run = True

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        global run
        while run:
            #grabs host from queue
            host = self.queue.get()
            #grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()
            #place chunk into out queue
            self.out_queue.put(chunk)
            #signals to queue job is done
            self.queue.task_done()
        print 'ThreadUrl finished...'

class DatamineThread(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        global run
        while run:
            #grabs host from queue
            chunk = self.out_queue.get()
            #parse the chunk
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])
            #signals to queue job is done
            self.out_queue.task_done()
        print 'DatamineThread finished...'

start = time.time()

def main():
    global run
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()
    #populate queue with data
    for host in hosts:
        queue.put(host)
    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()
    #wait on the queue until everything has been processed
    queue.join()
    out_queue.join()
    # try and break while-loops in threads
    run = False
    time.sleep(5)

main()
print "Elapsed Time: %s" % (time.time() - start)

Answer 0 (score: 4)
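I personally am not a big fan of global variables as thread conditions, largely because I have seen what you are experiencing before. The reason comes from the Python docs for Queue.get: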
If optional args block is true and timeout is None (the default), block if necessary until an item is available.
Basically, you never reach the second check of while run: because out_queue.get() blocks indefinitely once the queue has been emptied.
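
To make the blocking behaviour concrete, here is a minimal sketch of the same pattern without any network access. It is written for Python 3, where the module is named queue rather than Queue; the names work, checks and worker are made up for illustration and are not from the question. The short sleep is only there so the worker has time to get back into get() before the flag changes.

```python
import queue
import threading
import time

work = queue.Queue()
run = True      # flag checked by the loop, as in the question
checks = []     # one entry per evaluation of the loop condition (illustrative)


def worker():
    while run:
        checks.append(time.time())
        work.get()          # blocks with no timeout once the queue is empty
        work.task_done()


# daemon=True so the stuck thread does not keep the interpreter alive
t = threading.Thread(target=worker, daemon=True)
t.start()

work.put("only item")
work.join()                 # returns once the single item is processed
time.sleep(0.2)             # give the worker time to block in get() again
run = False                 # too late: the worker is already inside get()
t.join(timeout=0.5)

still_alive = t.is_alive()
print("loop condition evaluated %d times" % len(checks))
print("worker still alive: %s" % still_alive)
```

The worker evaluates the condition once before the first item and once after it, then sits in get() and never sees the flag change.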
A better approach, IMHO, is either to use a sentinel value in the queue, or to use get_nowait and catch the exception to break out of the loop. Examples:

class DatamineThread(threading.Thread):
    def run(self):
        while True:
            data = self.out_queue.get()
            if data == "time to quit":
                break
            # non-sentinel processing here.

class DatamineThread(threading.Thread):
    def run(self):
        while True:
            try:
                data = self.out_queue.get_nowait() # also, out_queue.get(False)
            except Queue.Empty:
                break
            # data processing here.
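
As a rough, self-contained sketch of the get_nowait variant (Python 3 spelling, so the module is queue and the exception is queue.Empty): the function name drain, the results list and the doubling step are placeholders for real processing, not part of the original code. Note the assumption that the queue is fully populated before the workers start; if producers are still adding items, a worker can see a momentarily empty queue and quit early.

```python
import queue
import threading

out_queue = queue.Queue()
for n in range(20):             # fill the queue before any worker starts
    out_queue.put(n)

results = []
results_lock = threading.Lock()


def drain():
    while True:
        try:
            data = out_queue.get_nowait()   # same as out_queue.get(False)
        except queue.Empty:
            break                           # nothing left: leave the loop
        with results_lock:
            results.append(data * 2)        # placeholder for real processing
        out_queue.task_done()


workers = [threading.Thread(target=drain) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()                    # every worker exits on its own

print(sorted(results))
```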

To make sure all the threads end, there are a couple of ways to do it:
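
# one sentinel per worker
for i in range(numWorkers):
    out_queue.put('time to quit')
# join() only returns if each worker also calls task_done() for its sentinel
out_queue.join()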

class DatamineThread(threading.Thread):
    def run(self):
        while True:
            data = self.out_queue.get()
            if data == "time to quit":
                self.out_queue.put('time to quit')
                break
            # non-sentinel processing here.
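
Here is a small runnable sketch of this second variant, in Python 3 spelling. I swapped the 'time to quit' string for a unique object() sentinel so it cannot collide with real data; that, the pool size of three and the names SENTINEL, processed and worker are my own assumptions for illustration.

```python
import queue
import threading

SENTINEL = object()     # stand-in for the 'time to quit' string
out_queue = queue.Queue()
processed = []
lock = threading.Lock()


def worker():
    while True:
        data = out_queue.get()
        if data is SENTINEL:
            out_queue.put(SENTINEL)     # hand the marker to the next worker
            break
        with lock:
            processed.append(data)      # placeholder for real processing


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for item in ["a", "b", "c", "d"]:
    out_queue.put(item)
out_queue.put(SENTINEL)     # a single marker, whatever the worker count

for w in workers:
    w.join()

leftover = out_queue.qsize()
print(sorted(processed), leftover)
```

One thing to keep in mind with this variant: the last worker to exit puts the marker back, so one item stays in the queue. Wait on the threads themselves, as above, rather than calling out_queue.join() after the marker has been added.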
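
Either way should work. Which one is preferable depends on how out_queue is populated. If items can be added to or removed from it by the worker threads, the first method is preferable: call join(), then add the sentinels, then call join() again. The second method is good if you don't want to keep track of how many worker threads you created - it uses only one sentinel value and doesn't clutter the queue.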
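
For the first method, the join / add sentinels / join sequence might look roughly like the sketch below, applied to a two-stage pipeline shaped like the one in the question. It uses Python 3 spelling and replaces the urllib2 and BeautifulSoup calls with a string stand-in so it runs offline; NUM_WORKERS, in_queue, fetch_worker and mine_worker are names I made up for the example. Each worker calls task_done() for the sentinel too, otherwise the second join() would never return.

```python
import queue
import threading

QUIT = "time to quit"
NUM_WORKERS = 3             # assumed pool size per stage

in_queue = queue.Queue()
out_queue = queue.Queue()
titles = []
titles_lock = threading.Lock()


def fetch_worker():
    while True:
        host = in_queue.get()
        if host == QUIT:
            in_queue.task_done()        # account for the sentinel as well
            break
        # stand-in for urllib2.urlopen(host).read()
        out_queue.put("<title>%s</title>" % host)
        in_queue.task_done()


def mine_worker():
    while True:
        chunk = out_queue.get()
        if chunk == QUIT:
            out_queue.task_done()
            break
        with titles_lock:
            titles.append(chunk)        # stand-in for the parsing step
        out_queue.task_done()


threads = [threading.Thread(target=fetch_worker) for _ in range(NUM_WORKERS)]
threads += [threading.Thread(target=mine_worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

hosts = ["yahoo.com", "google.com", "ibm.com"]
for host in hosts:
    in_queue.put(host)

# 1. wait until all real work has drained from both queues
in_queue.join()
out_queue.join()

# 2. add one sentinel per worker, then wait again
for _ in range(NUM_WORKERS):
    in_queue.put(QUIT)
    out_queue.put(QUIT)
in_queue.join()
out_queue.join()

for t in threads:
    t.join()
print(sorted(titles))
```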