Question

I'm trying to understand how to use threads, and I came across this nice example at http://www.ibm.com/developerworks/aix/library/au-threadingpython/:

      #!/usr/bin/env python
      import Queue
      import threading
      import urllib2
      import time

      hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
               "http://ibm.com", "http://apple.com"]

      queue = Queue.Queue()

      class ThreadUrl(threading.Thread):
          """Threaded Url Grab"""
          def __init__(self, queue):
              threading.Thread.__init__(self)
              self.queue = queue

          def run(self):
              while True:
                  #grabs host from queue
                  host = self.queue.get()
                  try:
                      #grabs urls of hosts and prints first 1024 bytes of page
                      url = urllib2.urlopen(host)
                      print url.read(1024)
                  finally:
                      #signals to queue job is done, even if the fetch raised,
                      #so that queue.join() below cannot hang on a failed URL
                      self.queue.task_done()

      start = time.time()
      def main():

          #spawn a pool of threads, and pass them queue instance
          for i in range(5):
              t = ThreadUrl(queue)
              t.setDaemon(True)
              t.start()

          #populate queue with data
          for host in hosts:
              queue.put(host)

          #wait on the queue until everything has been processed
          queue.join()

      main()
      print "Elapsed Time: %s" % (time.time() - start)

The part I don't understand is why the run method contains an infinite loop:

        def run(self):
          while True:
            ... etc ...

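Just for fun, I ran the program without the loop, and it seemed to work fine! So can someone explain why the loop is needed? Also, since there is no break statement, how does the loop ever exit?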

Answer 1

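Do you want a thread to perform more than one job? If not, you don't need the loop. If so, you need something that lets it do that, and a loop is the common solution. Your sample data contains five jobs and the program starts five threads, so no thread here ever has to do more than one job. Try adding one more URL to the workload, though, and see what changes.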
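A minimal Python 3 sketch of the experiment suggested above: six jobs, five looping worker threads. To keep it runnable offline, the URL fetch is replaced by simply recording which thread handled which job; the names `worker` and `run_pool` and the sixth URL are hypothetical, not part of the IBM example.

```python
import queue
import threading

def worker(jobs, results, lock):
    # Loop forever: each iteration takes one job, so a single thread
    # can handle several jobs over its lifetime.
    while True:
        job = jobs.get()
        try:
            # Stand-in for urllib2.urlopen(host).read(1024); no network access.
            with lock:
                results.append((threading.current_thread().name, job))
        finally:
            jobs.task_done()

def run_pool(items, num_threads=5):
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()
    for _ in range(num_threads):
        t = threading.Thread(target=worker, args=(jobs, results, lock), daemon=True)
        t.start()
    for item in items:
        jobs.put(item)
    jobs.join()  # returns once task_done() has been called for every item
    return results

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com", "http://example.com"]
results = run_pool(hosts)
print(len(results))  # 6: every job is handled even though there are only 5 threads
```

Because each worker goes back to `jobs.get()` after finishing, whichever thread frees up first picks up the sixth job.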

Answer 2

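The loop is necessary because without it, each worker thread terminates as soon as it finishes its first task. What you want is for the worker to go on to the next task.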
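A minimal Python 3 sketch of what happens without the loop, assuming six queued jobs and five threads. The function `one_shot_worker` is a hypothetical stand-in for `ThreadUrl.run()` with the `while True` removed, and it records the job instead of fetching a URL.

```python
import queue
import threading

def one_shot_worker(jobs, done):
    # No loop: take exactly one job, then return, which ends the thread.
    job = jobs.get()
    done.append(job)  # stand-in for fetching the URL
    jobs.task_done()

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com", "http://example.com"]
jobs = queue.Queue()
for host in hosts:
    jobs.put(host)

done = []
threads = [threading.Thread(target=one_shot_worker, args=(jobs, done))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # each thread exits after its single job

print(len(done))                            # 5
print(jobs.qsize())                         # 1: this job is never picked up
print(any(t.is_alive() for t in threads))   # False: all workers have exited
```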

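In the code above you create 5 worker threads, which happens to be exactly enough to cover the 5 URLs you are using. If you had more than 5 URLs, you would find that only the first 5 get processed, and `queue.join()` would then block forever, because the remaining URLs are never marked with `task_done()`.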
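As for how the `while True` loop exits: in the original program it never does. The workers are daemon threads (`t.setDaemon(True)`), so once `queue.join()` returns and the main thread finishes, Python exits without waiting for them and they are stopped abruptly. If you would rather have the workers stop explicitly, one common alternative (not used in the IBM example) is to queue one sentinel value per worker after the real jobs and `break` on it. A minimal Python 3 sketch, where `STOP` is a hypothetical sentinel and `item.upper()` stands in for fetching the URL:

```python
import queue
import threading

STOP = object()  # hypothetical sentinel; any unique object works

def worker(jobs, results):
    while True:
        item = jobs.get()
        try:
            if item is STOP:
                break  # explicit exit; the finally clause still runs
            results.append(item.upper())  # stand-in for fetching the URL
        finally:
            jobs.task_done()

jobs = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(3)]
for t in threads:
    t.start()

for host in ["http://yahoo.com", "http://google.com", "http://amazon.com",
             "http://ibm.com", "http://apple.com"]:
    jobs.put(host)
for _ in threads:
    jobs.put(STOP)  # one sentinel per worker, queued after the real jobs

jobs.join()
for t in threads:
    t.join()  # returns because every worker reached break
print(len(results), any(t.is_alive() for t in threads))  # 5 False
```

A worker that receives a sentinel stops taking items, so each of the three sentinels ends a different thread, and these threads need not be daemons.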

Why is an infinite loop needed when using threads and a queue in Python?
