Question

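I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now I am using urllib.urlretrieve to grab them, but each call blocks until the download finishes, and only one thumbnail is fetched at a time.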

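I would prefer to download them in parallel and display each one as soon as it finishes, without blocking the GUI at any point. What is the best way to do this?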

I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I have overlooked.

Answer 1

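You will probably benefit from the threading or multiprocessing modules. You don't actually need to write all those Thread-based classes yourself; there is a simpler approach using Pool.map: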

from multiprocessing import Pool

def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...

pool = Pool(processes=4)
result = pool.map(fetch_url, image_url_list)
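Because the work here is I/O-bound, the same `map` style also works with a thread-backed pool, and `imap_unordered` yields each result as soon as its worker finishes instead of waiting for the whole batch. The following is a minimal sketch, assuming Python 3 (where `urlopen` lives in `urllib.request`); the body of `fetch_url`, the `fetch_all` helper, and the `opener` and `fetch` parameters are hypothetical names introduced for illustration, not part of the answer above.

```python
from multiprocessing.pool import ThreadPool
from urllib.request import urlopen


def fetch_url(url, opener=urlopen):
    """Return (url, data, error); exactly one of data/error is None."""
    try:
        with opener(url) as response:
            return url, response.read(), None
    except Exception as exc:
        # Report the failure to the caller instead of raising inside the pool.
        return url, None, exc


def fetch_all(urls, workers=4, fetch=fetch_url):
    """Yield fetch results in completion order, not input order."""
    with ThreadPool(processes=workers) as pool:
        # imap_unordered hands back each result as soon as it is ready.
        for result in pool.imap_unordered(fetch, urls):
            yield result
```

Iterating over `fetch_all` still blocks the caller between results, so on its own it does not keep a GUI responsive; it only removes the one-at-a-time limit.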

Answer 2

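As you suspected, this is a perfect situation for threading. Here is a short guide that I found immensely helpful when doing my first bit of threading in Python.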

Answer 3

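As you indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.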

Here is a tutorial on threading in Python: http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
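One way this could look in practice is sketched below: each worker thread performs a single `urlretrieve` call and reports its outcome on a queue, and the main thread drains that queue without ever waiting. This assumes Python 3 (`urllib.request.urlretrieve`); `download`, `start_downloads`, `poll_finished`, and the `retrieve` parameter are hypothetical names chosen for the example.

```python
import queue
import threading
from urllib.request import urlretrieve


def download(url, filename, results, retrieve=urlretrieve):
    """Worker: fetch one URL to a file and report the outcome on a queue."""
    try:
        retrieve(url, filename)
        results.put((url, filename, None))
    except Exception as exc:
        results.put((url, None, exc))


def start_downloads(jobs, results, retrieve=urlretrieve):
    """Start one daemon thread per (url, filename) pair and return at once."""
    threads = []
    for url, filename in jobs:
        thread = threading.Thread(
            target=download,
            args=(url, filename, results, retrieve),
            daemon=True,
        )
        thread.start()
        threads.append(thread)
    return threads


def poll_finished(results):
    """Return everything completed so far without blocking."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished
```

The main thread would call `poll_finished` from whatever periodic hook the GUI provides (in Pyglet, presumably a function registered with `pyglet.clock.schedule_interval`) and display the thumbnails it returns. Starting one thread per URL is a simplification; with hundreds of thumbnails a fixed number of workers is likely a better fit.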

Answer 4

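Here is an example of how to use threading.Thread. Just replace the class name and the run function with your own. Note that threading is great for I/O-bound applications like yours and can really speed things up. Using Python threads strictly for computation does not help in standard Python, because only one thread can execute Python code at a time.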

# Written for Python 3: print(), is_alive() and active_count().
import threading
import time


class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple

    def run(self):
        # Sleeps 3 seconds, then prints 'pong' `multiple` times.
        time.sleep(3)
        print('pong' * self.multiple)


pingInstance = Ping(3)
pingInstance.start()  # start() calls your run() method in a new thread
print("pingInstance is alive? : %d" % pingInstance.is_alive())  # True, shown as 1
print("Number of threads alive: %d" % threading.active_count())
# main thread + class instance
time.sleep(3.5)
print("Number of threads alive: %d" % threading.active_count())
print("pingInstance is alive?: %d" % pingInstance.is_alive())
# is_alive() returns False once the thread reaches the end of its run() method.
# only main thread now
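The point about I/O-bound work can be checked with a small timing experiment. This is only a sketch, assuming Python 3 and using `time.sleep` as a stand-in for a blocking network call; `fake_download` and `timed_run` are hypothetical names.

```python
import threading
import time


def fake_download(delay):
    """Stand-in for a blocking network call; sleeping releases the GIL."""
    time.sleep(delay)


def timed_run(count, delay, threaded):
    """Run `count` fake downloads and return the elapsed wall-clock time."""
    start = time.perf_counter()
    if threaded:
        threads = [
            threading.Thread(target=fake_download, args=(delay,))
            for _ in range(count)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    else:
        for _ in range(count):
            fake_download(delay)
    return time.perf_counter() - start


sequential = timed_run(5, 0.2, threaded=False)
parallel = timed_run(5, 0.2, threaded=True)
print("sequential: %.2fs, threaded: %.2fs" % (sequential, parallel))
```

The sequential run should take roughly the sum of the delays, while the threaded run should take roughly one delay, because a thread that is sleeping or waiting on a socket lets the other threads run.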

Answer 5

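You have these choices:

Threads: easiest, but doesn't scale well.
Twisted: medium difficulty; scales well, but shares a single CPU because of the GIL and being single-threaded.
Multiprocessing: hardest. Scales well if you know how to write your own event loop.

I recommend just using threads unless you need an industrial-scale fetcher.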

Answer 6

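You either need to use threads or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.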

How to do a non-blocking URL fetch in Python