Question

I want to download a file over HTTP using urllib3. I have managed to do this with the following code:

 url = 'http://url_to_a_file'
 connection_pool = urllib3.PoolManager()
 resp = connection_pool.request('GET',url )
 f = open(filename, 'wb')
 f.write(resp.data)
 f.close()
 resp.release_conn()

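But I'm wondering what the proper way to do this is. For example, will it work well for large files, and if not, what should be done to make this code more fault-tolerant and scalable?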

Note: it is important to me to use the urllib3 library rather than, say, urllib2, because I want my code to be thread-safe.

Answer 1

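Your code snippet is close. Two things worth noting:

If you use resp.data, it consumes the entire response and returns the connection to the pool (you don't need to call resp.release_conn() manually). This is fine if you're OK with holding the whole file in memory.
You can use resp.read(amt) to stream the response instead, but then the request has to be made with preload_content=False and the connection has to be returned via resp.release_conn().

This would look something like: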

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)

with open(path, 'wb') as out:
    while True:
        data = r.read(chunk_size)
        if not data:
            break
        out.write(data)

r.release_conn()
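To make the loop above a little more fault-tolerant, it can be wrapped in a function that always returns the connection, even when the write fails partway through. This is only a sketch: the function name download_file, the 64 KiB default chunk size, and the choice to treat any non-200 status as an error are my own assumptions, not something urllib3 prescribes.

```python
import urllib3


def download_file(http, url, path, chunk_size=64 * 1024):
    """Stream `url` to `path` in chunks and return the number of bytes written.

    `http` is an existing urllib3.PoolManager. The name and defaults of this
    helper are illustrative only.
    """
    # preload_content=False leaves the body unread until we ask for it.
    resp = http.request('GET', url, preload_content=False)
    try:
        if resp.status != 200:
            raise RuntimeError('unexpected HTTP status %d' % resp.status)
        written = 0
        with open(path, 'wb') as out:
            while True:
                data = resp.read(chunk_size)
                if not data:
                    break
                out.write(data)
                written += len(data)
        return written
    except Exception:
        # The body may be only partly read at this point, so close the
        # response rather than leave leftover data on a pooled connection.
        resp.close()
        raise
    finally:
        # Runs on success and on failure alike.
        resp.release_conn()
```

Usage would be something like download_file(urllib3.PoolManager(), url, path). Memory use stays bounded by chunk_size no matter how large the file is.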

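The documentation might be a bit lacking for this scenario. If anyone is interested in making a pull request to improve the urllib3 documentation, that would be greatly appreciated. :)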

Answer 2

The most correct way to do this is probably to get a file-like object that represents the HTTP response and copy it to a real file using shutil.copyfileobj, as below:

import shutil
import urllib3

url = 'http://url_to_a_file'
c = urllib3.PoolManager()

with c.request('GET', url, preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)

resp.release_conn()     # not 100% sure this is required though
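Since the question mentions thread safety: a single PoolManager can be shared between threads, so the copyfileobj pattern extends naturally to several downloads at once. The following is a rough sketch, not a definitive implementation. The helper names fetch and fetch_all, the (url, filename) job format, and the default of four workers are all hypothetical choices of mine.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor

import urllib3


def fetch(http, url, filename):
    """Copy one response body to `filename` and return the filename."""
    resp = http.request('GET', url, preload_content=False)
    try:
        with open(filename, 'wb') as out_file:
            shutil.copyfileobj(resp, out_file)
    finally:
        # Return the connection whether or not the copy succeeded.
        resp.release_conn()
    return filename


def fetch_all(jobs, max_workers=4):
    """Download every (url, filename) pair in `jobs` using worker threads."""
    # One PoolManager is shared by all workers; maxsize lets each host pool
    # keep up to max_workers connections around for reuse.
    http = urllib3.PoolManager(maxsize=max_workers)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch, http, url, name) for url, name in jobs]
        # result() re-raises any exception that happened in the worker.
        return [future.result() for future in futures]
```

Calling fetch_all([(url_1, 'a.bin'), (url_2, 'b.bin')]) returns the filenames in the order the jobs were given.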

Answer 3

The easiest way with urllib3 is to let shutil manage the copying for you.

import urllib3
import shutil

http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)
with open(filename, 'wb') as out:
    shutil.copyfileobj(r, out)   # streams the body to disk in chunks
r.release_conn()                 # hand the connection back to the pool
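On the fault-tolerance part of the question, the same approach can be combined with explicit timeouts and retries on the PoolManager. Treat this as a sketch under assumptions: the helper names make_pool and download, the retry count, the backoff factor, the timeout values, and the list of retried status codes are all illustrative picks of mine. As far as I understand, the retry policy covers making the request and getting the response headers, so a connection that dies while the body is being copied would still surface as an exception.

```python
import shutil

import urllib3
from urllib3.util import Retry, Timeout


def make_pool(total_retries=3, connect_timeout=5.0, read_timeout=30.0):
    """Build a PoolManager with explicit retry and timeout settings."""
    retries = Retry(
        total=total_retries,
        backoff_factor=0.5,                # wait a bit longer between attempts
        status_forcelist=[502, 503, 504],  # also retry on these statuses
    )
    timeout = Timeout(connect=connect_timeout, read=read_timeout)
    return urllib3.PoolManager(retries=retries, timeout=timeout)


def download(http, url, filename):
    """Stream `url` to `filename` and return the HTTP status code."""
    r = http.request('GET', url, preload_content=False)
    try:
        with open(filename, 'wb') as out:
            shutil.copyfileobj(r, out)
    finally:
        r.release_conn()
    return r.status
```

A caller would create the pool once with make_pool() and reuse it for every download(pool, url, filename) call.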

What's the best way to download a file using urllib3
