Question

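This builds on another question on this site: What's the best way to download file using urllib3. However, I cannot comment there, so I am asking a new question:

How do I download a (larger) file with urllib3?

I tried the same code that works with urllib2 (Download file from web in Python 3), but it fails with urllib3: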

http = urllib3.PoolManager()

with http.request('GET', url) as r, open(path, 'wb') as out_file:       
    #shutil.copyfileobj(r.data, out_file) # this writes a zero file
    shutil.copyfileobj(r.data, out_file)

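This fails with the message that a 'bytes' object has no attribute 'read'.

Then I tried the code from that question, but it gets stuck in an infinite loop because data is always '0':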

http = urllib3.PoolManager()
r = http.request('GET', url)

with open(path, 'wb') as out:
    while True:
        data = r.read(4096)         
        if data is None:
            break
        out.write(data)
r.release_conn()

However, if I read everything into memory, the file is downloaded correctly:

http = urllib3.PoolManager()
r = http.request('GET', url)
with open(path, 'wb') as out:
    out.write(r.data)  # r.data holds the entire body in memory

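I do not want to do this, because I may be downloading very large files. Unfortunately, the urllib documentation does not cover best practice on this topic.

(Also, please do not suggest requests or urllib2, because they are not flexible enough when it comes to self-signed certificates.)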

Answer 1

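You were very close. The missing piece is setting preload_content=False (this is planned to become the default in an upcoming version). Also, you can treat the response itself as a file-like object rather than going through the .data attribute (a magic property that will hopefully be deprecated someday).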

- with http.request('GET', url) ...
+ with http.request('GET', url, preload_content=False) ...

This code should work:

http = urllib3.PoolManager()

with http.request('GET', url, preload_content=False) as r, open(path, 'wb') as out_file:       
    shutil.copyfileobj(r, out_file)
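
For a version that can be pasted into a script, here is a minimal sketch of the same idea written with the response's stream() method, which yields the body in chunks. The names write_chunks and download_file and the 64 KiB chunk size are illustrative choices, not part of the original answer.

```python
def write_chunks(chunks, out_file):
    """Write an iterable of bytes chunks to out_file and return the byte count."""
    total = 0
    for chunk in chunks:
        out_file.write(chunk)
        total += len(chunk)
    return total


def download_file(url, path, chunk_size=64 * 1024):
    """Stream url to path without holding the whole body in memory."""
    import urllib3  # imported here so write_chunks stays usable on its own

    http = urllib3.PoolManager()
    # preload_content=False stops urllib3 from reading the body up front.
    r = http.request('GET', url, preload_content=False)
    try:
        with open(path, 'wb') as out_file:
            # stream() yields chunks of up to chunk_size bytes until the body ends.
            return write_chunks(r.stream(chunk_size), out_file)
    finally:
        # Hand the connection back to the pool.
        r.release_conn()
```

Calling download_file(url, path) returns the number of bytes written, which can be compared against the Content-Length header if the server sends one.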

urllib3's response object also respects the io interface, so you can also do things like...

import io
response = http.request(..., preload_content=False)
buffered_response = io.BufferedReader(response, 2048)
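
As a rough illustration of what the buffered wrapper buys you, the sketch below iterates over a response line by line. The helper names count_lines and count_remote_lines are hypothetical, and the example assumes the response exposes readable() and readinto() as the io interface requires.

```python
import io


def count_lines(raw, buffer_size=2048):
    """Count lines in a raw binary stream, reading through a buffer."""
    # Leaving the with block closes the buffer and the underlying raw stream.
    with io.BufferedReader(raw, buffer_size) as buffered:
        return sum(1 for _ in buffered)


def count_remote_lines(url):
    """Count the lines of a remote resource without loading it all at once."""
    import urllib3  # imported here so count_lines works without urllib3

    http = urllib3.PoolManager()
    response = http.request('GET', url, preload_content=False)
    try:
        return count_lines(response)
    finally:
        response.release_conn()
```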

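All three of your attempts should work as long as you add preload_content=False and treat the response as a file-like object.

"Unfortunately, the urllib documentation does not cover best practice on this topic."

You are totally right. I hope you might consider helping us document this use case by sending a pull request here: https://github.com/shazow/urllib3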
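
One detail worth checking in the second attempt: as far as I know, read(amt) returns an empty bytes object at the end of the stream rather than None, so the `if data is None` test would never fire even with preload_content=False. Below is a sketch of that loop with the end-of-stream check changed to `if not data`; copy_stream and download_with_read are hypothetical names.

```python
def copy_stream(src, dst, chunk_size=4096):
    """Copy src to dst in chunks, stopping at the first empty read."""
    while True:
        data = src.read(chunk_size)
        if not data:  # b'' marks end of stream; comparing against None would loop forever
            break
        dst.write(data)


def download_with_read(url, path):
    """Download url to path using an explicit read() loop."""
    import urllib3  # imported here so copy_stream works without urllib3

    http = urllib3.PoolManager()
    r = http.request('GET', url, preload_content=False)
    try:
        with open(path, 'wb') as out:
            copy_stream(r, out)
    finally:
        r.release_conn()
```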
