Question

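I'm using Requests to download a file (several gigabytes) from a server. To provide progress updates (and to avoid holding the entire file in memory), I set stream=True and write the download to a file: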

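import requests

# url, total_bytes and write_progress are defined elsewhere
completed_bytes = 0

# Binary mode, so the downloaded bytes are written unchanged
with open('output', 'wb') as f:
    response = requests.get(url, stream=True)

    if not response.ok:
        print('There was an error')
        exit()

    # Read the body in 100 KiB blocks and report progress after each one
    for block in response.iter_content(1024 * 100):
        f.write(block)
        completed_bytes += len(block)
        write_progress(completed_bytes, total_bytes)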

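However, at some random point in the download, Requests throws a ChunkedEncodingError. I went into the source and found that this corresponds to an IncompleteRead exception. I inserted a log statement around those lines and found that e.partial = "\r". I know the server gives downloads low priority, and I suspect this exception occurs when the server waits too long to send the next chunk.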

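As expected, the exception stops the download. Unfortunately, the server does not implement HTTP/1.1 content ranges, so I cannot simply resume it. I have played around with urllib3's internal timeout, but the exception still persists.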

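Is there any way to make the underlying urllib3 (or Requests) more tolerant of these empty (or late) chunks, so that the file can be downloaded completely?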

Answer 1

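try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # Python 3 name of the same module

def patch_http_response_read(func):
    # Wrap read() so that a truncated body returns the bytes received so far
    def inner(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except httplib.IncompleteRead as e:
            return e.partial
    return inner

httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)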

I cannot reproduce your problem right now, but I think this could work as a patch. It lets you deal with defective HTTP servers.
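To see what such a wrapper does without touching a real server, here is a minimal sketch. The names tolerate_incomplete_read and flaky_read are hypothetical: flaky_read is only a stand-in for HTTPResponse.read on a server that drops the connection early, not part of httplib or Requests.

```python
try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # Python 3


def tolerate_incomplete_read(func):
    """Return whatever bytes arrived when func raises IncompleteRead."""
    def inner(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except httplib.IncompleteRead as e:
            # e.partial holds the data read before the connection ended
            return e.partial
    return inner


def flaky_read():
    # Hypothetical stand-in for a read that is cut off after
    # part of the body has been received.
    raise httplib.IncompleteRead(b"partial body", 10)


safe_read = tolerate_incomplete_read(flaky_read)
data = safe_read()  # the partial bytes are returned instead of an exception
```

The wrapper swallows the error silently, so the caller has no way of telling a complete body from a truncated one unless it checks the size separately.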

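Most bad servers transmit all the data, but because of implementation errors they close the session incorrectly, and httplib raises an error and buries your precious bytes.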
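If the server really does send everything before closing badly, one way to tell a harmless error from a truncated file is to count the bytes written. The following is only a sketch under assumptions: save_stream and broken_server are hypothetical names, the expected size is assumed to be known in advance (like total_bytes in the question), and the tolerated exception type is a parameter. With Requests you would presumably pass response.iter_content(...) as blocks and requests.exceptions.ChunkedEncodingError as tolerated.

```python
import io


def save_stream(blocks, out, expected_size, tolerated=(IOError,)):
    """Write blocks to out; return True if expected_size bytes arrived.

    blocks: any iterable of byte strings
    tolerated: exception types treated as "connection ended badly"
    """
    written = 0
    try:
        for block in blocks:
            out.write(block)
            written += len(block)
    except tolerated:
        # The connection ended badly. Whether the file is usable
        # depends on whether every expected byte already arrived.
        pass
    return written == expected_size


def broken_server():
    # Hypothetical server: sends the whole body, then fails on close.
    yield b"abc"
    yield b"def"
    raise IOError("connection closed early")


buf = io.BytesIO()
complete = save_stream(broken_server(), buf, expected_size=6)
```

A False result means the download is genuinely incomplete and, since the server in the question does not support ranges, would have to be restarted from the beginning.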

Avoiding ChunkedEncodingError for an empty chunk with Requests 2.3.0
