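Python requests - MemoryError despite using streaming upload

Time: 2015-01-28 17:05:08

Tags: python python-requests

According to the documentation, it should be possible to do a non-memory-intensive upload by giving the request a file-like object instead of the file's contents. Fine, I do that in my code: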

files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream', {'Content-Transfer-Encoding':'binary'})}
r = s.post(url, data=content, params=params, files=files, headers=headers)

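Watching this run on my machine with a 2.8 GB file, it starts consuming memory at an alarming rate and bails out once it reaches about 89% of memory. It then fails with the following output: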

  File "***.py", line 644, in post
    r = s.post(url, data=content, params=params, files=files, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 110, in request
    hooks, stream, verify, cert)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 348, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 286, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 289, in prepare
    self.prepare_body(data, files)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 426, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 144, in _encode_files
    body, content_type = encode_multipart_formdata(new_fields)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 101, in encode_multipart_formdata
    return body.getvalue(), content_type
MemoryError

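For smaller files this works fine, but it still eats up a lot of memory while doing so. Am I misunderstanding something?

Edit:

After seeing Martijn Pieters' answer, I changed my code to: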

    files = {'md5': ('', md5hash),
             'modified': ('', now),
             'created': ('', now),
             'file': (os.path.basename(url), fileobject, 'application/octet-stream')}
    m = requests_toolbelt.MultipartEncoder(fields=files)
    headers['content-type'] = m.content_type
    r = s.post(url, data=m, params=params, headers=headers)

I had to remove the {'Content-Transfer-Encoding':'binary'} entry because it does not appear to be supported and caused this error message:

  File "***.py", line 647, in post
    m = requests_toolbelt.MultipartEncoder(fields=files)
  File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 89, in __init__
    self._prepare_parts()
  File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 171, in _prepare_parts
    self.parts = [Part.from_field(f, enc) for f in fields]
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 44, in iter_field_objects
    yield RequestField.from_tuples(*field)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/fields.py", line 97, in from_tuples
    filename, data = value
ValueError: too many values to unpack

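(Is there a way to still set this header when using the multipart encoder? I'd prefer to have it in there.)

However, even with that header removed it still doesn't work, because now I get this error message: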

  File "***.py", line 647, in post
    r = s.post(url, data=m, params=params, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 114, in request
    main_key = self.cache.create_key(response.request)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 156, in create_key
    key.update(_to_bytes(request.body))
TypeError: must be convertible to a buffer, not MultipartEncoder

Any ideas? I admit I'm new to this, and these error messages, as so often in programming, are not very helpful.

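2 answers:

Answer 0: (score: 5)

You are not streaming the upload, because requests can only do so when the whole body comes from an open file object. With files= it still reads every file into memory to build the multipart post body.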
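The traceback in the question ends in body.getvalue(), meaning the whole multipart body is assembled in a single buffer. As a rough illustration of the alternative, here is a stdlib-only sketch of an encoder that yields the body piece by piece. It is not the code of requests or of any library; iter_multipart, its parameters and the boundary value are made up for this example.

```python
import io


def iter_multipart(text_fields, file_name, file_obj, boundary,
                   chunk_size=64 * 1024):
    """Yield a multipart/form-data body in pieces (hypothetical helper)."""
    delimiter = ('--' + boundary).encode('ascii')

    # Small text fields are cheap, so each one is emitted whole.
    for name, value in text_fields:
        yield delimiter + b'\r\n'
        header = 'Content-Disposition: form-data; name="%s"\r\n\r\n' % name
        yield header.encode('utf-8')
        yield value.encode('utf-8') + b'\r\n'

    # Headers of the file part.
    yield delimiter + b'\r\n'
    header = ('Content-Disposition: form-data; name="file"; filename="%s"\r\n'
              'Content-Type: application/octet-stream\r\n\r\n' % file_name)
    yield header.encode('utf-8')

    # Read the file in bounded chunks instead of one big read().
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk

    # Closing delimiter.
    yield b'\r\n' + delimiter + b'--\r\n'


# Stand-in for a large file; a real upload would use open(path, 'rb').
payload = io.BytesIO(b'x' * 200000)
chunks = list(iter_multipart([('md5', 'abc123')], 'big.bin', payload,
                             'sketch-boundary'))
largest = max(len(c) for c in chunks)
total = sum(len(c) for c in chunks)
```

No single piece is larger than chunk_size, so peak memory stays bounded however large the file is. requests accepts a generator as data= and then sends the body chunk-encoded, which not every server accepts.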

For multipart uploads, use the requests toolbelt; it includes a Streaming Multipart Data Encoder:

import os

from requests_toolbelt import MultipartEncoder

files = {
    'md5': ('', md5hash),
    'modified': ('', now),
    'created': ('', now),
    'file': (os.path.basename(url), fileobject, 'application/octet-stream')
}
# Merge the form fields and params into one streaming multipart body.
m = MultipartEncoder(fields=dict(files, **params))
headers['content-type'] = m.content_type

# The encoder is consumed as it is sent, so post it only once.
r = s.post(url, data=m, headers=headers)

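The first argument to MultipartEncoder is parsed with the iter_field_objects() function from the urllib3 library; this means it can be either a dictionary of key-value pairs or a sequence (list, tuple) of RequestField() objects.

When you pass in a dictionary, as above, each key-value pair is parsed with RequestField.from_tuples(), and you can only specify the field name, the value, and optionally a filename and mimetype. Extra headers are not supported. That is the option I used in the example above.

If you want to add the Content-Transfer-Encoding header to the file field, you need to use a sequence of RequestField objects instead: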

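import os
from requests.packages.urllib3.fields import RequestField

# from_tuples() also fills in the Content-Disposition header for each field.
fields = [RequestField.from_tuples(*p) for p in params.items()]
fields.extend(RequestField.from_tuples(name, value) for name, value in [
    ('md5', md5hash),
    ('modified', now),
    ('created', now),
])

# RequestField's third positional argument is the filename, not the content
# type, so pass filename and headers by keyword and set the type separately.
file_field = RequestField(
    'file', fileobject, filename=os.path.basename(url),
    headers={'Content-Transfer-Encoding': 'binary'})
file_field.make_multipart(content_type='application/octet-stream')
fields.append(file_field)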

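Note that you cannot combine streaming requests with the requests-cache project; the latter needs access to the entire request body in order to generate a cache key.

You would have to patch the requests_cache.backends.base.BaseCache.create_key method to handle MultipartEncoder objects and produce some kind of hash key for the body. That is outside the scope of this question, however.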
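For what it's worth, the idea behind such a patch could look roughly like the sketch below. It is stdlib-only and does not use the real requests_cache API: cache_key and its arguments are hypothetical, and reading a len attribute from the streaming body is an assumption about the encoder object.

```python
import hashlib


def cache_key(method, url, body):
    """Build a cache key without reading a streaming body (hypothetical)."""
    key = hashlib.sha256()
    key.update(method.upper().encode('utf-8'))
    key.update(url.encode('utf-8'))

    if body is None:
        pass
    elif isinstance(body, bytes):
        # In-memory bodies can be hashed directly.
        key.update(body)
    elif isinstance(body, str):
        key.update(body.encode('utf-8'))
    else:
        # Streaming body: hash a cheap description instead of the content.
        # Assumption: the object exposes its total size as a `len` attribute.
        length = getattr(body, 'len', None)
        marker = 'stream:%s:%s' % (type(body).__name__, length)
        key.update(marker.encode('utf-8'))

    return key.hexdigest()
```

Two different uploads of the same size would get the same key here, so this is only a starting point. Since caching the response to a large POST is rarely useful, sending such uploads without the cache may be the simpler choice.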

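Answer 1: (score: 1)

A simple way to upload a file. Note that this streams the file as the raw request body rather than as a multipart form, so it only fits if the server accepts the upload that way:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)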
