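According to the documentation, it should be possible to do a non-memory-intensive upload by giving Request a file-like object instead of the contents of the file. OK, so I do this in my code: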
files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream',
                  {'Content-Transfer-Encoding': 'binary'})}
r = s.post(url, data=content, params=params, files=files, headers=headers)
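Watching it run on my machine with a 2.8 GB file, it starts eating memory at an alarming rate and then bails out when it reaches about 89% of memory. It then fails with this output: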
File "***.py", line 644, in post
r = s.post(url, data=content, params=params, files=files, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
return self.request('POST', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 110, in request
hooks, stream, verify, cert)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 348, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 286, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 289, in prepare
self.prepare_body(data, files)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 426, in prepare_body
(body, content_type) = self._encode_files(files, data)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 144, in _encode_files
body, content_type = encode_multipart_formdata(new_fields)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 101, in encode_multipart_formdata
return body.getvalue(), content_type
MemoryError
For smaller files it works, but it still uses a lot of memory while doing so. Am I misunderstanding something?
After seeing Martijn Pieters' answer, I changed my code to:
files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream')}
m = requests_toolbelt.MultipartEncoder(fields=files)
headers['content-type'] = m.content_type
r = s.post(url, data=m, params=params, headers=headers)
I had to remove {'Content-Transfer-Encoding':'binary'} because it does not seem to be supported and caused this error message:
File "***.py", line 647, in post
m = requests_toolbelt.MultipartEncoder(fields=files)
File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 89, in __init__
self._prepare_parts()
File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 171, in _prepare_parts
self.parts = [Part.from_field(f, enc) for f in fields]
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 44, in iter_field_objects
yield RequestField.from_tuples(*field)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/fields.py", line 97, in from_tuples
filename, data = value
ValueError: too many values to unpack
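(Is there a way to still set this header when using the multipart encoder? I would prefer to have it there.)
However, even with that header removed it still does not work, because now I get this error message: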
File "***.py", line 647, in post
r = s.post(url, data=m, params=params, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
return self.request('POST', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 114, in request
main_key = self.cache.create_key(response.request)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 156, in create_key
key.update(_to_bytes(request.body))
TypeError: must be convertible to a buffer, not MultipartEncoder
Any ideas? I admit I am new to this, and these error messages, as so often in programming, are not very helpful.
Answer 0 (score: 5)
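You are not doing a streaming upload, because requests can only do that when the whole request body comes from an open file object. For a multipart post it still reads all the files into memory to build the body.
For multipart uploads, use the requests toolbelt; it includes a Streaming Multipart Data Encoder: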
import os

import requests
from requests_toolbelt import MultipartEncoder

files = {
    'md5': ('', md5hash),
    'modified': ('', now),
    'created': ('', now),
    'file': (os.path.basename(url), fileobject, 'application/octet-stream')
}
# Merge the query parameters into the multipart fields.
m = MultipartEncoder(fields=dict(files, **params))
# The encoder picks the boundary, so take the content type from it.
headers['content-type'] = m.content_type
# The encoder is consumed while it is being sent, so post it only once.
r = s.post(url, data=m, headers=headers)
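To make the idea of a streaming multipart body more concrete, here is a minimal sketch of how such a body can be produced piece by piece, so that only one chunk of the file is held in memory at a time. It is not the toolbelt's actual implementation; the function name `iter_multipart`, its parameters, and the sample field values are assumptions made up for this illustration.

```python
import io
import uuid


def iter_multipart(fields, boundary=None, chunk_size=8192):
    """Yield a multipart/form-data body as a series of byte strings.

    Hypothetical helper for illustration only. `fields` maps a field name
    to either a plain value or a (filename, fileobj, content_type) tuple.
    """
    boundary = boundary or uuid.uuid4().hex
    for name, value in fields.items():
        yield ('--%s\r\n' % boundary).encode('ascii')
        if isinstance(value, tuple):
            filename, fileobj, content_type = value
            header = (
                'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                'Content-Type: %s\r\n\r\n' % (name, filename, content_type)
            )
            yield header.encode('utf-8')
            # Read the file in fixed-size chunks instead of all at once.
            while True:
                chunk = fileobj.read(chunk_size)
                if not chunk:
                    break
                yield chunk
        else:
            header = 'Content-Disposition: form-data; name="%s"\r\n\r\n' % name
            yield header.encode('utf-8')
            if isinstance(value, bytes):
                yield value
            else:
                yield str(value).encode('utf-8')
        yield b'\r\n'
    # Closing delimiter that ends the multipart body.
    yield ('--%s--\r\n' % boundary).encode('ascii')


# Example with an in-memory stand-in for a large file.
sample_fields = {
    'md5': 'abc',
    'file': ('data.bin', io.BytesIO(b'x' * 20000), 'application/octet-stream'),
}
sample_pieces = list(iter_multipart(sample_fields, boundary='demo'))
print(len(sample_pieces), 'pieces, largest is',
      max(len(p) for p in sample_pieces), 'bytes')
```

The point of the sketch is only that no piece is ever larger than `chunk_size`, no matter how big the file is. In practice the encoder from the toolbelt is the better choice, since it also takes care of the content type header shown in the answer's code.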
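As your traceback shows, MultipartEncoder parses its first argument with the iter_field_objects() function from the urllib3 library; this means it can be either a dictionary of key-value pairs, or a sequence (list, tuple) of RequestField() objects.
When you pass in a dictionary as above, each key-value pair is parsed with RequestField.from_tuples(), and you can only specify the field name, the value, and optionally a filename and mimetype. Extra headers are not supported. That is the option I used in the example above.
If you want to add the Content-Transfer-Encoding header to the file field, then you need to use a sequence of RequestField objects instead: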
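import os
from requests.packages.urllib3.fields import RequestField

# from_tuples() builds the field and fills in its Content-Disposition header.
fields = [RequestField.from_tuples(*p) for p in params.items()]
fields.extend(
    RequestField.from_tuples(name, value)
    for name, value in [('md5', md5hash), ('modified', now), ('created', now)])

# RequestField(name, data, filename, headers): extra headers go in the dict.
file_field = RequestField(
    'file', fileobject, os.path.basename(url),
    {'Content-Transfer-Encoding': 'binary'})
# make_multipart() adds Content-Disposition and Content-Type and keeps the
# header passed in above.
file_field.make_multipart(content_type='application/octet-stream')
fields.append(file_field)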
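Note that you cannot combine streaming requests with the requests-cache project; the latter needs access to the full body of the request in order to generate a cache key.
You would have to patch the requests_cache.backends.base.BaseCache.create_key method to handle MultipartEncoder objects and supply some kind of hash key for the body. That is beyond the scope of this question, however.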
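As a rough idea of what "some kind of hash key for the body" could look like, here is a small standalone sketch. It hashes ordinary bodies directly and, for a streaming body that cannot be read without consuming it, falls back to a token the caller supplies, such as the MD5 of the file that the question already computes. The name `body_key` and the `stream_token` parameter are hypothetical, and wiring this into requests-cache's create_key is not shown and would depend on the installed version.

```python
import hashlib


def body_key(body, stream_token=None):
    """Return a hex digest that identifies a request body for caching.

    Hypothetical helper for illustration only. `stream_token` is a
    caller-supplied string (for example a precomputed MD5 of the file)
    used when `body` is a streaming object that must not be read here.
    """
    digest = hashlib.sha256()
    if body is None:
        # No body: the digest of empty input serves as the key.
        pass
    elif isinstance(body, bytes):
        digest.update(body)
    elif isinstance(body, str):
        digest.update(body.encode('utf-8'))
    elif stream_token is not None:
        # Streaming body: hash the stand-in token instead of the data.
        digest.update(b'stream:' + stream_token.encode('utf-8'))
    else:
        raise TypeError(
            'no cache key available for %s' % type(body).__name__)
    return digest.hexdigest()
```

The trade-off is that the cache then trusts the token to describe the upload faithfully; two different files sent with the same token would share a cache entry.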
Answer 1 (score: 1)
A simple way to upload a file, streaming it straight from disk:
import requests

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)