Question

I'm currently using Python requests for HTTP requests, but because of limitations in its API I can't keep using the library.

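I need a library that lets me write the request body in a streaming, file-like fashion. The data I'll be sending won't all be available immediately, and I'd like to use as little memory as possible when making the request. Is there an easy-to-use library that will let me send a PUT request like this: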

# HTTPRequest is a hypothetical example of the push-style API I'm looking for
request = HTTPRequest()
request.headers['content-type'] = 'application/octet-stream'
# etc
request.connect()

# send body
CHUNK_SIZE = 64 * 1024
with open('myfile', 'rb') as f:
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:  # empty read means end of file
            break
        request.body.write(chunk)

# finish
request.close()

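More specifically, I have a thread to work with. On this thread I receive callbacks as a stream arrives over the network. Essentially, the callbacks look like this: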

class MyListener(Listener):
    def on_stream_start(self, stream_name):
        pass

    def on_stream_chunk(self, chunk):
        pass

    def on_stream_end(self, total_size):
        pass

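I need to create the upload request in on_stream_start, upload chunks in on_stream_chunk, and then finish the upload in on_stream_end. So I need a library that supports a method like write(chunk), so that I can do something like the following: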

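from hashlib import sha256

class MyListener(Listener):
    request = None

    def on_stream_start(self, stream_name):
        # RequestObject and get_url() are placeholders for the API I want
        self.request = RequestObject(get_url(), "PUT")
        self.request.headers.content_type = "application/octet-stream"
        # ...

    def on_stream_chunk(self, chunk):
        # append the chunk's hex SHA-256 digest, encoded to bytes, to the chunk
        self.request.write_body(chunk + sha256(chunk).hexdigest().encode('ascii'))

    def on_stream_end(self, total_size):
        self.request.close()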

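The requests library supports file-like objects and generators for reading the body, but nothing for writing it out: it pulls rather than pushes. Is there a library that will let me push data up to the server?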

Answer 1

As far as I can tell, httplib's {{3}} does exactly what you want.

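I tracked down the function that actually does the sending: as long as you pass a file-like object (rather than a string), it reads and sends the data in chunks: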

Definition: httplib.HTTPConnection.send(self, data)
Source:

def send(self, data):
    """Send `data' to the server."""
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise NotConnected()

    if self.debuglevel > 0:
        print "send:", repr(data)
    blocksize = 8192
    if hasattr(data,'read') and not isinstance(data, array):
        if self.debuglevel > 0: print "sendIng a read()able"

    ## {{{ HERE IS THE CHUNKING LOGIC
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
        ## }}}

    else:
        self.sock.sendall(data)
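In Python 3, httplib is named http.client. Its lower-level methods putrequest(), putheader() and endheaders() send only the request line and headers, after which send() can be called as many times as needed, which gives the push-style write() the question asks for. A minimal sketch, assuming the server accepts Transfer-Encoding: chunked (used here because the total size is only known in on_stream_end); ChunkedPut and encode_chunk are illustrative names, not part of http.client:

```python
import http.client
from typing import Tuple
from urllib.parse import urlsplit


def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of body data for Transfer-Encoding: chunked
    (hex length, CRLF, data, CRLF)."""
    return f"{len(data):x}\r\n".encode("ascii") + data + b"\r\n"


class ChunkedPut:
    """Push-style PUT: create it, write() pieces as they arrive,
    then finish() to send the terminating chunk and read the response."""

    def __init__(self, url: str, content_type: str = "application/octet-stream"):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        self.conn = conn_cls(parts.hostname, parts.port)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # These calls send only the request line and headers;
        # the body is pushed afterwards with send().
        self.conn.putrequest("PUT", path)
        self.conn.putheader("Content-Type", content_type)
        self.conn.putheader("Transfer-Encoding", "chunked")  # size unknown up front
        self.conn.endheaders()

    def write(self, data: bytes) -> None:
        if data:  # an empty chunk would signal end-of-body, so skip it
            self.conn.send(encode_chunk(data))

    def finish(self) -> Tuple[int, bytes]:
        self.conn.send(b"0\r\n\r\n")  # last chunk followed by an empty trailer
        response = self.conn.getresponse()
        body = response.read()
        self.conn.close()
        return response.status, body
```

With the question's callbacks, on_stream_start would create ChunkedPut(get_url()), on_stream_chunk would call write(chunk), and on_stream_end would call finish(). If the size were known in advance, you could send a Content-Length header instead and pass the raw bytes to send() without chunk framing.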

Answer 2

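I've done something similar in a few places in our codebase. You need an upload-file wrapper, and you need another thread or a green thread; in my case I use eventlet to fake the threads. Call requests.put, which will block on read() of your file-like wrapper. The thread that calls put will block while it waits, so you need to do the receiving in another thread.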

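Sorry for not posting code; I only just saw this while I was in the middle of working on it. I hope this is enough to help. If not, I can edit and add more later.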
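A minimal sketch of the approach described above, using a standard thread and a bounded queue.Queue instead of eventlet. PushToPullBridge is a hypothetical name, and the upload callable is an assumption standing in for whatever blocking HTTP call you use; it receives a generator that blocks until the next chunk is written:

```python
import queue
import threading
from typing import Any, Callable, Iterable, Iterator, Optional

_END = object()  # sentinel marking the end of the stream


class PushToPullBridge:
    """Adapts push-style write() calls to the pull-style iterable an HTTP
    client consumes, running the blocking upload call on a worker thread."""

    def __init__(self, upload: Callable[[Iterable[bytes]], Any], maxsize: int = 16):
        # A bounded queue caps how many chunks are held in memory at once.
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=maxsize)
        self._result: Any = None
        self._error: Optional[BaseException] = None
        self._thread = threading.Thread(target=self._run, args=(upload,), daemon=True)
        self._thread.start()

    def _chunks(self) -> Iterator[bytes]:
        # Runs on the worker thread: blocks until the producer supplies data.
        while True:
            item = self._queue.get()
            if item is _END:
                return
            yield item

    def _run(self, upload: Callable[[Iterable[bytes]], Any]) -> None:
        try:
            self._result = upload(self._chunks())
        except BaseException as exc:  # re-raised on the caller's thread in close()
            self._error = exc

    def _put(self, item: Any) -> None:
        # Retry with a timeout so a dead worker cannot leave the
        # producer blocked forever on a full queue.
        while True:
            if not self._thread.is_alive():
                raise RuntimeError("upload thread has stopped") from self._error
            try:
                self._queue.put(item, timeout=0.1)
                return
            except queue.Full:
                continue

    def write(self, chunk: bytes) -> None:
        self._put(chunk)

    def close(self) -> Any:
        self._put(_END)
        self._thread.join()
        if self._error is not None:
            raise self._error
        return self._result
```

With requests, the upload callable might be lambda body: requests.put(url, data=body, headers={'Content-Type': 'application/octet-stream'}), since requests sends a generator body with chunked transfer encoding. on_stream_start would create the bridge, on_stream_chunk would call write(), and on_stream_end would call close(), which waits for the upload to finish and re-raises any error from it. Because the queue is bounded, write() blocks when the upload falls behind, so at most maxsize chunks are buffered.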

Answer 3

Requests actually supports multipart-encoded requests through the files parameter:

Multipart POST example from the official documentation:

import requests

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

r = requests.post(url, files=files)
r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

If you like, you can also create your own file-like streaming object, but you can't mix streams and files in the same request.

A simple case that might work for you is to open the file and return a generator-based chunked reader:

def read_as_gen(filename, chunksize=-1): # -1 defaults to read the file to the end, like a regular .read()
    with open(filename, mode='rb') as f:
        while True:
            chunk = f.read(chunksize)
            if len(chunk) > 0:
                yield chunk
            else:
                return  # ends the generator; raising StopIteration here is a RuntimeError since Python 3.7 (PEP 479)

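# Now that we can read the file as a generator with a chunksize, pass it as the request body.
# requests sends a generator passed as data= with chunked transfer encoding;
# the files parameter expects file objects or bytes, not generators.
body = read_as_gen(filename, 64*1024)

# ... post as normal, e.g. requests.post(url, data=body)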

But if you have to block on chunks coming from something else, such as another network buffer, you can handle it the same way:

def read_buffer_as_gen(buffer_params, chunksize=-1): # -1 defaults to read the file to the end, like a regular .read()
    with buffer_open(*buffer_params) as buf: # buffer_open is a hypothetical function that opens your buffer
        # you could also just pass in the buffer itself and skip the `with` block
        while True:
            chunk = buf.read(chunksize)
            if len(chunk) > 0:
                yield chunk
            else:
                return  # ends the generator (see PEP 479)
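The loop in these generators can also be written with the two-argument form of iter(), which calls a function repeatedly until it returns a sentinel value. A sketch, assuming the buffer's read(n) blocks until data is available and returns b'' only at end of stream (a non-blocking source that returns b'' or None while temporarily empty would need different handling); read_stream_as_gen is an illustrative name:

```python
import io
from functools import partial
from typing import BinaryIO, Iterator


def read_stream_as_gen(stream: BinaryIO, chunksize: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from any object with a read(n) method,
    stopping at the first empty read (end of stream)."""
    # iter(callable, sentinel) calls stream.read(chunksize) until it returns b"".
    yield from iter(partial(stream.read, chunksize), b"")


if __name__ == "__main__":
    demo = io.BytesIO(b"abcdefghij")
    print(list(read_stream_as_gen(demo, 4)))  # [b'abcd', b'efgh', b'ij']
```

The resulting generator can be passed as data= to requests.post or requests.put like the ones above.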

Answer 4

This might help:

import urllib2  # Python 2; uri and data are assumed to be defined already

request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT' # or 'DELETE'
response = urllib2.urlopen(request)
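urllib2 became urllib.request in Python 3, where Request accepts a method argument, so the get_method override is no longer needed. Since Python 3.6, if data is an iterable and no Content-Length header is given, the body is sent with chunked transfer encoding, one chunk per item, so a generator like the ones in the previous answer can be streamed. A minimal sketch; stream_put is an illustrative name:

```python
import urllib.request
from typing import Iterable


def stream_put(url: str, chunks: Iterable[bytes],
               content_type: str = "application/octet-stream") -> int:
    """PUT an iterable of byte chunks with urllib.request (Python 3.6+).

    With an iterable body and no Content-Length header, urllib.request adds
    Transfer-Encoding: chunked and sends each item as it is produced.
    """
    request = urllib.request.Request(
        url,
        data=chunks,
        method="PUT",  # replaces the urllib2 get_method override
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

This is still pull-style (urlopen consumes the iterable), so for the callback-driven case it would need to run on a separate thread fed by a queue.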

Library for sending HTTP requests from a file-like object
