Question

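I am trying to build a smart proxy server in Python that can stream large request bodies from the client to some internal storage (possibly Amazon S3, Swift, FTP, or something similar). Before streaming begins, the server should query an internal API server to determine the parameters for the upload to internal storage. The main constraint is that everything must happen in a single HTTP operation using the PUT method. It should also work asynchronously, because there will be many file uploads.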

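What solution lets me read chunks from the uploaded content and start streaming them to internal storage while the user is still uploading the file? Every Python web application stack I know of waits for the entire body to arrive before handing control to the WSGI application / Python web server.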

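One solution I found is a fork of Tornado, https://github.com/nephics/tornado. But it is unofficial, and the Tornado developers are in no hurry to merge it into the main branch. Do you know of any existing solutions to my problem? Tornado? Twisted? gevent?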

Answer 1

Here is an example of a server that handles streaming uploads, written with Twisted:

from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString

from twisted.web.server import Request, Site
from twisted.web.resource import Resource

from twisted.application.service import Application
from twisted.application.internet import StreamServerEndpointService

# Define a Resource class that doesn't really care what requests are made of it.
# This simplifies things since it lets us mostly ignore Twisted Web's resource
# traversal features.
class StubResource(Resource):
    isLeaf = True

    def render(self, request):
        return b""

class StreamingRequestHandler(Request):
    def handleContentChunk(self, chunk):
        # `chunk` is part of the request body.
        # This method is called as the chunks are received.
        Request.handleContentChunk(self, chunk)
        # Unfortunately you have to use a private attribute to learn where
        # the content is being sent.
        path = self.channel._path

        # Function-call form of print works on both Python 2 and Python 3.
        print("Server received %d more bytes for %s" % (len(chunk), path))

class StreamingSite(Site):
    requestFactory = StreamingRequestHandler

application = Application("Streaming Upload Server")

factory = StreamingSite(StubResource())
# Use a native string for the endpoint description so this also works on Python 3.
endpoint = serverFromString(reactor, "tcp:8080")
StreamServerEndpointService(endpoint, factory).setServiceParent(application)

This is a tac file (save it as streamingserver.tac and run twistd -ny streamingserver.tac).

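Because it has to use self.channel._path, this is not a fully supported approach. The API is also quite clunky overall, so this is more a demonstration that it is possible than that it is pleasant. There has long been an intention to make this sort of thing easier (http://tm.tl/288), but it will probably be a long time before that is done.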
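To watch the chunks arrive, the server above needs a client that sends a large body. Below is a minimal client sketch using only the Python 3 standard library. The helper names `iter_chunks` and `put_streaming` and the `/upload` path are made up for illustration; the host and port are assumed to match the tcp:8080 endpoint above. When the body is an iterable and no Content-Length is set, `http.client` (Python 3.6+) sends it with chunked transfer encoding, so the file is never loaded into memory in one piece.

```python
import http.client
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive blocks of at most `chunk_size` bytes from `fileobj`."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # empty read means end of file
            return
        yield chunk


def put_streaming(host, port, path, fileobj, chunk_size=64 * 1024):
    """PUT the contents of `fileobj` to http://host:port/path, chunk by chunk.

    Hypothetical helper: returns the HTTP status code of the response.
    """
    conn = http.client.HTTPConnection(host, port)
    try:
        # An iterable body without Content-Length is sent chunked.
        conn.request("PUT", path, body=iter_chunks(fileobj, chunk_size))
        response = conn.getresponse()
        response.read()  # drain the response body
        return response.status
    finally:
        conn.close()


if __name__ == "__main__":
    # Offline check of the chunking helper; no server is required for this.
    print([len(c) for c in iter_chunks(io.BytesIO(b"x" * 10), 4)])
```

With the tac file running, something like `put_streaming("127.0.0.1", 8080, "/upload", open("/path/to/huge/file", "rb"))` should make the server print one line per received chunk.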

Answer 2

I seem to have a solution using the gevent library and monkey patching:

from gevent.monkey import patch_all
patch_all()
from gevent.pywsgi import WSGIServer


def stream_to_internal_storage(data):
    # Placeholder: forward `data` to the internal storage backend here.
    pass


def simple_app(environ, start_response):
    bytes_to_read = 1024

    while True:
        readbuffer = environ["wsgi.input"].read(bytes_to_read)
        if not readbuffer:  # an empty read means the body is exhausted
            break
        stream_to_internal_storage(readbuffer)

    start_response("200 OK", [("Content-type", "text/html")])
    # WSGI response bodies must be byte strings (required on Python 3).
    return [b"hello world"]


def run():
    config = {'host': '127.0.0.1', 'port': 45000}

    server = WSGIServer((config['host'], config['port']), application=simple_app)
    server.serve_forever()


if __name__ == '__main__':
    run()

It works well when I try to upload a large file:

curl -i -X PUT --progress-bar --verbose --data-binary @/path/to/huge/file "http://127.0.0.1:45000"
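The read loop can also be exercised without gevent or a running server, since a WSGI app is just a callable. The sketch below is one possible variation, not part of the original solution: `make_upload_app` and `sink` are hypothetical names, with `sink` standing in for stream_to_internal_storage so that the storage backend can be swapped or faked.

```python
import io


def make_upload_app(sink, chunk_size=1024):
    """Build a WSGI app that passes the request body to `sink` chunk by chunk.

    `sink` is a hypothetical callable (a stand-in for
    stream_to_internal_storage); it receives one bytes object per call.
    """
    def app(environ, start_response):
        body = environ["wsgi.input"]
        total = 0
        while True:
            chunk = body.read(chunk_size)
            if not chunk:  # empty read: the body is exhausted
                break
            sink(chunk)
            total += len(chunk)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [("received %d bytes\n" % total).encode("ascii")]

    return app


if __name__ == "__main__":
    # Drive the app directly with an in-memory body instead of a socket.
    received = []
    app = make_upload_app(received.append)
    result = app({"wsgi.input": io.BytesIO(b"x" * 2500)},
                 lambda status, headers: None)
    print([len(c) for c in received], result)
```

The same `app` object could be passed as `application=` to gevent's WSGIServer in place of simple_app.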

Python server for streaming request body content
