Question

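Let me start off by saying that I'm using the twisted.web framework. twisted.web's file uploading didn't work the way I wanted (it only included the file data, and no other information); cgi.parse_multipart doesn't work the way I want either (same problem, and twisted.web uses this function); cgi.FieldStorage didn't work (because I'm getting the POST data through Twisted, not a CGI interface, and as far as I can tell FieldStorage tries to read the request from stdin); and twisted.web2 didn't work for me because the use of Deferred confused and infuriated me (too complicated for what I want).

That being said, I decided to try parsing the HTTP request myself.

Using Chrome, the HTTP request is formed like this:

------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="upload_file_nonce"

11b03b61-9252-11df-a357-00266c608adb
------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename="login.html"
Content-Type: text/html

<!DOCTYPE html>
<html>
<head>
...
------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename=""


------WebKitFormBoundary7fouZ8mEjlCe92pq--

Is this always how it will be formed? I'm parsing it with regular expressions, like so (pardon the wall of code):

(Note: I snipped out most of the code to show only what I thought was relevant: the regular expressions (yeah, nested parentheses). This is the __init__ method (the only method so far) of an Uploads class I built. The full code can be seen in the revision history (I hope I didn't mismatch any parentheses).)

if line == "--{0}--".format(boundary):
    finished = True

if in_header == True and not line:
    in_header = False
    if 'type' not in current_file:
        ignore_current_file = True

if in_header == True:
    m = re.match(
        "Content-Disposition: form-data; name=\"(.*?)\"; filename=\"(.*?)\"$", line)
    if m:
        input_name, current_file['filename'] = m.group(1), m.group(2)

    m = re.match("Content-Type: (.*)$", line)
    if m:
        current_file['type'] = m.group(1)

else:
    if 'data' not in current_file:
        current_file['data'] = line
    else:
        current_file['data'] += line

You can see that I start a new "file" dict whenever a boundary is reached. I set in_header to True to say that I'm parsing headers. When I reach a blank line, I switch it to False, but not before checking whether a Content-Type was set for that form value. If not, I set ignore_current_file, since I'm only looking for file uploads.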

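I know I should be using a library, but I'm sick to death of reading documentation, trying to get different solutions to work in my project, and still having the code look reasonable. I just want to get past this part, and if parsing an HTTP POST with file uploads is this simple, then I'll stick with it.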

Note: this code works perfectly for now; I'm just wondering whether it will choke on requests from certain browsers.

Answer 1

My solution to this problem was to parse the content with cgi.FieldStorage, like so:

import cgi

from twisted.web.resource import Resource


class Root(Resource):

    def render_POST(self, request):
        self.headers = request.getAllHeaders()
        # For the parsing part look at [PyMOTW by Doug Hellmann][1]
        img = cgi.FieldStorage(
            fp=request.content,
            headers=self.headers,
            environ={'REQUEST_METHOD': 'POST',
                     'CONTENT_TYPE': self.headers['content-type']},
        )

        upload = img["upl_file"]
        print upload.name, upload.filename, upload.type
        # The filename is supplied by the client; sanitize it before
        # using it as a path on a real server.
        out = open(upload.filename, 'wb')
        out.write(upload.value)
        out.close()
        request.redirect('/tests')
        return ''

Answer 2

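The Content-Disposition header has no defined order for its fields, and it may contain more fields than just the filename. So your match for the filename may fail. There may not even be a filename!

See RFC 2183 (edit: that one is for mail; see RFC 1806, RFC 2616 and maybe more for HTTP).

I would also suggest replacing every space in these regexps with \s*, and not relying on character case.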
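To make the points about field order, whitespace and case concrete, here is a minimal sketch of a more tolerant header match. The names `parse_content_disposition` and `_PARAM_RE` are made up for this example, and it is not a complete RFC parser: it assumes values are either simple quoted strings or bare tokens, and does not handle escaped quotes or encoded (RFC 2231 style) parameters.

```python
import re

# Matches one `; key="value"` or `; key=token` parameter, tolerating
# optional whitespace around the separators.
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;]+))')


def parse_content_disposition(line):
    """Return (disposition, params) for a Content-Disposition header line,
    or None if the line is some other header."""
    m = re.match(r'content-disposition\s*:\s*([\w-]+)', line, re.IGNORECASE)
    if m is None:
        return None
    params = {}
    # Collect parameters in whatever order they appear.
    for key, quoted, bare in _PARAM_RE.findall(line[m.end():]):
        params[key.lower()] = quoted if bare == '' else bare
    return m.group(1).lower(), params
```

With this shape, a part is a file upload when `'filename' in params`, regardless of where the filename appears in the header or whether other fields come with it.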

Answer 3

You're trying to avoid reading documentation, but I think the best advice is to actually read:

rfc 2388 Returning Values from Forms: multipart/form-data
rfc 1867 Form-based File Upload in HTML

That way you can make sure you don't miss any cases. An easier route might be to use the poster library.
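As an illustration of leaning on an existing parser rather than hand-written regexes, here is a rough sketch that uses Python 3's standard-library email package to split a multipart/form-data body. This is a swapped-in technique, not something the poster library provides and not the asker's code. The function name `parse_form_data` and its two arguments are assumptions for the example: `content_type` is the request's Content-Type header value (including the boundary) and `body` is the raw request body as bytes. It has only been reasoned through for small text payloads, so treat it as a starting point.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_form_data(content_type, body):
    """Split a multipart/form-data body into a list of dicts, one per part."""
    # The email parser finds the boundary from a Content-Type header,
    # so prepend one to the raw body before parsing.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(prefix + body)
    parts = []
    for part in message.iter_parts():
        parts.append({
            "name": part.get_param("name", header="content-disposition"),
            # None when the part carries no filename (a plain form field).
            "filename": part.get_filename(),
            "type": part.get_content_type(),
            "data": part.get_payload(decode=True),
        })
    return parts
```

File uploads are then the parts whose `"filename"` is not None. Note that `get_content_type()` falls back to `text/plain` when a part has no Content-Type header, so the filename is the safer test.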

Am I parsing this HTTP POST request properly?
