How do I parse a raw HTTP request in Python 3?

Asked: 2016-08-22 23:52:14

Tags: python python-3.x http

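I am looking for a native way to parse an HTTP request in Python 3.

This question shows a way to do it in Python 2, but it relies on modules that are now deprecated (and on Python 2 itself). I am looking for a way to do it in Python 3.

I mainly want to find out which resource was requested and to parse the headers of a simple request, for example: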

GET /index.html HTTP/1.1
Host: localhost
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

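Can someone show me a basic way to parse this request?

3 answers:

Answer 0: (score: 2)

Each header field is separated from the next by a carriage return followed by a line feed, and each field name is separated from its value by a colon. So, assuming you already have the request as a string, it should be as simple as: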

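fields = resp.split("\r\n")
fields = fields[1:]  # ignore the request line, e.g. "GET / HTTP/1.1"
output = {}
for field in fields:
    if not field:
        break  # a blank line ends the header section
    # split on the first colon only, since values may contain colons
    key, value = field.split(":", 1)
    output[key] = value.strip()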
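Since the question also asks which resource was requested, the request line can be parsed rather than discarded. The following is a minimal sketch, assuming the whole request is available as a str. The function name parse_raw_request and the choice to lower-case header names are illustrative assumptions, not part of the answer above.

```python
def parse_raw_request(raw):
    """Split a raw HTTP request string into its request line and headers."""
    # Everything before the first blank line is the request head.
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")

    # The request line has three space-separated parts.
    method, target, version = request_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        if not line:
            continue  # tolerate a trailing empty line
        # Split on the first colon only; values such as "localhost:8080"
        # contain colons of their own.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: localhost:8080\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
method, target, version, headers = parse_raw_request(raw)
```

Here target holds the requested resource ("/index.html"). Header names are lower-cased because HTTP field names are case-insensitive, so lookups such as headers["host"] work regardless of how the client capitalised them.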

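Answer 1: (score: 1)

You can use the email.message.Message class from the email module in the standard library.

Adapting the answer from the question you linked, here is a Python 3 example that parses HTTP headers.

Suppose you want to create a dictionary containing all the header fields: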

import email
import pprint
from io import StringIO

request_string = 'GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, sdch\r\nAccept-Language: en-US,en;q=0.8'

# pop the first line so we only process headers
_, headers = request_string.split('\r\n', 1)

# construct a message from the request string
message = email.message_from_file(StringIO(headers))

# construct a dictionary containing the headers
headers = dict(message.items())

# pretty-print the dictionary of headers
pprint.pprint(headers, width=160)

If you run this at the Python prompt, the output looks like this:

{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'Accept-Encoding': 'gzip, deflate, sdch',
 'Accept-Language': 'en-US,en;q=0.8',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive',
 'Host': 'localhost',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
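One design point worth noting: converting the message to a plain dict makes lookups case-sensitive and keeps only one value for a repeated field. The Message object itself supports case-insensitive lookup and get_all(). A short sketch, assuming a made-up header block with a repeated Accept-Encoding field:

```python
import email

# Hypothetical header block with one field repeated, for illustration only.
header_block = (
    "Host: localhost\r\n"
    "Accept-Encoding: gzip\r\n"
    "Accept-Encoding: deflate\r\n"
)
message = email.message_from_string(header_block)

# Lookup on the Message object ignores the case of the field name.
host = message["host"]

# get_all() returns every value of a repeated field, in order.
encodings = message.get_all("Accept-Encoding")

# A plain dict keeps only the last value of a repeated field.
as_dict = dict(message.items())

# Missing fields yield None instead of raising KeyError.
missing = message["X-Not-Sent"]
```

If repeated fields or mixed capitalisation matter for your use case, it may be preferable to keep working with the Message object instead of the dictionary.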

Answer 2: (score: 0)

There is another way to handle the headers that is simpler, safer, and more object-oriented. See Parse raw HTTP Headers (#61189692).
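The linked answer is not reproduced here. As one possible illustration of an object-oriented approach, and not necessarily what that answer proposes, the Python 2 technique mentioned in the question can be ported to Python 3 by subclassing http.server.BaseHTTPRequestHandler and feeding it the request from a BytesIO. The class name HTTPRequest and the error_code/error_message attributes are assumptions made for this sketch.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class HTTPRequest(BaseHTTPRequestHandler):
    """Parse a raw request without opening a socket (illustrative sketch)."""

    def __init__(self, request_bytes):
        # Deliberately skip the base-class constructor, which expects a
        # live socket; only the attributes parse_request() needs are set.
        self.rfile = BytesIO(request_bytes)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record parse errors instead of writing a response to a client.
        self.error_code = code
        self.error_message = message


request = HTTPRequest(
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
```

After construction, request.command, request.path and request.request_version hold the parts of the request line, and request.headers is a message object that supports case-insensitive lookup such as request.headers["host"]. If the request line is malformed, request.error_code is set instead.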