Question

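I'm developing an application that fetches all kinds of things from the internet. I'd rather not write a RegExp pattern for this, so how can I parse the value of a Content-Type header? For example: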

text/html; charset=UTF-8

To give some context, this is the code I use to fetch content from the internet:

from requests import head

foo = head("http://www.example.com")

*Edit*

The output I'm expecting is similar to what the methods in mimetools provide. For example:

x = magic("text/html; charset=UTF-8")

would give:

x.getparam('charset')  # UTF-8
x.getmaintype()  # text
x.getsubtype()  # html

Answer 1

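Unfortunately, requests doesn't give you an interface for parsing content types, and the standard library is a bit of a mess in this area. So I see two options: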

Option 1: Use the python-mimeparse third-party library.

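Option 2: To separate the MIME type from options like charset, you can use the same technique that requests uses internally to parse the type/encoding: use cgi.parse_header.

import cgi
import requests

response = requests.head('http://example.com')
mimetype, options = cgi.parse_header(response.headers['Content-Type'])

The rest should be simple enough to handle with split, for example mimetype.split('/') to get the main type and the subtype.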
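One caveat on Option 2: the cgi module was deprecated in Python 3.11 and removed in 3.13. A possible standard-library alternative is email.message.Message, whose accessors are close to the mimetools-style interface the question asks for. The following is only a sketch; the helper name parse_content_type is made up for illustration and is not part of requests or any library.

```python
from email.message import Message


def parse_content_type(value):
    """Wrap a raw Content-Type value in a Message so it can be queried.

    parse_content_type is a hypothetical helper name, not a library function.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg


x = parse_content_type("text/html; charset=UTF-8")
print(x.get_param("charset"))     # the charset parameter as it was sent
print(x.get_content_maintype())   # the part before the slash
print(x.get_content_subtype())    # the part after the slash
```

With requests, the value passed in would be response.headers['Content-Type'] from the snippet above.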

Answer 2

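Your question is a bit unclear. I assume you are using some kind of web application framework, such as Django or Flask?

Here is an example of reading the Content-Type using Flask: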

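from flask import Flask, request

app = Flask(__name__)


@app.route("/")
def test():
    # A Flask view must return a response, so return the header value and
    # fall back to a placeholder string when the client sent no Content-Type.
    return request.headers.get("Content-Type", "no Content-Type header sent")


if __name__ == "__main__":
    app.run()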

Answer 3

Your response (foo) will have a dictionary containing the headers. Try something like:

foo.headers.get('content-type')

Or print foo.headers to see all of the headers.
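Note that .get returns None when the server sends no Content-Type at all, so it may be worth guarding for that before parsing. Below is a minimal sketch of that idea; the function name media_type and the fallback value are assumptions chosen for illustration, and the plain dict stands in for foo.headers.

```python
def media_type(headers, default="application/octet-stream"):
    """Return the bare type/subtype from a headers mapping.

    media_type and its default are hypothetical choices for this sketch.
    """
    raw = headers.get("content-type")
    if raw is None:
        # Header missing: fall back to the assumed default.
        return default
    # Everything before the first ';' is the type/subtype pair.
    return raw.split(";", 1)[0].strip().lower()


# A plain dict stands in for foo.headers here; the real requests headers
# object also matches the key regardless of case.
print(media_type({"content-type": "text/html; charset=UTF-8"}))
print(media_type({}))
```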

How do I parse the value of Content-Type from an HTTP header response?
