Question

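I would like to be able to validate, from within my Python code, some HTML generated by a template rendering function.

I went to the GitHub page for validator.w3.org to look up the API.

Based on my interpretation of what I read, I tried the following code: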

import requests
import urllib.parse

index_html = '<!DOCTYPE html>\n<html lang="en">\n<head>\n  '\
    '<meta charset="UTF-8">\n  '\
    '<title></title>\n</head>\n<body>\n  \n</body>\n</html>\n'
FRAGMENT = ''
query = {}
QUERY = 3
tokens = ['https', 'validator.w3.org', 'nu/', query, FRAGMENT]
headers = {'Content-type': 'text/html; charset=utf-8'}
query = {'out': 'json'}
query = urllib.parse.urlencode(query)
tokens[QUERY] = query
url = urllib.parse.urlunsplit(tokens)
kwargs = dict(
    headers=headers,
    data=index_html,
)
response = requests.post(url, **kwargs)

response.json() returns:

*** UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 48: ordinal not in range(128)

response.content looks like this:

b'{"messages":[{"type":"info","message":"The Content-Type was \xe2\x80\x9ctext/html\xe2\x80\x9d. Using the HTML parser."},{"type":"info","message":"Using the schema for HTML with SVG 1.1, MathML 3.0, RDFa 1.1, and ITS 2.0 support."},{"type":"error","lastLine":5,"lastColumn":17,"firstColumn":10,"message":"Element \xe2\x80\x9ctitle\xe2\x80\x9d must not be empty.","extract":"\n ...

type(response.content) is <class 'bytes'>. I know json.loads expects a string, so I assumed that response.json() raised the exception because the content is bytes and could not be decoded into a string:

import json
json.loads(response.content.decode('utf-8'))

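Sure enough, the same exception:

*** UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 48: ordinal not in range(128)

I have run out of ideas, and I am left wondering which part of this code to change in order to get the JSON from the requests.post response.

Thanks in advance for your help.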

Answer 1

The answer is to check that one is indeed using P̶y̶t̶h̶o̶n̶3̶.̶x̶ rather than P̶y̶t̶h̶o̶n̶2̶.̶x̶ when one expects to be using P̶y̶t̶h̶o̶n̶3̶.̶x̶!

See the update below.

Thanks.

{'messages': [{'message': 'The Content-Type was “text/html”. Using the HTML parser.', 'type': 'info'}, {'message': 'Using the schema for HTML with SVG 1.1, MathML 3.0, RDFa 1.1, and ITS 2.0 support.', 'type': 'info'}, {'extract': '\n <title></title>\n</hea', 'firstColumn': 10, 'hiliteLength': 8, 'hiliteStart': 10, 'lastColumn': 17, 'lastLine': 5, 'message': 'Element “title” must not be empty.', 'type': 'error'}]}
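Once the response parses, the messages list shown above can be filtered for validation errors. The following is a minimal sketch, not part of the validator's API: the helper name collect_errors is hypothetical, and the sample dictionary is copied from the output above rather than fetched from the service.

```python
# Sample copied from the parsed validator output shown above.
result = {'messages': [
    {'message': 'The Content-Type was \u201ctext/html\u201d. Using the HTML parser.',
     'type': 'info'},
    {'message': 'Using the schema for HTML with SVG 1.1, MathML 3.0, '
                'RDFa 1.1, and ITS 2.0 support.',
     'type': 'info'},
    {'extract': '\n <title></title>\n</hea', 'firstColumn': 10,
     'hiliteLength': 8, 'hiliteStart': 10, 'lastColumn': 17, 'lastLine': 5,
     'message': 'Element \u201ctitle\u201d must not be empty.',
     'type': 'error'},
]}


def collect_errors(result):
    """Hypothetical helper: return (line, message) pairs for 'error' messages."""
    return [
        (msg.get('lastLine'), msg['message'])
        for msg in result.get('messages', [])
        if msg.get('type') == 'error'
    ]


errors = collect_errors(result)
for line, message in errors:
    # ascii() escapes the curly quotes, so printing works even when
    # stdout only accepts ASCII.
    print('line {}: {}'.format(line, ascii(message)))
```

In a test, asserting that collect_errors(response.json()) is empty would be one way to fail on invalid HTML.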

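Update:

There is more to this story. In fact, I was using Python 3. I had simply left out the part about using py.test with the --pdb option.

How do I know I was using Python 3?

The output from python3 test_mytest.py, where test_mytest.py ends with: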

if __name__ == '__main__':
    import sys
    sys.exit(pytest.main('-s --pdb'))

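was this:

platform linux -- Python 3.4.3, pytest-2.8.3, py-1.4.31, pluggy-0.3.1

After dropping into pdb I was still getting the encoding error. I found the solution in the answer by @daveagp in this post.

He wrote a page that addresses this problem. Thanks, @daveagp.

After running export PYTHONIOENCODING='utf_8', I no longer get any encoding errors.

I was wrong about what my mistake was!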
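The behaviour described in this update can be reproduced without contacting the validator. The sketch below is illustrative only: the byte string is a shortened copy of the response shown in the question, and the variable names are made up. It suggests that decoding and parsing succeed, and that the UnicodeEncodeError only appears when the parsed text has to be encoded as ASCII, which is what happens when the result is displayed on a stream whose encoding is ASCII.

```python
import json

# Shortened copy of the bytes returned by the validator (from the question).
raw = (b'{"messages":[{"type":"info","message":'
       b'"The Content-Type was \xe2\x80\x9ctext/html\xe2\x80\x9d. '
       b'Using the HTML parser."}]}')

# Decoding and parsing work fine: the payload is valid UTF-8 JSON.
parsed = json.loads(raw.decode('utf-8'))
message = parsed['messages'][0]['message']

# Encoding to ASCII is what fails, as it would on an ASCII-only stdout.
try:
    message.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failing_char = exc.object[exc.start]  # the left curly quote

# With UTF-8 (the effect of PYTHONIOENCODING='utf_8' on the standard
# streams) the same text encodes without error.
utf8_bytes = message.encode('utf-8')
```

Checking sys.stdout.encoding inside the failing session is a quick way to confirm whether the output stream is the culprit.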

How to parse the response from requests.post to validator.w3.org?
