Question

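I am new to Python and I am trying to fetch the content of a Chinese website. I can get the response r, but unfortunately it has an encoding/decoding problem: the Chinese characters are not displayed correctly.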

import requests
r = requests.get('http://www.example.com')
print(r.encoding)  # encoding that requests inferred from the HTTP headers
print(r.content)   # raw, undecoded response body (bytes)

The code above prints the encoding as "ISO-8859-1".

The response content contains this information:

<?xml version="1.0" encoding="gb2312"?>

I also see something like this:

<head>\n<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>

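The Chinese characters show up as \xbe\xc9\xbd\xf0\xc9\xbd, which should be three Chinese characters. Can anyone suggest what I should do to display the characters correctly?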
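For illustration, here is a minimal sketch that takes the six escaped bytes quoted above and decodes them with the charset the page declares (gb2312). The variable names are made up for this example, and it assumes the page really is GB2312-encoded, as its XML declaration and meta tag state.

```python
# The six bytes quoted in the question, written as a bytes literal.
raw = b'\xbe\xc9\xbd\xf0\xc9\xbd'

# GB2312 stores each Chinese character in two bytes,
# so six bytes decode to three characters.
text = raw.decode('gb2312')
print(text)
print(len(text))

# Decoding the same bytes as ISO-8859-1 never raises an error,
# but it maps every byte to one Latin-1 character instead.
mojibake = raw.decode('iso-8859-1')
print(len(mojibake))
```

The second decode shows why the output looks wrong when ISO-8859-1 is assumed: the call succeeds, but it produces six unrelated characters rather than three Chinese ones.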

I am trying to do this in Python 3.4.2.

Answer 1

Sorry, I don't know exactly which encoding your page uses, but it is usually enough to use:

r.content.decode('gb2312')

or

r.content.decode('ISO-8859-1')

That is as far as I know. Please give it a try.
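If the charset is not known in advance, one option is to read it from the declaration inside the body itself. The sketch below is only an illustration: sniff_declared_charset and decode_body are hypothetical helper names, not part of requests, and the regular expressions only cover simple declarations like the two quoted in the question.

```python
import codecs
import re

def sniff_declared_charset(raw, default='utf-8'):
    """Return the charset named in an XML declaration or <meta> tag, else default."""
    head = raw[:2048]  # declarations normally sit near the top of the document
    match = (re.search(rb'encoding=["\']([A-Za-z0-9_\-]+)["\']', head)
             or re.search(rb'charset=["\']?([A-Za-z0-9_\-]+)', head))
    if not match:
        return default
    charset = match.group(1).decode('ascii').lower()
    try:
        codecs.lookup(charset)  # make sure Python knows this codec name
    except LookupError:
        return default
    return charset

def decode_body(raw, default='utf-8'):
    """Decode raw bytes using the charset the document declares for itself."""
    charset = sniff_declared_charset(raw, default)
    # errors='replace' keeps going if a few bytes fall outside the charset.
    return raw.decode(charset, errors='replace')

sample = b'<?xml version="1.0" encoding="gb2312"?><t>\xbe\xc9\xbd\xf0\xc9\xbd</t>'
print(sniff_declared_charset(sample))
print(decode_body(sample))
```

With requests, the same idea would be decode_body(r.content). Alternatively, set r.encoding = 'gb2312' before reading r.text, since r.text decodes the body with whatever r.encoding holds. The "ISO-8859-1" reported in the question is the fallback requests uses when the Content-Type header is a text type without a charset, so it says nothing about the page's actual encoding, and decoding with it will not produce Chinese characters.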