Question

I am using mitmproxy to manipulate the HTML of the web pages it returns. When I run commands on that HTML, I get a UnicodeDecodeError.

I have tried everything I can think of and read every post on the subject here, but nothing has worked for me.

Two examples of the many things I have tried:

msg.response.content = unicode(msg.response.content, errors='ignore')
msg.response.content = msg.response.content.decode('utf8').encode('ascii', errors='ignore')

How should I deal with this?

Answer 1

Try using the mitmproxy.flow.decoded context manager, like this:

from mitmproxy.flow import decoded

def response(context, flow):
    with decoded(flow.response):
        flow.response.content = flow.response.content.replace("Google", "Noogle")

From the source:

A context manager that decodes a request, response or error, and then re-encodes it with the same encoding after execution of the block.

Example:
    with decoded(request):
        request.content = request.content.replace("foo", "bar")

Note: I used mitmproxy on Ubuntu 14.04.

Answer 2

To make sure you decode correctly, look in the HTML source of the page for something like <meta charset="utf-8"> or <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">. The charset value is the encoding the page uses.
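
If you would rather not read the page source by hand, the lookup can be approximated in code. The sketch below is only an illustration: sniff_meta_charset and META_CHARSET are hypothetical names, the regular expression handles just the two meta forms shown above, and it assumes the tag appears within the first few kilobytes of the body. A real HTML parser would be more robust.

```python
import re

# Matches <meta charset="..."> as well as the "charset=..." part of the
# http-equiv form. Deliberately simple; this is not a full HTML parser.
META_CHARSET = re.compile(
    br"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(html_bytes, default="utf-8"):
    """Return the charset declared in a <meta> tag, or `default` if none is found."""
    match = META_CHARSET.search(html_bytes[:4096])  # assume the tag is near the top
    if match:
        return match.group(1).decode("ascii").lower()
    return default


page = b'<html><head><meta charset="utf-8"></head></html>'
charset = sniff_meta_charset(page)
```

The function works on raw bytes on purpose, because the charset has to be known before the body can be decoded into text.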

If type(msg.response.content) returns str, you need to run msg.response.content = msg.response.content.decode(u'utf-8'), where "utf-8" is the encoding the page declares. It could instead be ISO-8859-1, windows-1251 or ASCII.
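
As a minimal sketch of that advice, the full round trip is: decode the bytes with the page's charset, edit the text, then encode it back with the same charset. The helper name replace_in_html and the sample page are made up for illustration. The example also shows the failure mode itself: decoding non-ASCII bytes with the wrong codec is what raises UnicodeDecodeError.

```python
def replace_in_html(raw_bytes, old, new, charset="utf-8"):
    """Decode with the page's charset, edit the text, and re-encode it."""
    text = raw_bytes.decode(charset)
    text = text.replace(old, new)
    return text.encode(charset)


# A page encoded as ISO-8859-1 that contains a non-ASCII character.
raw = u"<p>caf\u00e9 Google</p>".encode("iso-8859-1")

# Decoding with the wrong codec fails on the non-ASCII byte.
try:
    raw.decode("ascii")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# Decoding with the declared charset works, and the edit round-trips cleanly.
edited = replace_in_html(raw, u"Google", u"Noogle", charset="iso-8859-1")
```

Re-encoding with the same charset keeps the body consistent with whatever the page's meta tag declares.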

UnicodeDecodeError when editing HTML code with Python
