Scrapy - TypeError: Cannot convert unicode body - HtmlResponse has no encoding

Time: 2016-08-30 21:20:23

Tags: python python-2.7 encoding scrapy scrapy-spider

When I try to construct an HtmlResponse object in Scrapy like this:

scrapy.http.HtmlResponse(url=self.base_url + dealer_url[0], body=dealer_html)

I get this error:

Traceback (most recent call last):
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\Kerja\HIT\Python Projects\<project_name>\<project_name>\<project_name>\<project_name>\spiders\fwi.py", line 69, in parse_items
    dealer_page = scrapy.http.HtmlResponse(url=self.base_url + dealer_url[0], body=dealer_html)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\text.py", line 27, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\__init__.py", line 18, in __init__
    self._set_body(body)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\text.py", line 43, in _set_body
    type(self).__name__)
TypeError: Cannot convert unicode body - HtmlResponse has no encoding

Does anyone know how to fix this error?

1 answer:

Answer 0 (score: 7)

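HtmlResponse is trying to detect the encoding:

"The HtmlResponse class is a subclass of TextResponse which adds encoding auto-discovering support by looking into the HTML meta http-equiv attribute. See TextResponse.encoding."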

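So basically the HTML string you pass as the body argument (dealer_html in your case) does not declare an encoding. It should contain one of the following declarations: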

HTML 4.01: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
HTML5: <meta charset="UTF-8">

In that case you can either fix the HTML or specify the encoding through the encoding parameter when creating the HtmlResponse object:

HtmlResponse(url='http://scrapy.org', body=u'some body', encoding='utf-8')
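
Applied to the spider in the question, the fix could look roughly like the sketch below. The traceback shows the error being raised in _set_body, and the message suggests it is triggered by a unicode body arriving without an encoding, so the sketch converts the body to bytes and also passes encoding explicitly. The helper names ensure_bytes and build_html_response are hypothetical (not part of Scrapy), and UTF-8 is an assumption about what dealer_html contains.

```python
def ensure_bytes(html, encoding="utf-8"):
    """Return html as bytes, encoding it only if it is a text (unicode) string."""
    if isinstance(html, bytes):
        # Already a byte string (this is plain str on Python 2); leave it alone.
        return html
    return html.encode(encoding)


def build_html_response(url, html, encoding="utf-8"):
    """Build an HtmlResponse with an explicit encoding (hypothetical helper)."""
    # Imported here so ensure_bytes stays usable without Scrapy installed.
    from scrapy.http import HtmlResponse

    # encoding= tells Scrapy how to decode the body, so it does not
    # have to rely on a <meta> charset declaration inside the HTML.
    return HtmlResponse(url=url, body=ensure_bytes(html, encoding), encoding=encoding)
```

In parse_items, the failing call would then become something like dealer_page = build_html_response(self.base_url + dealer_url[0], dealer_html). If the page is known to use a different charset, pass that name instead of the UTF-8 default.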