I am using Scrapy to scrape a website. The scraping itself works fine, but I get the following error log on the command line:
2017-08-21 03:47:22 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ?.item_scraped of <scrapy.extensions.feedexport.FeedExporter object at 0x00000000041DD908>>
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "c:\python27\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "c:\python27\lib\site-packages\scrapy\extensions\feedexport.py", line 224, in item_scraped
    slot.exporter.export_item(item)
  File "c:\python27\lib\site-packages\scrapy\exporters.py", line 93, in export_item
    data = self.encoder.encode(itemdict) + '\n'
  File "c:\python27\lib\json\encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "c:\python27\lib\json\encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa3 in position 5517: invalid start byte
A possible cause is that I am using Scrapy selector objects to extract the data. I would like to get rid of these errors on the command line.
Edit: the website uses ISO-8859-1 encoding.
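The byte 0xa3 in the error message is consistent with that encoding: in ISO-8859-1 it is the pound sign, and on its own it is not a valid UTF-8 start byte. Below is a minimal sketch of what seems to be going on, assuming the scraped fields reach the exporter as raw byte strings. The sample value and the `to_text` helper are hypothetical and not part of Scrapy.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical sample value: 0xa3 is the pound sign in ISO-8859-1.
raw = b"Price: \xa35"

# Decoding the bytes as UTF-8 fails, as in the traceback above.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False


def to_text(value, encoding="iso-8859-1"):
    """Hypothetical helper: decode byte strings to text, leave other values alone."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Once the field is text, the JSON encoder no longer has to guess an encoding.
item = {"price": to_text(raw)}
encoded = json.dumps(item)
```

If this assumption holds, decoding each field with the site's real encoding before the item is yielded should make the exporter errors go away.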