Scrapy: 'ascii' codec can't encode characters

Posted: 2019-04-25 15:08:19

Tags: python web-scraping scrapy scrapinghub

I'm getting an error when I run my crawler:

UnicodeEncodeError: 'ascii' codec can't encode characters in position

I'm using this code:

author = str(info.css(".author::text").extract_first())

But I still get the error. Any ideas how to fix it? Thanks!

Here is the full error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/middlewares.py", line 30, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/__main__.egg/teslamotorsclub_spider/spiders/teslamotorsclub.py", line 40, in parse
    author = str(info.css(".author::text").extract_first())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
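The last line of the traceback is what the ASCII codec reports when it meets non-ASCII characters. On Python 2, str() applied to a unicode value implicitly encodes it with the default ASCII codec. The sketch below writes that implicit step out explicitly so it can be run on Python 3. The sample author name is a made-up assumption, chosen so that its first two characters are non-ASCII, matching "position 0-1".

```python
# Python 3 sketch: on Python 2, str(u"...") implicitly did text.encode("ascii").
# That implicit step is written out here so the failure can be reproduced.
text = u"\u00e9\u00e8 author"  # hypothetical author name; first two chars are non-ASCII

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Prints: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
    print(exc)

# UTF-8 can represent any Unicode character, so this succeeds.
print(text.encode("utf-8"))  # b'\xc3\xa9\xc3\xa8 author'
```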

1 answer:

Answer 0 (score: 1)

Try:

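author = info.css(".author::text").extract_first().encode('utf-8')

The reason is that on Python 2, extract_first() returns a unicode object, not raw bytes. str() converts it by encoding with the default ASCII codec, which fails on any character outside the range 0-127 (here, the first two characters of the author name). Python won't guess which encoding you want, so you have to make it explicit: encode to UTF-8 if you need a byte string, or simply drop the str() call and keep the value as unicode. Calling .decode('utf-8') on a unicode value does not help on Python 2, because it first performs the same implicit ASCII encode and raises the same error. UTF-8 can represent any Unicode character you throw at it.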
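If you do need UTF-8 bytes (what the str() call was presumably meant to produce on Python 2), one more pitfall is that extract_first() returns None when the selector matches nothing, and None has no encode method. The helper below is a minimal sketch under that assumption; the name to_utf8 and its default parameter are hypothetical, not part of Scrapy. Scrapy's extract_first() also accepts a default argument (for example extract_first(default='')), which avoids the None case at the call site.

```python
def to_utf8(value, default=u""):
    """Hypothetical helper: return UTF-8 bytes for a scraped text value.

    `value` stands in for what extract_first() returns: a text string,
    or None when the selector matched nothing.
    """
    if value is None:
        value = default
    if isinstance(value, bytes):
        # Already bytes: assume it is UTF-8 and pass it through unchanged.
        return value
    # Text: encode explicitly instead of relying on the default codec.
    return value.encode("utf-8")


print(to_utf8(u"\u00e9\u00e8 author"))  # b'\xc3\xa9\xc3\xa8 author' (Python 3 output)
print(to_utf8(None))                    # b''
```

In a spider this would be used as author = to_utf8(info.css(".author::text").extract_first()). On Python 3, Scrapy items can usually hold the str value directly, so encoding is only needed when something downstream expects bytes.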