Question

I'm writing a spider with Scrapy, but I ran into this error. Here is my code:

# -*- coding: utf-8 -*-
import scrapy

class ZhihuSpider(scrapy.Spider):
    name = "zhihu"
    allowed_domains = ["www.zhihu.com"]

    def start_requests(self):
        return [scrapy.Request('http://www.zhihu.com/#signin')]

    def parse(self, response):
        print response

The error message is:

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1183, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\Python27\lib\site-packages\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\Python27\lib\site-packages\scrapy\core\downloader\middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 587, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Python27\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 45, in process_request_2
    to_native_str(self._useragent), request.url):
  File "C:\Python27\lib\site-packages\scrapy\utils\python.py", line 127, in to_native_str
    return to_bytes(text, encoding, errors)
  File "C:\Python27\lib\site-packages\scrapy\utils\python.py", line 117, in to_bytes
    'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got set

Answer 1

Your allowed_domains is invalid. Change it to:

allowed_domains = ["zhihu.com"]

Every URL is parsed in scrapy.core.downloader.webclient.py. The to_bytes function checks that its argument is a unicode, str, or bytes object and raises a TypeError otherwise.
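To make the type check concrete, here is a minimal sketch of that kind of guard. It is written for Python 3, the name to_bytes_sketch is made up for illustration, and it is not Scrapy's actual source; it only mirrors the behaviour visible in the traceback.

```python
def to_bytes_sketch(text, encoding="utf-8", errors="strict"):
    """Simplified stand-in for the type check described above (not Scrapy's code)."""
    if isinstance(text, bytes):
        # Already bytes: nothing to convert.
        return text
    if not isinstance(text, str):
        # Anything that is not text is rejected, naming the offending type.
        raise TypeError(
            "to_bytes must receive a str or bytes object, got %s"
            % type(text).__name__
        )
    return text.encode(encoding, errors)


# A plain string converts cleanly.
print(to_bytes_sketch("zhihu"))

# A set is rejected with a message ending in "got set".
try:
    to_bytes_sketch({"Mozilla/5.0"})
except TypeError as exc:
    print(exc)
```

The point of the sketch is that the message names the type it received, so "got set" in the traceback means some value that should have been a string was a set.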

Answer 2

You can get this kind of error when you use the wrong data type. For example:

value = 15
print(value.encode("ascii"))  # Raises an error: an integer has no encode() method

value = "15"
print(value.encode("ascii"))  # Works: encode() belongs to string types, not to integers
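Applied to the question, the traceback ends in "got set" and the value being converted is self._useragent in the robots.txt middleware. The question does not show settings.py, so the following is only a guess at how a set could end up there: a user agent written with braces instead of as a plain string. The helper describe_setting and both sample values are hypothetical.

```python
def describe_setting(value):
    """Hypothetical helper: report whether a setting value is text-like."""
    if isinstance(value, (str, bytes)):
        return "ok"
    return "wrong type: %s" % type(value).__name__


# A plain string is what a user agent value is expected to be.
good_user_agent = "Mozilla/5.0 (compatible; example-bot)"

# Braces around a single string build a set, not a string.
bad_user_agent = {"Mozilla/5.0 (compatible; example-bot)"}

print(describe_setting(good_user_agent))
print(describe_setting(bad_user_agent))
```

If the project settings contain a value shaped like bad_user_agent, removing the braces would be the first thing to try.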

TypeError: to_bytes must receive a unicode, str or bytes object
