I am writing a spider with Scrapy, but I ran into the error below. Here is my code:
# -*- coding: utf-8 -*-
import scrapy


class ZhihuSpider(scrapy.Spider):
    name = "zhihu"
    allowed_domains = ["www.zhihu.com"]

    def start_requests(self):
        return [scrapy.Request('http://www.zhihu.com/#signin')]

    def parse(self, response):
        print response
The error message is:
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1183, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\Python27\lib\site-packages\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\Python27\lib\site-packages\scrapy\core\downloader\middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 587, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Python27\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 45, in process_request_2
    to_native_str(self._useragent), request.url):
  File "C:\Python27\lib\site-packages\scrapy\utils\python.py", line 127, in to_native_str
    return to_bytes(text, encoding, errors)
  File "C:\Python27\lib\site-packages\scrapy\utils\python.py", line 117, in to_bytes
    'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got set
Answer 0 (score: 0)
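Your allowed_domains is invalid. It should look like this:
allowed_domains = ["zhihu.com"]
In scrapy.core.downloader.webclient.py, every URL is parsed. The function to_bytes checks that its argument is a unicode, str, or bytes object, as the error message states; otherwise it raises TypeError.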
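The type check described above can be sketched as follows. This is a minimal illustration written for Python 3, not Scrapy's actual source: `to_bytes_sketch` and `try_convert` are hypothetical names, and the user agent string is made up. The error text is copied from the traceback in the question.

```python
def to_bytes_sketch(text, encoding="utf-8", errors="strict"):
    """Hypothetical stand-in for a to_bytes-style helper (not Scrapy's source)."""
    if isinstance(text, bytes):
        # Already bytes: nothing to convert.
        return text
    if not isinstance(text, str):
        # Anything that is neither text nor bytes is rejected with the
        # same wording as the error in the question.
        raise TypeError(
            "to_bytes must receive a unicode, str or bytes object, got %s"
            % type(text).__name__
        )
    return text.encode(encoding, errors)


def try_convert(value):
    """Return the converted bytes, or the error message if conversion fails."""
    try:
        return to_bytes_sketch(value)
    except TypeError as exc:
        return str(exc)


print(try_convert("Mozilla/5.0"))    # a plain string is encoded to bytes
print(try_convert({"Mozilla/5.0"}))  # a set is rejected with a TypeError
```

Judging by the traceback, the value that reached the check was `self._useragent` in the robots.txt middleware, so the type of the configured user agent may be worth checking as well. That is an inference from the traceback, not something the question confirms.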
Answer 1 (score: -1)
You can run into this kind of error when you use the wrong data type. For example:
value = 15
print(value.encode("ascii"))  # Fails: an integer has no encode method
value = "15"
print(value.encode("ascii"))  # Works: encode is a string method, not an integer method
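The traceback in the question reports the offending value as a set, so it may help to see how similar-looking literals produce different types. The sketch below is only an illustration: the variable names and the user agent string are hypothetical, and the question does not show which setting actually held the set.

```python
# Hypothetical values: three ways the same setting might be written.
as_string = "Mozilla/5.0"
as_set = {"Mozilla/5.0"}     # braces around a single value create a set
as_tuple = ("Mozilla/5.0",)  # a trailing comma creates a tuple


def can_encode(value):
    """Report whether the value offers the string method encode."""
    return hasattr(value, "encode")


# Print each value's type name and whether it can be encoded.
for label, value in [("string", as_string), ("set", as_set), ("tuple", as_tuple)]:
    print(label, type(value).__name__, can_encode(value))
```

Only the plain string offers encode; the set and the tuple would need to be replaced by a string before being passed to code that expects text.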