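In Scrapy, I'm getting the error exceptions.AttributeError: 'unicode' object has no attribute 'dont_filter'.
After searching, I found this answer (which makes sense, since it concerns the only code I modified before the error appeared) and changed my code accordingly. I changed start_requests to yield the values in the list one at a time instead of returning the whole list, but I still get the error. Any ideas?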
def start_requests(self):
    connection = pymongo.Connection(settings['MONGODB_SERVER'],
                                    settings['MONGODB_PORT'])
    db = connection[settings['MONGODB_DB']]
    collection = db[settings['MONGODB_COLLECTION']]
    for el in [i['url'] for i in collection.find({}, {'_id':0, 'url':1})]:
        yield el
I have checked the rest of the code to confirm that everything else is fine.
Traceback:
[-] Unhandled Error
Traceback (most recent call last):
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 93, in start
    self.start_reactor()
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 130, in start_reactor
    reactor.run(installSignalHandlers=False) # blocking call
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
    self.mainLoop()
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/utils/reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 120, in _next_request
    self.crawl(request, spider)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 176, in crawl
    self.schedule(request, spider)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 182, in schedule
    return self.slot.scheduler.enqueue_request(request)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/scheduler.py", line 48, in enqueue_request
    if not request.dont_filter and self.df.request_seen(request):
exceptions.AttributeError: 'unicode' object has no attribute 'dont_filter'
Answer 0 (score: 3)
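start_requests should yield individual Request objects, not bare URLs, but each el in your code is evidently just a URL string. The scheduler then reads request.dont_filter on that string, which is what raises the AttributeError. Try changing
    yield el
to
    yield self.make_requests_from_url(el)
(See the question you linked for this example.)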
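To illustrate why the scheduler fails, here is a minimal sketch that runs without Scrapy or MongoDB installed. FakeRequest and enqueue_request are hypothetical stand-ins written for this example, not Scrapy's own classes; they only imitate the dont_filter check on the last frame of the traceback. The URLs are made up as well.

```python
class FakeRequest(object):
    """Hypothetical stand-in for a Scrapy Request (illustration only)."""

    def __init__(self, url, dont_filter=False):
        self.url = url
        self.dont_filter = dont_filter


def enqueue_request(request, seen):
    """Imitates the scheduler check from the traceback: it reads
    request.dont_filter before doing anything else."""
    if not request.dont_filter and request.url in seen:
        return False  # duplicate request, dropped
    seen.add(request.url)
    return True


def start_requests_broken(urls):
    for url in urls:
        yield url  # bare string: it has no dont_filter attribute


def start_requests_fixed(urls):
    for url in urls:
        # In a real spider this would be self.make_requests_from_url(url)
        yield FakeRequest(url)


# Made-up URLs standing in for the values read from MongoDB
urls = [u"http://example.com/a", u"http://example.com/b"]

error_message = None
try:
    for item in start_requests_broken(urls):
        enqueue_request(item, set())
except AttributeError as exc:
    error_message = str(exc)  # same kind of error as in the question

seen = set()
accepted = [enqueue_request(r, seen) for r in start_requests_fixed(urls)]
```

The broken generator fails on the first item because a string has no dont_filter attribute, while the fixed one passes the check for every URL. In the actual spider, the only change needed is the one-line yield shown above.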