Scrapy: Unhandled Error

Date: 2014-01-16 20:26:17

Tags: python scrapy

My scraper runs fine for about an hour, and then I start seeing these errors:

2014-01-16 21:26:06+0100 [-] Unhandled Error
        Traceback (most recent call last):
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/crawler.py", line 93, in start
            self.start_reactor()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/crawler.py", line 130, in start_reactor
            reactor.run(installSignalHandlers=False)  # blocking call
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
            self.mainLoop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
            self.runUntilCurrent()
        --- <exception caught here> ---
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/utils/reactor.py", line 41, in __call__
            return self._func(*self._a, **self._kw)
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/engine.py", line 106, in _next_request
            if not self._next_request_from_scheduler(spider):
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/engine.py", line 132, in _next_request_from_scheduler
            request = slot.scheduler.next_request()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/scheduler.py", line 64, in next_request
            request = self._dqpop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/core/scheduler.py", line 94, in _dqpop
            d = self.dqs.pop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/queuelib/pqueue.py", line 43, in pop
            m = q.pop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/Scrapy-0.20.2-py2.7.egg/scrapy/squeue.py", line 18, in pop
            s = super(SerializableQueue, self).pop()
          File "/home/scraper/.fakeroot/lib/python2.7/site-packages/queuelib/queue.py", line 157, in pop
            self.f.seek(-size-self.SIZE_SIZE, os.SEEK_END)
        exceptions.IOError: [Errno 22] Invalid argument

What could be causing this? My version is 0.20.2. Once I hit this error, Scrapy stops doing anything. Even if I stop it and run it again (reusing the same JOBDIR), it keeps raising these errors. The only way I have found to get rid of them is to delete the job directory and start over.
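For what it's worth, the last frame of the traceback seeks backwards from the end of queuelib's on-disk queue file. One plausible mechanism for the errno 22 (a guess from the traceback, not a confirmed diagnosis) is that the file was truncated, e.g. by a crash or unclean shutdown, so the seek target ends up before the start of the file and the OS rejects it. A minimal sketch reproducing that error:

```python
import os

# Reproduce the IOError from the traceback: queuelib's pop() seeks
# backwards from the end of its on-disk queue file. If the file is
# shorter than the expected record (e.g. truncated by a crash), the
# seek target is negative and the OS rejects it with errno 22 (EINVAL).
with open("queue.bin", "wb") as f:
    f.write(b"\x00\x00")           # far too short for the seek below

caught = None
with open("queue.bin", "rb") as f:
    try:
        f.seek(-100, os.SEEK_END)  # same pattern as queuelib's pop()
    except (IOError, OSError) as e:
        caught = e

print(caught)  # [Errno 22] Invalid argument
```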

1 answer:

Answer 0 (score: 3)

Try this:

  • Make sure you are running the latest Scrapy version (currently: 0.24)
  • Look inside the resumed job folder and back up the file requests.seen
  • After backing it up, delete the Scrapy job folder
  • Start the crawl again with the JOBDIR= option
  • Stop the crawl
  • Replace the newly created requests.seen with the one you backed up earlier
  • Start the crawl again
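The steps above can be sketched as follows. This is an illustration, not the answer's literal commands: the job directory `crawls/myjob`, the spider name `myspider`, and the backup filename are all placeholders, and the `makedirs`/empty-file lines merely simulate the folders that a real crawl would have created.

```python
import os
import shutil

jobdir = "crawls/myjob"  # placeholder for your actual JOBDIR

# Simulate an existing job folder for illustration; in practice it was
# left behind by the interrupted crawl.
os.makedirs(jobdir)
open(os.path.join(jobdir, "requests.seen"), "wb").close()

# 1. Back up the duplicate-filter fingerprint file.
shutil.copy(os.path.join(jobdir, "requests.seen"), "requests.seen.bak")

# 2. Delete the job folder -- its on-disk request queue is the part
#    that triggers the IOError.
shutil.rmtree(jobdir)

# 3. Start a fresh crawl with the same JOBDIR, then stop it cleanly so
#    Scrapy writes a new, valid job folder:
#      scrapy crawl myspider -s JOBDIR=crawls/myjob

# 4. Restore the backed-up fingerprints so already-seen requests are
#    not crawled again. (makedirs here stands in for step 3's output.)
os.makedirs(jobdir)
shutil.copy("requests.seen.bak", os.path.join(jobdir, "requests.seen"))

# 5. Resume the crawl:
#      scrapy crawl myspider -s JOBDIR=crawls/myjob
```

The idea behind keeping requests.seen is that it holds the duplicate filter's request fingerprints, so restoring it lets the fresh job skip pages that were already scraped, while the corrupted on-disk queue is discarded.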