scrapyd deploy job fails: unable to open database file

Date: 2017-04-14 04:52:18

Tags: python scrapyd

I am running a batch of 500 crawl jobs on scrapyd, triggered from a shell script. I see this problem both locally on my Mac and on an EC2 instance. These crawl jobs have worked fine in batches of 100, but when I run 500, scrapyd throws a "sqlite3.OperationalError: unable to open database file" exception after roughly 300 of them.

Note: each crawl (one spider) is its own project and gets deployed to scrapyd, which means 500 projects end up deployed.
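As a rough illustration of that setup (not the asker's actual script), such a batch might be driven through scrapyd's `addversion.json` and `schedule.json` HTTP endpoints; the URL, egg directory, and spider name below are placeholders:

```shell
# Hedged sketch: deploy one project per site, then schedule its spider,
# via scrapyd's HTTP API. Paths and names are assumptions.
SCRAPYD_URL="${SCRAPYD_URL:-http://localhost:6800}"
for egg in eggs_to_deploy/*.egg; do
  [ -e "$egg" ] || continue                  # glob matched nothing
  project=$(basename "$egg" .egg)            # e.g. WwwPeeblesshirenewsCom
  # Upload the project egg as a new version
  curl -s "$SCRAPYD_URL/addversion.json" \
       -F "project=$project" -F "version=r1" -F "egg=@$egg"
  # Schedule a crawl for that project
  curl -s "$SCRAPYD_URL/schedule.json" \
       -d "project=$project" -d "spider=default"
done
```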

After roughly 300 crawls complete, I start seeing this exception and can no longer deploy any projects. If I restart the scrapyd server, it will not come back up and throws the same exception.

The only way I can start over and crawl again is to:

  1. Stop the server
  2. rm -rf the dbs folder
  3. rm -rf the eggs folder (possibly not required)
  4. rm -rf the logs folder (possibly not required)
  5. Start the server
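A minimal sketch of that reset sequence as a shell script, assuming scrapyd was started from `~/scrapyENV/bin` (where the dbs/eggs/logs directories in this question live) and runs as a process named `scrapyd`:

```shell
#!/bin/sh
# Reset scrapyd's on-disk state. The path and process name are assumptions
# based on this setup; adjust for your install.
SCRAPYD_HOME="${SCRAPYD_HOME:-$HOME/scrapyENV/bin}"

# 1. Stop the server (matches a process literally named "scrapyd")
pkill -x scrapyd || true

# 2-4. Remove the per-project state directories
rm -rf "$SCRAPYD_HOME/dbs"    # SQLite spider queues (required)
rm -rf "$SCRAPYD_HOME/eggs"   # deployed project eggs (may not be required)
rm -rf "$SCRAPYD_HOME/logs"   # crawl logs (may not be required)

# 5. Start the server again in the background
cd "$SCRAPYD_HOME" && nohup scrapyd >/dev/null 2>&1 &
```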

Any idea why this is happening? Here is the exception:

    2017-04-13T23:28:57+0000 [stdout#info] 1
    2017-04-13T23:28:57+0000 [stdout#info] Traceback (most recent call last):
    2017-04-13T23:28:57+0000 [stdout#info]   File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    2017-04-13T23:28:57+0000 [stdout#info]   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 39, in <module>
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 34, in main
    2017-04-13T23:28:57+0000 [stdout#info]   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 13, in project_environment
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/__init__.py", line 14, in get_application
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/app.py", line 37, in application
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/website.py", line 35, in __init__
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/website.py", line 38, in update_projects
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/poller.py", line 30, in update_projects
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 61, in get_spider_queues
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/spiderqueue.py", line 12, in __init__
    2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/sqlite.py", line 98, in __init__
    2017-04-13T23:28:57+0000 [stdout#info] sqlite3.OperationalError: unable to open database file
    2017-04-13T23:28:57+0000 [_GenericHTTPChannelProtocol,673,10.0.3.119] Unhandled Error
            Traceback (most recent call last):
              File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/http.py", line 1845, in allContentReceived
                req.requestReceived(command, path, version)
              File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/http.py", line 766, in requestReceived
                self.process()
              File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/server.py", line 190, in process
                self.render(resrc)
              File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/server.py", line 241, in render
                body = resrc.render(self)
            --- <exception caught here> ---
              File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 17, in render
                return JsonResource.render(self, txrequest)
              File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 19, in render
                r = resource.Resource.render(self, txrequest)
              File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/resource.py", line 250, in render
                return m(request)
              File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 68, in render_POST
                spiders = get_spider_list(project)
              File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 116, in get_spider_list
                raise RuntimeError(msg.splitlines()[-1])
            exceptions.RuntimeError: sqlite3.OperationalError: unable to open database file
    

My guess was that scrapyd runs out of space after ~300 projects, and that is why popen fails, but the box appears to have free space. Any pointers would help.

I am running scrapyd 1.3.3 with the default configuration and Python 2.7 on an EC2 instance.

Running lsof on the dbs folder shows two entries for each .db file. Is that expected?

    
    scrapyd    6363 ec2-user 1005u      REG              202,1      2048   148444 /home/ec2-user/scrapyENV/bin/dbs/LatamPtBlogGenesysCom.db
    scrapyd    6363 ec2-user 1006u      REG              202,1      2048   148444 /home/ec2-user/scrapyENV/bin/dbs/LatamPtBlogGenesysCom.db
    scrapyd    6363 ec2-user 1007u      REG              202,1      2048   148503 /home/ec2-user/scrapyENV/bin/dbs/WwwPeeblesshirenewsCom.db
    scrapyd    6363 ec2-user 1009u      REG              202,1      2048   148503 /home/ec2-user/scrapyENV/bin/dbs/WwwPeeblesshirenewsCom.db
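For what it's worth, SQLite reports "unable to open database file" not only for missing files or bad permissions but also when the process has run out of file descriptors, and scrapyd keeps each project's queue .db open. With two descriptors per .db as in the lsof output above, 500 projects could exceed a typical default soft limit of 1024. A quick Linux-only check comparing a process's open descriptors against its limit (the PID falls back to the current shell just so the snippet runs anywhere; substitute the scrapyd PID, e.g. 6363 above):

```shell
# Compare a process's open file descriptors to its soft limit (Linux /proc).
PID="${PID:-$$}"
OPEN=$(ls "/proc/$PID/fd" | wc -l)
LIMIT=$(awk '/Max open files/ {print $4}' "/proc/$PID/limits")
echo "pid=$PID open_fds=$OPEN soft_limit=$LIMIT"
# If OPEN is near LIMIT, raise the limit in the shell that starts scrapyd,
# e.g.:  ulimit -n 4096 && scrapyd
```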
    
    

0 Answers:

No answers yet.