I am running a batch of 500 scrape jobs on scrapyd, triggered from a shell script. I see this problem both locally on my Mac and on an EC2 instance. The scrape jobs work fine in batches of 100, but when I run 500, scrapyd starts throwing "sqlite3.OperationalError: unable to open database file" exceptions after roughly 300 of them.
Note: each scrape (one spider) is its own project deployed on scrapyd, so 500 projects end up deployed.
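For reference, the shell script essentially loops over the ~500 projects and hits scrapyd's standard addversion.json and schedule.json endpoints. A minimal Python sketch of that loop (the scrapyd URL, the egg paths, and the spider-named-after-the-project convention are illustrative, not exact):

    import glob
    import os
    import requests

    SCRAPYD = "http://localhost:6800"

    def deploy_and_schedule(egg_path):
        # One project per scrape: assume project and spider share the egg's base name.
        project = os.path.splitext(os.path.basename(egg_path))[0]
        # Upload the project egg to scrapyd.
        with open(egg_path, "rb") as egg:
            requests.post(SCRAPYD + "/addversion.json",
                          data={"project": project, "version": "1_0"},
                          files={"egg": egg}).raise_for_status()
        # Kick off the spider for that project.
        resp = requests.post(SCRAPYD + "/schedule.json",
                             data={"project": project, "spider": project})
        resp.raise_for_status()
        return resp.json().get("jobid")

    for egg_file in glob.glob("eggs/*.egg"):  # ~500 eggs in the full batch
        deploy_and_schedule(egg_file)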
After roughly 300 scrapes have completed, I start seeing this exception and can no longer deploy any projects. Restarting the scrapyd server does not help: it fails to come back up, throwing the same exception.
The only way I can get it going and crawling again is to delete the .db files in the dbs folder and then start the server.
Any idea why this happens? Here is the exception:
2017-04-13T23:28:57+0000 [stdout#info] 1
2017-04-13T23:28:57+0000 [stdout#info] Traceback (most recent call last):
2017-04-13T23:28:57+0000 [stdout#info] File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
2017-04-13T23:28:57+0000 [stdout#info] File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 39, in <module>
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 34, in main
2017-04-13T23:28:57+0000 [stdout#info] File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 13, in project_environment
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/__init__.py", line 14, in get_application
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/app.py", line 37, in application
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/website.py", line 35, in __init__
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/website.py", line 38, in update_projects
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/poller.py", line 30, in update_projects
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 61, in get_spider_queues
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/spiderqueue.py", line 12, in __init__
2017-04-13T23:28:57+0000 [stdout#info] File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/sqlite.py", line 98, in __init__
2017-04-13T23:28:57+0000 [stdout#info] sqlite3.OperationalError: unable to open database file
2017-04-13T23:28:57+0000 [_GenericHTTPChannelProtocol,673,10.0.3.119] Unhandled Error
Traceback (most recent call last):
File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/http.py", line 1845, in allContentReceived
req.requestReceived(command, path, version)
File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/http.py", line 766, in requestReceived
self.process()
File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/server.py", line 190, in process
self.render(resrc)
File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/server.py", line 241, in render
body = resrc.render(self)
--- <exception caught here> ---
File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 17, in render
return JsonResource.render(self, txrequest)
File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 19, in render
r = resource.Resource.render(self, txrequest)
File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 68, in render_POST
spiders = get_spider_list(project)
File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 116, in get_spider_list
raise RuntimeError(msg.splitlines()[-1])
exceptions.RuntimeError: sqlite3.OperationalError: unable to open database file
My guess was that scrapyd runs out of disk space after ~300 projects and that this is why the popen fails, but the box appears to have plenty of space left. Any pointers would help.
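In case it is a file-descriptor limit rather than disk space, this is the kind of check I have in mind (a sketch for the EC2 box, which has /proc; PID 6363 is scrapyd's PID from the lsof output below):

    import os

    PID = 6363  # scrapyd's PID, taken from the lsof output below
    open_fds = len(os.listdir("/proc/%d/fd" % PID))
    print("scrapyd holds %d open file descriptors" % open_fds)
    # Compare against the limit the scrapyd process was started with:
    with open("/proc/%d/limits" % PID) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                print(line.strip())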
I am running scrapyd 1.3.3 with the default configuration and Python 2.7 on an EC2 instance.
Running lsof on the dbs folder shows two entries for every .db file. Is that expected?
scrapyd 6363 ec2-user 1005u REG 202,1 2048 148444 /home/ec2-user/scrapyENV/bin/dbs/LatamPtBlogGenesysCom.db
scrapyd 6363 ec2-user 1006u REG 202,1 2048 148444 /home/ec2-user/scrapyENV/bin/dbs/LatamPtBlogGenesysCom.db
scrapyd 6363 ec2-user 1007u REG 202,1 2048 148503 /home/ec2-user/scrapyENV/bin/dbs/WwwPeeblesshirenewsCom.db
scrapyd 6363 ec2-user 1009u REG 202,1 2048 148503 /home/ec2-user/scrapyENV/bin/dbs/WwwPeeblesshirenewsCom.db
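To tally how many descriptors point at each .db file across all 500 of them, a small sketch reading the /proc fd symlinks (same assumed PID as above):

    import os
    from collections import Counter

    PID = 6363  # same scrapyd PID as above
    fd_dir = "/proc/%d/fd" % PID
    per_file = Counter()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir and readlink
        if target.endswith(".db"):
            per_file[target] += 1

    # Two descriptors per database, times 500 projects, could approach
    # a default 1024 open-files limit on its own.
    for path, count in per_file.most_common(5):
        print("%d fds -> %s" % (count, path))

If every .db really is held open twice, 500 projects alone would account for ~1000 descriptors, which could push scrapyd past a default 1024 open-files limit once its logs, eggs, and sockets are counted in.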