I am running a Scrapy spider from cron, but it raises an ImportError:
Traceback (most recent call last):
  File "/Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py", line 2, in <module>
    import scrapy
  File "/Library/Python/2.7/site-packages/scrapy/__init__.py", line 48, in <module>
    from scrapy.spiders import Spider
  File "/Library/Python/2.7/site-packages/scrapy/spiders/__init__.py", line 10, in <module>
    from scrapy.http import Request
  File "/Library/Python/2.7/site-packages/scrapy/http/__init__.py", line 12, in <module>
    from scrapy.http.request.rpc import XmlRpcRequest
  File "/Library/Python/2.7/site-packages/scrapy/http/request/rpc.py", line 7, in <module>
    from six.moves import xmlrpc_client as xmlrpclib
ImportError: cannot import name xmlrpc_client
Strangely, when I run the same script that cron runs by hand, it works fine.
The cron entry is:
* * * * * sh /Users/som/sh/hm_scraping.sh
and the script is:
#!/bin/bash
python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py
I am using the CrawlerProcess class, as described here: http://doc.scrapy.org/en/latest/topics/practices.html
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(HmSpider)
process.start()
================================================
Edit
Following the comments from MuhammadTahir and lapinkoira, I tested the following directly in the terminal:
/usr/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py
and
sudo -u som /usr/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py
The first one runs fine, but when I use sudo (I ran it without specifying the user), it fails with the same error. Maybe cron uses sudo behind the scenes.
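Since the same file succeeds in one invocation and fails in the other, the two runs are probably not seeing the same interpreter, search path, or copy of six. This is a guess, not something confirmed above. A minimal diagnostic sketch follows; describe_environment is a hypothetical helper name, and the idea is to run it once from the terminal and once from cron (or under sudo) and compare the output:

```python
from __future__ import print_function

import os
import sys


def describe_environment():
    """Collect details that commonly differ between cron/sudo and a login shell."""
    info = {
        "executable": sys.executable,              # interpreter actually running
        "python_version": sys.version.split()[0],
        "sys_path": list(sys.path),                # module search order
        "PATH": os.environ.get("PATH"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
    }
    try:
        import six
    except ImportError:
        # six is not importable at all in this environment
        info["six_file"] = None
        info["six_version"] = None
    else:
        # Shows which copy of six wins on the search path
        info["six_file"] = getattr(six, "__file__", None)
        info["six_version"] = getattr(six, "__version__", None)
    return info


if __name__ == "__main__":
    report = describe_environment()
    for key in sorted(report):
        print("{0}: {1}".format(key, report[key]))
```

If six_file or sys_path differs between the two runs, that would point to an environment difference (for example PYTHONPATH being set in the login shell but not under cron or sudo) rather than to a problem in the spider itself.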
Any ideas?
Thanks!
Answer 0: (score: 1)
I would try one of these:
1- Activate the env first:
source /path/of/your/venv/bin/activate && /path/of/your/venv/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py
2- Or without activating the env (might not work):
/path/of/your/venv/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py
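Either option only helps if the spider really ends up running under the virtual environment's interpreter. As a rough check, assuming a virtualenv or venv is in use, something like the following could be logged at the top of hm_spiders.py. in_virtualenv is a hypothetical helper name, not part of Scrapy:

```python
from __future__ import print_function

import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside the environment.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) make sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

If this prints False when run from cron, the job is still using the system Python and its site-packages, whatever the shell script was meant to do.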