Using cron

Time: 2016-03-10 08:27:54

Tags: python python-2.7 cron scrapy

I run a Scrapy spider from cron, but it throws an ImportError:

Traceback (most recent call last):
  File "/Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py", line 2, in <module>
    import scrapy
  File "/Library/Python/2.7/site-packages/scrapy/__init__.py", line 48, in <module>
    from scrapy.spiders import Spider
  File "/Library/Python/2.7/site-packages/scrapy/spiders/__init__.py", line 10, in <module>
    from scrapy.http import Request
  File "/Library/Python/2.7/site-packages/scrapy/http/__init__.py", line 12, in <module>
    from scrapy.http.request.rpc import XmlRpcRequest
  File "/Library/Python/2.7/site-packages/scrapy/http/request/rpc.py", line 7, in <module>
    from six.moves import xmlrpc_client as xmlrpclib
ImportError: cannot import name xmlrpc_client

Strangely, when I run the same script that cron runs by hand, it works fine.

The cron entry is

*   *   *   *   *   sh /Users/som/sh/hm_scraping.sh

and the script is

#!/bin/bash
python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

I am using the CrawlerProcess class, as described here: http://doc.scrapy.org/en/latest/topics/practices.html

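from scrapy.crawler import CrawlerProcess

# HmSpider is the spider class defined earlier in hm_spiders.py
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(HmSpider)
process.start()  # blocks until the crawl has finished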

================================================
Edit

Following the comments from MuhammadTahir and lapinkoira, I tested the following directly in the terminal:

/usr/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

sudo -u som /usr/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

The first one runs fine, but when I use sudo (I also ran it without specifying a user) it fails with the same error. Maybe cron uses sudo in the background.
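One way to narrow this down is to compare what the interpreter sees in each case. The following is only a minimal sketch: the helper name `describe_environment` is hypothetical, and it assumes the difference lies in the interpreter, the environment variables, or the copy of `six` being imported, which the traceback suggests but does not prove.

```python
import os
import sys


def describe_environment():
    """Collect details that often differ between cron, sudo and a login shell."""
    info = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "PATH": os.environ.get("PATH", ""),
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
        "sys_path": list(sys.path),
    }
    try:
        import six
    except ImportError:
        # six is not importable at all from this interpreter
        info["six_file"] = None
        info["six_version"] = None
    else:
        # Where six was loaded from, and which release it reports
        info["six_file"] = getattr(six, "__file__", None)
        info["six_version"] = getattr(six, "__version__", None)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{0}: {1}".format(key, value))
```

Running this from the terminal, through `sudo -u som`, and from a temporary cron entry should show whether `six_file`, `six_version` or `PYTHONPATH` differ. If the failing runs load `six` from a different location, an older copy without `xmlrpc_client` in `six.moves` would be a plausible cause.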

Any ideas?

Thanks!

1 Answer:

Answer 0: (score: 1)

I would try one of these:

1- Activate the env first:

source /path/of/your/venv/bin/activate && /path/of/your/venv/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

2- Or without activating the env (might not work):

/path/of/your/venv/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py
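With either option, it may be worth confirming that the cron job really ends up on the venv interpreter. A minimal sketch, assuming the environment was created with virtualenv or the standard venv module; `in_virtualenv` is a hypothetical helper name, not part of Scrapy or of the original script:

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtualenv or venv."""
    # Classic virtualenv sets sys.real_prefix; the stdlib venv module
    # (Python 3.3+) instead makes sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter: {0}".format(sys.executable))
    print("in virtualenv: {0}".format(in_virtualenv()))
```

Calling `in_virtualenv()` at the top of hm_spiders.py and logging the result would show, in the cron output, whether the job is still falling back to the system Python.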