I'm using Scrapy as a background task with Flask and Celery.
I start Celery normally:
celery -A scrapy_flask.celery worker -l info
and it works fine...
However, I want Scrapy to push data to the web page over a WebSocket, so I changed my code in the following three places (a sketch of the Scrapy-side emitter follows them):
socketio = SocketIO(app)
-> socketio = SocketIO(app, message_queue=SOCKETIO_REDIS_URL)
import eventlet
eventlet.monkey_patch()
and started Celery with the eventlet pool: celery -A scrapy_flask.celery -P eventlet worker -l info
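The point of the message_queue change is that Flask-SocketIO lets an external process (here, the Celery worker running the spider) emit to connected clients through the shared Redis queue. This is roughly what I intend to do on the Scrapy side; it is only a sketch, and the 'update' event name and the WebSocketPipeline class are illustrative names, not part of the demo code below:
# coding=utf-8
# Sketch of the Scrapy-side emitter; 'update' and WebSocketPipeline
# are illustrative names, not part of the demo script below.
from flask_socketio import SocketIO

# An external process connects only to the message queue, not the app;
# the Flask server picks the events up from Redis and forwards them.
external_sio = SocketIO(message_queue='redis://127.0.0.1/0')

class WebSocketPipeline(object):
    """Hypothetical item pipeline: push each scraped item to the page."""
    def process_item(self, item, spider):
        external_sio.emit('update', dict(item))
        return item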
After these changes, the spider fails with: Error downloading <GET http://www.XXXXXXX.com/>: DNS lookup failed: address 'www.XXXXXXX.com' not found: timeout error.
Here is my demo code:
# coding=utf-8
import eventlet
eventlet.monkey_patch()

from flask import Flask, render_template
from flask_socketio import SocketIO
from celery import Celery

app = Flask(__name__, template_folder='./')

# Celery configuration
app.config['CELERY_BROKER_URL'] = 'redis://127.0.0.1/0'
app.config['CELERY_RESULT_BACKEND'] = 'redis://127.0.0.1/0'

celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)

SOCKETIO_REDIS_URL = 'redis://127.0.0.1/0'
socketio = SocketIO(app, message_queue=SOCKETIO_REDIS_URL)

from scrapy.crawler import CrawlerProcess
from TestSpider.start_test_spider import settings
from TestSpider.TestSpider.spiders.UpdateTestSpider import UpdateTestSpider


@celery.task
def background_task():
    process = CrawlerProcess(settings)
    process.crawl(UpdateTestSpider)
    process.start()  # the script will block here until the crawling is finished


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/task')
def start_background_task():
    background_task.delay()
    return 'Started'


if __name__ == '__main__':
    socketio.run(app, host='0.0.0.0', port=9000, debug=True)
And here is the log output:
[2016-11-25 09:33:39,319: ERROR/MainProcess] Error downloading <GET http://www.XXXXX.com>: DNS lookup failed: address 'www.XXXXX.com' not found: timeout error.
[2016-11-25 09:33:39,320: WARNING/MainProcess] 2016-11-25 09:33:39 [scrapy] ERROR: Error downloading <GET http://www.XXXXX.com>: DNS lookup failed: address 'www.XXXXX.com' not found: timeout error.
[2016-11-25 09:33:39,420: INFO/MainProcess] Closing spider (finished)
[2016-11-25 09:33:39,421: WARNING/MainProcess] 2016-11-25 09:33:39 [scrapy] INFO: Closing spider (finished)
[2016-11-25 09:33:39,422: INFO/MainProcess] Dumping Scrapy stats:
{'downloader/exception_count': 3,
'downloader/exception_type_count/twisted.internet.error.DNSLookupError': 3,
'downloader/request_bytes': 639,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 11, 25, 1, 33, 39, 421501),
'log_count/DEBUG': 4,
'log_count/ERROR': 1,
'log_count/INFO': 10,
'log_count/WARNING': 15,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2016, 11, 25, 1, 30, 39, 15207)}
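My guess is that eventlet.monkey_patch() replaces the blocking socket and thread primitives that Twisted's threaded DNS resolver depends on, so once the worker is patched, every lookup inside the spider times out. One thing I am considering (untested; the keyword arguments are eventlet's documented selective-patching switches) is to patch only the pieces the Redis message queue needs and leave threading alone:
# coding=utf-8
# Untested idea: patch only socket/select for the Redis message queue,
# leaving the thread module intact for Twisted's DNS resolver.
import eventlet
eventlet.monkey_patch(socket=True, select=True)
If that still breaks, the fallback would be to launch the crawl in a separate OS process (multiprocessing, or a plain scrapy crawl subprocess) so the worker's monkey patching never touches Twisted at all. Is either approach the right way to combine Scrapy with an eventlet-based Celery worker?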