Scrapy run from Celery with eventlet enabled fails with "DNS lookup failed"

Asked: 2016-11-24 08:50:39

Tags: flask scrapy celery flask-socketio

I am using Scrapy as a background task in a Flask app, driven by Celery. I start Celery the usual way: celery -A scrapy_flask.celery worker -l info

It works fine...

However, I want the spider to push data to the web page over WebSocket, so I changed my code in the following three places (see the emit sketch after this list):

  • socketio = SocketIO(app) -> socketio = SocketIO(app, message_queue=SOCKETIO_REDIS_URL)

  • added import eventlet followed by eventlet.monkey_patch() at the top of the module

  • started Celery with the eventlet pool: celery -A scrapy_flask.celery -P eventlet worker -l info
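
For context, the message_queue is what lets a process other than the Flask server (here, the Celery worker running the spider) push Socket.IO events to connected clients through Redis. A minimal sketch of that emit pattern, in which the event name spider_update and the payload shape are placeholders of mine:

    # Sketch: emitting to the browser from inside the Celery task / spider.
    # A SocketIO handle built with only message_queue does not serve clients;
    # it just publishes events to Redis for the real server to forward.
    from flask_socketio import SocketIO

    external_sio = SocketIO(message_queue='redis://127.0.0.1/0')

    def push_item(item):
        # 'spider_update' is a hypothetical event name for this example
        external_sio.emit('spider_update', {'data': dict(item)})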

After that, the spider fails with the error: Error downloading <GET http://www.XXXXXXX.com/>: DNS lookup failed: address 'www.XXXXXXX.com' not found: timeout error.

Here is my demo code:

    # coding=utf-8
    import eventlet
    eventlet.monkey_patch()
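    # NOTE: monkey_patch() must run before anything else is imported; it
    # swaps the stdlib socket/thread/select modules for eventlet's green
    # versions, which also affects Twisted inside Scrapy (see below).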

    from flask import Flask, render_template
    from flask_socketio import SocketIO
    from celery import Celery

    app = Flask(__name__, template_folder='./')

    # Celery configuration
    app.config['CELERY_BROKER_URL'] = 'redis://127.0.0.1/0'
    app.config['CELERY_RESULT_BACKEND'] = 'redis://127.0.0.1/0'

    celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)

    SOCKETIO_REDIS_URL = 'redis://127.0.0.1/0'
    socketio = SocketIO(app, message_queue=SOCKETIO_REDIS_URL)

    from scrapy.crawler import CrawlerProcess
    from TestSpider.start_test_spider import settings
    from TestSpider.TestSpider.spiders.UpdateTestSpider import UpdateTestSpider

    @celery.task
    def background_task():
        process = CrawlerProcess(settings)
        process.crawl(UpdateTestSpider)
        process.start() # the script will block here until the crawling is finished
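        # NOTE: CrawlerProcess starts a Twisted reactor, and a reactor cannot
        # be restarted; a second run of this task in the same worker process
        # will raise ReactorNotRestartable.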

    @app.route('/')
    def index():
        return render_template('index.html')

    @app.route('/task')
    def start_background_task():
        background_task.delay()
        return 'Started'

    if __name__ == '__main__':
        socketio.run(app, host='0.0.0.0', port=9000, debug=True)

And here is the log output:

    [2016-11-25 09:33:39,319: ERROR/MainProcess] Error downloading <GET http://www.XXXXX.com>: DNS lookup failed: address 'www.XXXXX.com' not found: timeout error.
    [2016-11-25 09:33:39,320: WARNING/MainProcess] 2016-11-25 09:33:39 [scrapy] ERROR: Error downloading <GET http://www.XXXXX.com>: DNS lookup failed: address 'www.XXXXX.com' not found: timeout error.
    [2016-11-25 09:33:39,420: INFO/MainProcess] Closing spider (finished)
    [2016-11-25 09:33:39,421: WARNING/MainProcess] 2016-11-25 09:33:39 [scrapy] INFO: Closing spider (finished)
    [2016-11-25 09:33:39,422: INFO/MainProcess] Dumping Scrapy stats:
    {'downloader/exception_count': 3,
     'downloader/exception_type_count/twisted.internet.error.DNSLookupError': 3,
     'downloader/request_bytes': 639,
     'downloader/request_count': 3,
     'downloader/request_method_count/GET': 3,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2016, 11, 25, 1, 33, 39, 421501),
     'log_count/DEBUG': 4,
     'log_count/ERROR': 1,
     'log_count/INFO': 10,
     'log_count/WARNING': 15,
     'scheduler/dequeued': 3,
     'scheduler/dequeued/memory': 3,
     'scheduler/enqueued': 3,
     'scheduler/enqueued/memory': 3,
     'start_time': datetime.datetime(2016, 11, 25, 1, 30, 39, 15207)}
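
The timeouts only appear once the worker runs with -P eventlet, which points at the monkey patching rather than at Scrapy itself: Scrapy resolves hostnames on Twisted's thread-pool resolver, and a full eventlet.monkey_patch() also greens the thread module, so those lookups end up on green threads that the blocking Twisted reactor never lets run, which would explain every lookup timing out. A commonly suggested workaround, sketched here under the assumption that the rest of the stack tolerates real threads, is to patch everything except thread:

    # Sketch: selective monkey patching -- keep real OS threads so Twisted's
    # threaded DNS resolver still works; eventlet patches all other modules
    # (socket, select, time, ...) when only False values are passed.
    import eventlet
    eventlet.monkey_patch(thread=False)

An alternative that avoids mixing eventlet and Twisted in one process at all is to launch the crawl from the task in a child process (for example via multiprocessing), so the reactor never sees the patched modules.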

0 Answers:

No answers yet.