Running Celery and PhantomJS on Heroku

Time: 2019-07-02 19:29:02

Tags: python selenium heroku phantomjs celery

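I am setting up a web-based scraper on Heroku using Flask, Celery, and PhantomJS (driven through Selenium). The scraper runs on my local machine (with a warning that PhantomJS is deprecated), but on Heroku Celery freezes as soon as I start a scrape.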

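I installed the web driver via the buildpack https://github.com/stomita/heroku-buildpack-phantomjs. I also tried using the Chrome driver with Selenium, with the same result: Celery freezes (judging from the logs, the process makes no further progress).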
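To rule out a missing binary, here is a minimal check that could be run in a one-off dyno. It assumes the buildpacks are supposed to put the driver executables on `PATH`; the helper name `find_binaries` and the binary names `phantomjs` and `chromedriver` are illustrative assumptions, not part of my project:

```python
import shutil


def find_binaries(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Binary names are assumptions; adjust them to whatever the buildpacks install.
    for name, path in find_binaries(["phantomjs", "chromedriver"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a name resolves to `None` on the dyno but to a path locally, the problem is more likely the buildpack setup than Celery itself.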

Some of the code from my server.py:

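import os

from celery import Celery
from flask import Flask, jsonify, render_template, request

from app import do_scrape  # assumed location: app.py holds the scraping code

app = Flask(__name__)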
app.config['CELERY_BROKER_URL'] = os.environ['REDIS_URL']
app.config['CELERY_RESULT_BACKEND'] = os.environ['REDIS_URL']

celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)

@celery.task
def task_scrape(url):
    return do_scrape(url, standalone=False)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/scrape', methods=['POST'])
def scrape():
    url = request.get_json()['url']
    task = task_scrape.delay(url)
    return jsonify({'taskid': task.id})

Some of the code from my app.py (which does the actual scraping):

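import time

from selenium import webdriver

def render_type2_page(url):
    driver = webdriver.PhantomJS()
    try:
        driver.get(url)
        time.sleep(3)  # crude wait for client-side rendering to finish
        return driver.page_source
    finally:
        # Always shut the browser down, even if the page load fails
        driver.quit()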
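Since the symptom is a hang, one thing worth trying is to bound the page load so the task fails fast with a traceback instead of blocking the worker. The following is only a sketch under assumptions: `fetch_page_source` and its parameters are hypothetical names, and `driver` is expected to be a Selenium WebDriver instance such as the `webdriver.PhantomJS()` object above. It uses only the WebDriver methods `set_page_load_timeout`, `get`, and `quit`, plus the `page_source` attribute.

```python
import time


def fetch_page_source(driver, url, load_timeout=30, settle_seconds=3):
    """Load url with a bounded page-load time and always quit the driver.

    `driver` is assumed to be a Selenium WebDriver instance; the default
    timeout values are arbitrary placeholders.
    """
    try:
        # With a page-load timeout set, driver.get() should raise a
        # TimeoutException rather than block indefinitely.
        driver.set_page_load_timeout(load_timeout)
        driver.get(url)
        time.sleep(settle_seconds)  # crude wait for client-side rendering
        return driver.page_source
    finally:
        # Runs on success and on failure, so no browser process is leaked.
        driver.quit()
```

This would be called as `fetch_page_source(webdriver.PhantomJS(), url)`. It does not help if the hang happens earlier, while the driver process itself is starting.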

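The expected result is that the logs show the Celery worker writing the scraped data to memory, with that data becoming available for download as a CSV shortly afterwards. Like this: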

[2019-07-03 01:13:36,036: WARNING/ForkPoolWorker-4] [*] Writing item # 2354187
[2019-07-03 01:13:37,269: WARNING/ForkPoolWorker-4] [*] Writing item # 2410452
[2019-07-03 01:13:38,505: WARNING/ForkPoolWorker-4] [*] Writing item # 2307212
[2019-07-03 01:13:39,844: WARNING/ForkPoolWorker-4] [*] Writing item # 2307709
[2019-07-03 01:13:41,055: WARNING/ForkPoolWorker-4] [*] Writing item # 2330733
[2019-07-03 01:13:42,283: WARNING/ForkPoolWorker-4] [*] Writing item # 2400294
[2019-07-03 01:13:43,501: WARNING/ForkPoolWorker-4] [*] Writing item # 2277081
[2019-07-03 01:13:44,729: WARNING/ForkPoolWorker-4] [*] Writing item # 2306055
[2019-07-03 01:13:45,991: WARNING/ForkPoolWorker-4] [*] Writing item # 2329127
[2019-07-03 01:13:47,312: WARNING/ForkPoolWorker-4] [*] Writing item # 2390199
[2019-07-03 01:13:48,545: WARNING/ForkPoolWorker-4] [*] Writing item # 2400295
[2019-07-03 01:13:49,797: WARNING/ForkPoolWorker-4] [*] Writing item # 2328693

But the actual result is that Selenium times out and the Celery worker does no work at all.

0 answers:

No answers