Heroku H12 error when downloading a file from a Dash app

Posted: 2019-12-02 14:48:48

Tags: heroku timeout celery plotly-dash

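I have a Dash app that collects weather data one month at a time (the most the third party allows per request) and then combines the data so the user can download it. Everything works when I test the app with Heroku Local, but when I deploy it to Heroku I get an H12 error once the download process runs past 30 seconds. I am using Celery and Redis for background tasks and workers. My understanding is that with a background worker I can exceed the 30-second timeout.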
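A minimal sketch of the submit-then-poll idea behind that expectation: Heroku's router reports H12 when a web request does not respond within 30 seconds, so the request that starts the work has to return right away and the client has to ask for the result later. The example uses a standard-library thread pool as a stand-in for Celery so that it runs on its own; `submit_job`, `job_status`, `job_result`, and `slow_download` are hypothetical names, not part of the app in the question.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# Stand-in for a task queue; a real deployment would use Celery workers.
_executor = ThreadPoolExecutor(max_workers=2)
_jobs: Dict[str, Future] = {}


def submit_job(func: Callable[..., str], *args) -> str:
    """Start func in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(func, *args)
    return job_id


def job_status(job_id: str) -> str:
    """Report 'pending' or 'done' without blocking the caller."""
    return "done" if _jobs[job_id].done() else "pending"


def job_result(job_id: str) -> str:
    """Return the finished job's value; call only once status is 'done'."""
    return _jobs[job_id].result()


def slow_download(seconds: float) -> str:
    """Hypothetical slow job standing in for the multi-month download."""
    time.sleep(seconds)
    return "station,year,month\n348,2000,1\n"


job_id = submit_job(slow_download, 0.2)  # returns right away
while job_status(job_id) != "done":      # a browser would poll over HTTP instead
    time.sleep(0.05)
csv_text = job_result(job_id)
```

In a Dash app the polling step would typically be a separate short request (for example a callback fired on a timer), so that no single web request waits for the whole download.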

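A few notes:

  • Downloading the data from the third party takes more than 30 seconds. This cannot be shortened, but it can be done in chunks.
  • Currently I save the data to a tmp folder and the user downloads it from disk. I know Heroku's filesystem is ephemeral, but I'm not sure whether S3 or another storage system makes sense. I don't need to keep the file, only to hold it long enough to send it to the user from disk.
  • The Celery task is triggered inside a Dash callback, so there is some connection to the web worker while the task runs.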
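The first note says the download can be chunked. A minimal sketch of one way to split the requested period into smaller batches of (year, month) pairs, so that each batch could be handed to its own short task. `month_range` and `month_chunks` are hypothetical helpers, and the range here includes both endpoints, which is an assumption rather than the behaviour of the `pd.date_range` call in tasks.py.

```python
from typing import Iterator, List, Tuple

YearMonth = Tuple[int, int]


def month_range(start_year: int, start_month: int,
                end_year: int, end_month: int) -> Iterator[YearMonth]:
    """Yield (year, month) pairs from start to end, both inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def month_chunks(months: List[YearMonth], size: int) -> List[List[YearMonth]]:
    """Split a list of months into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [months[i:i + size] for i in range(0, len(months), size)]


# Same fixed test period as in the question: January 2000 to January 2010.
months = list(month_range(2000, 1, 2010, 1))
batches = month_chunks(months, 12)  # one batch per year of data
```

Each batch is small enough to download quickly, and the partial results could be concatenated once all batches have finished.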

Below is the tasks.py code and the part of app.py that calls it. I'm new to Heroku and Celery, so any tips would be much appreciated.

tasks.py

import celery
import pandas as pd
import os

celery_app = celery.Celery('download')
celery_app.conf.update(BROKER_URL=os.environ['REDIS_URL'], CELERY_RESULT_BACKEND=os.environ['REDIS_URL'])

@celery_app.task
def download_remote_data(station_id, start_year, start_month, end_year, end_month, url_raw, relative_filename):

    # In this test case the download dates are defined in app.py and not dynamically by the user
    download_dates = pd.date_range(start=start_year + '/' + start_month,
                                   end=end_year + '/' + end_month, freq='M')

    # bulk data url paths
    urls = [url_raw.format(station_id, date.year, date.month, 1) for date in download_dates]

    # Download each month's CSV and concatenate them into one DataFrame
    results = pd.concat((pd.read_csv(url) for url in urls))

    # Store file to temporary folder as csv
    absolute_filename = os.path.join(os.getcwd(), relative_filename)
    results.to_csv(absolute_filename, index=False)

    return results.to_dict()

app.py

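# Imports this excerpt relies on (the rest of app.py, including the Dash
# app object `app`, is not shown in the question)
import os

import dash
import flask
from dash.dependencies import Input, Output

import tasks

# Environment Canada Bulk Data Download Path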
bulk_data_pathname = 'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?' \
                     'format=csv&stationID={}&Year={}&Month={}&Day=1&timeframe={}'
# Callback to download data
@app.callback(
    Output(component_id='download-link', component_property='href'),
    [Input(component_id='generate-btn', component_property='n_clicks')]
)
def update_output_div(clicked):
    ctx = dash.callback_context  # Look for specific click event

    if clicked and ctx.triggered[0]['prop_id'] == 'generate-btn.n_clicks':
        # store downloaded csv file in tmp
        relative_filename = os.path.join('tmp', 'downloaded.csv')
        # Use fixed dates for testing, reality is user will set dates
        data = tasks.download_remote_data.apply_async(['348', '2000', '1', '2010', '1', bulk_data_pathname,
                                                       relative_filename])
        link_path = '/{}'.format(relative_filename)
    else:
        link_path = ''
    return link_path

# Flask route that serves files from the tmp folder
@app.server.route('/tmp/<path:path>')
def serve_static(path):
    root_dir = os.getcwd()
    return flask.send_from_directory(
        os.path.join(root_dir, 'tmp'), path
    )

if __name__ == '__main__':
    app.run_server(debug=True, processes=4)
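To make the URL construction concrete, here is a small sketch of how the template in app.py expands into one request per month. The template is copied from the question; `build_urls` is a hypothetical helper that mirrors the list comprehension in tasks.py, with the last placeholder (`timeframe`) fixed at 1 as in that code.

```python
# URL template copied from app.py; the four {} slots are
# stationID, Year, Month and timeframe, in that order.
bulk_data_pathname = ('https://climate.weather.gc.ca/climate_data/bulk_data_e.html?'
                      'format=csv&stationID={}&Year={}&Month={}&Day=1&timeframe={}')


def build_urls(station_id, months, timeframe=1):
    """Hypothetical helper: build one URL per (year, month) pair."""
    return [bulk_data_pathname.format(station_id, year, month, timeframe)
            for year, month in months]


urls = build_urls('348', [(2000, 1), (2000, 2)])
print(urls[0])
```

Because every month is a separate HTTP request, the total download time grows with the length of the requested period, which is why a ten-year range runs well past 30 seconds.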

Procfile

web: gunicorn app:server --timeout 90 -w 4 -k gevent --log-file=-
worker: celery -A tasks worker --loglevel=info

0 answers:

No answers yet.