I have a Python process that does data processing on millions of records. Currently I plan to run it in the App Engine Flex environment.
The code is as follows:
from datetime import datetime, timedelta
import logging
from flask import Flask, request
import google.cloud.logging
import pandas as pd
import numpy as np
from pandas.io import gbq
from google.cloud import bigquery
from google.cloud.bigquery import schema
from google.api_core import page_iterator
#import google.cloud.bigquery.table

logging.basicConfig(level=logging.DEBUG)
app = Flask(__name__)

from mbarticle import *


@app.route('/pythonappmba')
def hello():
    """Run the basket calculation and return its status."""
    # Calling mb_cal() to do the table massaging
    logging.debug('Program Started...Calling mb_cal() to basket details')
    status = mb_cal()
    logging.debug('Program Ended...')
    logging.debug(status)
    return status


@app.errorhandler(500)
def server_error(e):
    logging.exception('An error occurred during a request.')
    return """
    An internal error occurred: <pre>{}</pre>
    See logs for full stacktrace.
    """.format(e), 500


if __name__ == '__main__':
    # This is used when running locally. Gunicorn is used to run the
    # application on Google App Engine. See entrypoint in app.yaml.
    app.run(host='127.0.0.1', port=8080, debug=True)
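When running locally, the long-running route can be exercised with a plain HTTP GET; for example (host, port, and path taken from the code above):

curl http://127.0.0.1:8080/pythonappmba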
mb_cal() is the function that processes the 10 million records. It takes at least 15 minutes to do the data massaging, and then loads the data back into BigQuery with the following call (df is the final DataFrame):
df.to_gbq('Dataset.Tablename',
          'Project',
          chunksize=1000000,
          if_exists='replace')
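In case it is relevant, the same load can also be expressed with the google-cloud-bigquery client that is already imported above; a minimal sketch, reusing the 'Project' and 'Dataset.Tablename' placeholders from the snippet (DataFrame uploads require pyarrow to be installed):

from google.cloud import bigquery

client = bigquery.Client(project='Project')  # same placeholder project id as above
job_config = bigquery.LoadJobConfig(
    # WRITE_TRUNCATE mirrors if_exists='replace' in to_gbq
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
job = client.load_table_from_dataframe(df, 'Dataset.Tablename', job_config=job_config)
job.result()  # blocks until the load job completes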
When I run it against the full set of records, I get the following error:
[error] 38#38: *38669 upstream prematurely closed connection while reading response header from upstream
However, when I restrict the records with certain conditions (so the run finishes in under a minute), the code works. But I need to process the full data set, which takes at least 15 minutes. Is there a timeout limit for Python in App Engine Flex?
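For reference, the app.yaml referenced in the code comments follows the quickstart pattern. A minimal sketch of what a longer worker timeout might look like, where main:app and the 1800-second value are assumptions (gunicorn's sync workers are killed after 30 seconds of silence by default, well under the 15 minutes this request needs):

runtime: python
env: flex
# --timeout 1800 gives each worker 30 minutes before gunicorn kills it (assumed value)
entrypoint: gunicorn -b :$PORT --timeout 1800 main:app

runtime_config:
  python_version: 3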