cron job throws DeadlineExceededError

Asked: 2017-08-14 15:34:48

Tags: google-app-engine google-cloud-datastore google-app-engine-python

I am currently working on a Google Cloud project in free-trial mode. I have a cron job that fetches data from a data vendor and stores it in the Datastore. I wrote the code to fetch the data a few weeks ago and everything worked fine, but over the last two days I have suddenly started getting the error "DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded". I believed a cron job should only time out after 60 minutes, so does anyone know why I am getting this error?

Cron task


Repository code


RESTClient implementation

def run():
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source

        company_list = cron.rest_client.load(config, "companies", '')

        if not company_list:
            logging.info("Company list is empty")
            return "OK"

        for row in company_list:
            company_repository.save(row, original_data_source, actual_data_source)

        return "OK"
    except Exception as e:
        logging.exception("run() experienced an error: %s" % e)

2 Answers:

Answer 0: (score: 1)

My guess is that this is the same problem as before, but now with more code: DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded

I have modified your code so that it writes to the database after each urlfetch. If there are more pages, it relaunches itself in a deferred task, which should complete well before the 10-minute timeout.
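The write-then-chain pattern can be sketched without App Engine. Below, `FakeClient` and the `saved` list are hypothetical stand-ins for `cron.rest_client` and the datastore, and a plain recursive call stands in for `deferred.defer`:

```python
# Stand-in for cron.rest_client: returns (rows, more) for one page,
# mimicking the (data, more) contract used in the answer's code.
class FakeClient:
    def __init__(self, pages):
        self.pages = pages  # list of pages, each a list of rows

    def load(self, current_page):
        data = self.pages[current_page]
        more = current_page < len(self.pages) - 1
        return data, more

saved = []  # stand-in for the datastore

def run(client, current_page=0):
    data, more = client.load(current_page)
    for row in data:
        saved.append(row)  # persist this page before fetching the next
    if more:
        # On App Engine this would be deferred.defer(run, current_page + 1),
        # giving the continuation a fresh request deadline.
        run(client, current_page + 1)

run(FakeClient([["a", "b"], ["c"], ["d", "e"]]))
print(saved)  # → ['a', 'b', 'c', 'd', 'e']
```

The key point is that each page's rows are saved before the next fetch, so a timeout never loses more than one page of work.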

Note that an uncaught exception in a deferred task causes it to be retried, so be careful with that.
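That retry behaviour can be illustrated with a small simulation. `PermanentTaskFailure` below is a local stand-in for the real `google.appengine.ext.deferred.PermanentTaskFailure`, which is the documented way to tell the task queue not to retry; `run_with_retries` and the sample tasks are hypothetical:

```python
# Simulation of deferred-task retry semantics: any uncaught exception
# re-runs the task; PermanentTaskFailure stops retries immediately.
class PermanentTaskFailure(Exception):
    pass

def run_with_retries(task, max_retries=5):
    """Call task(attempt) until it succeeds or gives up permanently;
    return the number of attempts made."""
    for attempt in range(1, max_retries + 1):
        try:
            task(attempt)
            return attempt
        except PermanentTaskFailure:
            return attempt  # do not retry
        except Exception:
            pass            # transient failure: retry
    return max_retries

def flaky(attempt):
    # Fails twice, then succeeds: the queue would run it three times.
    if attempt < 3:
        raise IOError("transient network error")

def fatal(attempt):
    raise PermanentTaskFailure("bad input; retrying will not help")

print(run_with_retries(flaky))  # → 3
print(run_with_retries(fatal))  # → 1
```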

It is not clear to me how actual_data_source & original_data_source work, but I think you should be able to modify that part yourself.

crontask

import logging

from google.appengine.ext import deferred

def run(current_page=0):
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source

        data, more = cron.rest_client.load(config, "companies", '', current_page)

        for row in data:
            company_repository.save(row, original_data_source, actual_data_source)

        # fetch the rest in a new deferred task, with a fresh deadline
        if more:
            deferred.defer(run, current_page + 1)
    except Exception as e:
        logging.exception("run() experienced an error: %s" % e)

RESTClient implementation

def load(config, resource, filter, current_page):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}

        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")

            url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
        else:
            url = config['xxxx']["endpoints"][resource] % (current_page)

        response = urlfetch.fetch(
            url=url,
            deadline=60,
            method=urlfetch.GET,
            headers=headers,
            follow_redirects=False,
        )

        if response.status_code != 200:
            logging.error("xxxx GET received status code %d!" % response.status_code)
            logging.error("error happened for url: %s with headers %s", url, headers)
            return [], False

        db = json.loads(response.content)

        return db['data'], (db['total_pages'] != current_page)

    except Exception as e:
        logging.exception("Error occurred with xxxx API request: %s" % e)
        return [], False
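The `more` flag returned by `load` is just a comparison against the API's reported page count. A minimal sketch of that termination logic, using the `data` and `total_pages` field names from the code above (`parse_page` is a hypothetical helper):

```python
import json

def parse_page(content, current_page):
    # Return (rows, more) exactly as load() does after a successful fetch.
    db = json.loads(content)
    return db['data'], (db['total_pages'] != current_page)

rows, more = parse_page('{"data": [1, 2], "total_pages": 3}', current_page=1)
print(rows, more)  # → [1, 2] True

rows, more = parse_page('{"data": [3], "total_pages": 3}', current_page=3)
print(rows, more)  # → [3] False
```

This assumes the API counts pages from 1 and reports the last page number in `total_pages`; chaining stops on the page where the two match.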

Answer 1: (score: 0)

I would prefer to post this as a comment, but I need more reputation for that.

  1. What happens when you run the actual data fetch directly, rather than through the cron job?
  2. Have you tried measuring the elapsed time of the job from start to finish?
  3. Has the number of companies being retrieved increased substantially?
  4. You seem to be doing some form of stock quote aggregation - could the provider have started blocking you?
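For point 2, a generic (not App Engine-specific) way to measure the job's wall-clock time is to wrap it and log the elapsed seconds; `timed_run` here is a hypothetical helper:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)

def timed_run(job):
    # Run job() and log how long it took, to compare against the deadline.
    start = time.monotonic()
    try:
        return job()
    finally:
        logging.info("job finished in %.2f s", time.monotonic() - start)

result = timed_run(lambda: sum(range(1000)))
print(result)  # → 499500
```

Logging the duration on every run would show whether the job has been creeping up toward the request deadline.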