Question

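I have a Google App Engine application that calls BigQuery to fetch data.

The query usually takes 3 to 4.5 seconds, which is fine, but sometimes it takes more than 5 seconds and throws this error:

DeadlineExceededError: The API call urlfetch.Fetch() took too long to respond and was cancelled.

This article describes deadlines and the different kinds of deadline errors.

Is there a way to set the deadline for a BigQuery job to more than 5 seconds? I could not find it in the BigQuery API docs.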

Answer 1

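BigQuery queries are fast, but they often take longer than the default App Engine urlfetch timeout. The BigQuery API is asynchronous, so you need to break the work into steps, each of which is an API call that takes less than 5 seconds.

For this scenario, I would use the App Engine Task Queue:

1. Call the BigQuery API to insert the job. This returns a job ID.
2. Put a task on the App Engine task queue that checks the status of the BigQuery query job with that ID.
3. If the BigQuery job state is not DONE, put a new task on the queue to check it again.
4. If the state is DONE, make a call with urlfetch to retrieve the results.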
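The check-and-requeue step above can be sketched as a single short function. This is only an illustration: `check_job`, `enqueue_check`, and `fetch_results` are hypothetical names, and `jobs_api` is assumed to behave like `service.jobs()` from the Google API client. In a real application `enqueue_check` would add a task to the task queue and `fetch_results` would wrap `jobs().getQueryResults`.

```python
def check_job(jobs_api, project_id, job_id, enqueue_check, fetch_results):
    """Run one short polling step for a BigQuery job.

    jobs_api is assumed to behave like service.jobs() from the Google API
    client. enqueue_check and fetch_results are hypothetical callables
    supplied by the caller.
    """
    job = jobs_api.get(projectId=project_id, jobId=job_id).execute()
    status = job['status']

    if status['state'] != 'DONE':
        # Still PENDING or RUNNING: schedule another check and return quickly.
        enqueue_check(project_id, job_id)
        return None

    if 'errorResult' in status:
        # A job can reach DONE and still have failed.
        raise RuntimeError('BigQuery job %s failed: %s'
                           % (job_id, status['errorResult'].get('message')))

    return fetch_results(project_id, job_id)
```

Because each invocation makes at most one status call and one results call, every step stays well inside the per-request urlfetch deadline.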

Answer 2

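Note that I would go with Michael's suggestion, since it is the most robust. I just wanted to point out that you can increase the urlfetch timeout to as much as 60 seconds, which should be enough for most queries to complete.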

How to set timeout for urlfetch in Google App Engine?

Answer 3

I was unable to get urlfetch.set_default_fetch_deadline() to apply to the BigQuery API, but I was able to increase the timeout when authorizing the BigQuery session, as follows:

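from apiclient.discovery import build
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

# credentials_dict, scopes, service_name, version, project_id and query_body
# are placeholders that you define elsewhere.
credentials = ServiceAccountCredentials.from_json_keyfile_dict(credentials_dict, scopes)

# Create an authorized session and set the URL fetch timeout (in seconds).
http_auth = credentials.authorize(Http(timeout=60))

# Build the service.
service = build(service_name, version, http=http_auth)

# Run the query. jobs().query requires the project ID.
query_response = service.jobs().query(projectId=project_id, body=query_body).execute()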

Or use the asynchronous approach with jobs().insert:

import time

# project_id, query_body and query_params are placeholders defined elsewhere.
query_response = service.jobs().insert(projectId=project_id, body=query_body).execute()

big_query_job_id = query_response['jobReference']['jobId']

# Poll the jobs.get endpoint until the job is complete.
while True:
    job_status_response = service.jobs()\
        .get(projectId=project_id, jobId=big_query_job_id).execute()

    # The state is a string: 'PENDING', 'RUNNING' or 'DONE'.
    if job_status_response['status']['state'] == 'DONE':
        break

    time.sleep(1)

results_response = service.jobs()\
    .getQueryResults(**query_params)\
    .execute()

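We ended up using an approach similar to the one Michael suggested, but even with asynchronous calls, the getQueryResults method (paginated with a small maxResults parameter) timed out on URL fetch and threw the error posted in the question.

So, to increase the URL fetch timeout for BigQuery on App Engine, set the timeout accordingly when authorizing the session.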

Answer 4

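To make HTTP requests in App Engine, you can use urllib, urllib2, httplib, or urlfetch. However, whichever library you choose, App Engine performs the HTTP request using App Engine's URL Fetch service.

googleapiclient uses httplib2. It looks like httplib2.Http passes its timeout on to urlfetch. Since that timeout defaults to None, urlfetch sets the deadline for the request to 5 seconds regardless of what you set with urlfetch.set_default_fetch_deadline.

Under the covers, httplib2 uses the socket library for HTTP requests.

To set the timeout, you can do the following: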

import socket
socket.setdefaulttimeout(30)

You should also be able to do this, but I haven't tested it:

import httplib2
http = httplib2.Http(timeout=30)
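Since socket.setdefaulttimeout changes a process-wide default, it may be preferable to apply it only around the BigQuery calls and restore the previous value afterwards. A minimal sketch, where `default_socket_timeout` is a hypothetical helper name and not part of any library:

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_socket_timeout(seconds):
    """Temporarily change the process-wide default socket timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore whatever was configured before, including None.
        socket.setdefaulttimeout(previous)
```

The service would then be built and the query executed inside a `with default_socket_timeout(30):` block. The default only affects sockets created while it is in effect.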

If you don't have existing code to time your requests, you can wrap your query like this:

import time
start_query = time.time()

<your query code>

end_query = time.time()
print(end_query - start_query)
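If several calls need timing, the same measurement can be wrapped in a small context manager. This is a sketch only: `timed` and its `report` parameter are hypothetical names, and the default of logging through `logging.info` is an assumption.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def timed(label, report=logging.info):
    """Report how long the enclosed block took, even if it raises."""
    start = time.time()
    try:
        yield
    finally:
        report('%s took %.3f s' % (label, time.time() - start))
```

Wrapping the query in `with timed('bigquery query'):` then records the elapsed time for each call, which makes it easier to see how often a query approaches the 5-second deadline.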

Answer 5

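Here is one way to work around BigQuery timeouts in App Engine for Go. Simply set TimeoutMs on your queries to well below 5000. The default timeout for BigQuery queries is 10000 ms, which exceeds the default 5-second deadline for outgoing requests in App Engine.

The catch is that the timeout must be set both in the initial request, bigquery.service.Jobs.Query(…), and in the subsequent b.service.Jobs.GetQueryResults(…) calls used to poll for the query results.

Example: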

query := &gbigquery.QueryRequest{
    DefaultDataset: &gbigquery.DatasetReference{
        DatasetId: "mydatasetid",
        ProjectId: "myprojectid",
    },
    Kind:       "json",
    Query:      "<insert query here>",
    TimeoutMs:  3000, // <- important!
}

queryResponse, err := bigquery.service.Jobs.Query("myprojectid", query).Do()
if err != nil {
    // handle the error
}

// determine if queryResponse is a completed job (queryResponse.JobComplete) and if not start to poll

queryResponseResults, err := bigquery.service.Jobs.
        GetQueryResults("myprojectid", queryResponse.JobReference.JobId).
        TimeoutMs(DefaultTimeoutMS). // <- important! DefaultTimeoutMS is a constant you define
        Do()

// determine if queryResponseResults is a completed job and if not continue to poll

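The nice thing about this is that you keep the default request deadline for the overall request (60 seconds for normal requests, 10 minutes for tasks and cron jobs), while avoiding setting the deadline for outgoing requests to some arbitrarily large value.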
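The same short-timeout idea can be sketched in Python with the Google API client. This is an illustration under assumptions, not a tested App Engine handler: `run_query` is a hypothetical helper, `jobs_api` is assumed to behave like `service.jobs()`, and the values of `timeout_ms` and `max_polls` are arbitrary. It returns only the first page of rows.

```python
def run_query(jobs_api, project_id, sql, timeout_ms=3000, max_polls=20):
    """Run a query using only BigQuery calls that return within timeout_ms.

    jobs_api is assumed to behave like service.jobs() from the Google API
    client. timeout_ms and max_polls are illustrative values.
    """
    body = {'query': sql, 'timeoutMs': timeout_ms}
    response = jobs_api.query(projectId=project_id, body=body).execute()
    job_id = response['jobReference']['jobId']

    polls = 0
    while not response.get('jobComplete'):
        if polls >= max_polls:
            raise RuntimeError('BigQuery job %s did not finish in time' % job_id)
        # Each poll also waits at most timeout_ms on the BigQuery side.
        response = jobs_api.getQueryResults(
            projectId=project_id, jobId=job_id, timeoutMs=timeout_ms).execute()
        polls += 1

    # Only the first page of rows is returned in this sketch.
    return response.get('rows', [])
```

Capping the number of polls keeps the whole handler inside the overall request deadline instead of looping indefinitely.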

How to set a deadline for BigQuery on Google App Engine