The BigQuery Python client is already configured to use standard SQL:
query_job = self.client.run_async_query(str(uuid.uuid4()), query_str)
query_job.use_query_cache = True # query cache
query_job.use_legacy_sql = False
However, partway through the batch job, the queries start failing with the 400 error below, complaining that resources were exceeded during execution. The query is fairly simple: fetch time-ordered rows within a 30-minute range from a daily partitioned table (each day holds roughly 40 million rows, 15-20 GB of data in total). Since each query covers a 30-minute range, the same query runs 48 times to cover one day. Each query returns 500k to 1.5 million rows, a few hundred MB of data. The query below initially executes fine, but after only 10-20 iterations the "Resources exceeded" error pops up.
Could any experts or developers who have run into the same problem offer some hints as to what might be going wrong here? Really appreciated!

Roy
SELECT
user_id,
client_ip,
url,
req_ts,
req_body,
resp_body,
status
FROM
xxxx.table
WHERE
DATE(_PARTITIONTIME) = '2017-09-16'
AND req_ts >= '2017-09-16 15:30:00'
  AND req_ts < '2017-09-16 16:00:00'
ORDER BY req_ts
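The 48 half-hour windows described above can be generated with a loop like the following (a minimal sketch; the template and the `window_queries` helper are illustrative, not the original job code):

```python
from datetime import datetime, timedelta

# Illustrative template mirroring the query above.
QUERY_TEMPLATE = """
SELECT user_id, client_ip, url, req_ts, req_body, resp_body, status
FROM xxxx.table
WHERE DATE(_PARTITIONTIME) = '{day}'
  AND req_ts >= '{start}'
  AND req_ts < '{end}'
ORDER BY req_ts
"""

def window_queries(day_str):
    """Yield one query per 30-minute window covering the given day (48 total)."""
    day = datetime.strptime(day_str, '%Y-%m-%d')
    for i in range(48):
        start = day + timedelta(minutes=30 * i)
        end = start + timedelta(minutes=30)
        yield QUERY_TEMPLATE.format(
            day=day_str,
            start=start.strftime('%Y-%m-%d %H:%M:%S'),
            end=end.strftime('%Y-%m-%d %H:%M:%S'),
        )
```

Each generated query would then be submitted through `run_async_query` as shown at the top of the question.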
File "../datastore/bigquery.py", line 202, in sendQuery
query_job.result() #Wait for job to complete
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/job.py", line 492, in result
return super(_AsyncJob, self).result(timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 104, in result
self._blocking_poll(timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 84, in _blocking_poll
retry_(self._done_or_raise)()
File "/usr/local/lib/python2.7/dist-packages/google/api/core/retry.py", line 258, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python2.7/dist-packages/google/api/core/retry.py", line 175, in retry_target
return target()
File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 62, in _done_or_raise
if not self.done():
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/job.py", line 1301, in done
self._query_results = self._client.get_query_results(self.name)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/client.py", line 196, in get_query_results
method='GET', path=path, query_params=extra_params)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
BadRequest: 400 GET https://www.googleapis.com/bigquery/v2/projects/fluted-house-161501/queries/ab8534f8-fe52-448c-84fe-b8702ee7b87c?maxResults=0: Resources exceeded during query execution: The query could not be executed in the allotted memory.
Answer 0 (score: 3)
The problem is in the ORDER BY, which forces the entire result set to be moved to a single worker for the final sort before output. If the result is large enough, this commonly produces "Resources exceeded during query execution".

The recommendation here is to add a LIMIT with some reasonable number. In that case partial sorting happens on all the workers, and the final sort on a single node becomes trivial. Alternatively, just drop the ORDER BY and sort the results on the client side.
See Order query operations to maximize performance for more on ORDER BY; check the second paragraph in particular.
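A minimal sketch of the client-side sorting alternative suggested above: remove ORDER BY from the SQL and sort the fetched rows locally. The dict row format and field names here are stand-ins for illustration; a few hundred MB per window sorts comfortably in client memory.

```python
from operator import itemgetter

def sort_rows_client_side(rows):
    """Sort fetched rows by req_ts locally instead of in BigQuery.

    Removing ORDER BY from the query keeps the sorting work distributed
    across BigQuery workers and avoids funneling the full result set
    through a single node.
    """
    return sorted(rows, key=itemgetter('req_ts'))

# Example with stand-in rows:
rows = [
    {'req_ts': '2017-09-16 15:45:10', 'user_id': 'b'},
    {'req_ts': '2017-09-16 15:31:02', 'user_id': 'a'},
    {'req_ts': '2017-09-16 15:59:59', 'user_id': 'c'},
]
ordered = sort_rows_client_side(rows)
```

Lexicographic sorting of the `'%Y-%m-%d %H:%M:%S'` timestamp strings matches chronological order, so no datetime parsing is needed for the sort key.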