I'm using BigQuery from Python, and I'm trying to work out how to run a simple SELECT
query, but I get an error about large results.
Before writing the Python, I tested my query in the BigQuery web interface. It runs fine there: it returns 1 row, takes 4.0 seconds, and processes 18.2 GB. The underlying table is about 150 GB with roughly 200 million rows.
Here is my code:
credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)

try:
    query_request = bigquery_service.jobs()
    query_data = {
        "allowLargeResults": True,
        'query': (
            'SELECT org_code, item_code FROM [mytable] ',
            "WHERE (time_period='201501') ",
            "AND item_code='0212000AAAAAAAA' ",
            "AND (org_code='B82005') "
            "LIMIT 10;"
        )
    }
    print ' '.join(query_data['query'])
    response = query_request.query(
        projectId=project_id,
        body=query_data).execute()
    job_ref = response['jobReference']
    print 'job_ref', job_ref
except HttpError as err:
    print('Error: {}'.format(err.content))
    raise err
Here is the output I get:
SELECT org_code, item_code FROM [mytable] WHERE (time_period='201501') AND (item_code='0212000AAAAAAAA') AND (org_code='B82005') LIMIT 10;
Error: {
"error": {
"errors": [
{
"domain": "global",
"reason": "responseTooLarge",
"message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
}
],
"code": 403,
"message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
}
}
Traceback (most recent call last):
File "query.py", line 93, in <module>
main(args.project_id)
File "query.py", line 82, in main
raise err
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/bigquery/v2/projects/824821804911/queries?alt=json returned "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors">
A few different things confuse me here:
The error tells me to set allowLargeResults
to true, even though I already have. The SELECT
query returns 1 row. I understand the warning can trigger if any intermediate stage of the query processing gets too large, but I don't see how to work around it here, since my query is a plain SELECT
with no grouping or anything like that. I'm not even using SELECT *
.
Isn't the whole point of BigQuery that it can handle this sort of thing?
How can I fix this?
Answer 0 (score: 2)
If configuration.query.allowLargeResults
is set to true, then configuration.query.destinationTable
is also required.
You should either add a destinationTable object or, since your output appears to be small, set allowLargeResults to false.
Configuration example added:
'query': {
    'query': 'my_query_text',
    'destinationTable': {
        'projectId': 'my_project',
        'datasetId': 'my_dataset',
        'tableId': 'my_table'
    },
    'createDisposition': 'CREATE_IF_NEEDED',
    'writeDisposition': 'WRITE_TRUNCATE',
    'allowLargeResults': True
}
Answer 1 (score: 0)
To clear up what is actually going wrong: queries that return large results are subject to additional limitations. The documentation states this explicitly:
If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires destinationTable to be set.
Answer 2 (score: 0)
Could [mytable] be a view rather than a table?
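One way to check is to fetch the table resource and look at its "type" field, which the BigQuery v2 API sets to "TABLE" or "VIEW". A minimal sketch, assuming the same bigquery_service object as in the question; the is_view helper is my own, and the project/dataset IDs are placeholders:

```python
def is_view(table_resource):
    """Return True if a BigQuery tables.get resource describes a view.

    The v2 API reports the kind of object in the "type" field:
    "TABLE" for an ordinary table, "VIEW" for a view.
    """
    return table_resource.get('type') == 'VIEW'

# Assumed usage with the service object from the question
# (project/dataset/table IDs are placeholders):
# table = bigquery_service.tables().get(
#     projectId='my_project',
#     datasetId='my_dataset',
#     tableId='mytable').execute()
# print is_view(table)
```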
Answer 3 (score: 0)
I had the same problem. I solved it by using jobs().insert() instead of jobs().query(), specifying true for allowLargeResults and also providing a destinationTable for the query.
Here is the sample code:
job_data = {
    "jobReference": {
        "projectId": "project_id"
    },
    "configuration": {
        "query": {
            "query": "query",
            "allowLargeResults": True,
            "destinationTable": {
                "projectId": "project_id",
                "tableId": "table_name",
                "datasetId": "dataset_name"
            }
        }
    }
}
return bigquery.jobs().insert(
    projectId="project_id",
    body=job_data).execute()
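Note that jobs().insert() returns immediately, so the job usually has to be polled until it is DONE before the results can be read with jobs().getQueryResults(). A minimal sketch, assuming the same bigquery service object as above; the flatten_rows helper is my own and the IDs are placeholders:

```python
import time

def flatten_rows(results):
    """Convert the nested getQueryResults row format,
    {'rows': [{'f': [{'v': ...}, ...]}, ...]}, into plain lists."""
    return [[cell['v'] for cell in row['f']]
            for row in results.get('rows', [])]

# Assumed usage with the service object from the answer above:
# job = bigquery.jobs().insert(projectId="project_id",
#                              body=job_data).execute()
# job_id = job['jobReference']['jobId']
# while True:
#     status = bigquery.jobs().get(projectId="project_id",
#                                  jobId=job_id).execute()
#     if status['status']['state'] == 'DONE':
#         break
#     time.sleep(1)
# results = bigquery.jobs().getQueryResults(
#     projectId="project_id", jobId=job_id).execute()
# print flatten_rows(results)
```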