Question

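I'm using BigQuery from Python. I'm trying to work out how to run a simple SELECT query, but I'm getting an error about large results.

Before writing it in Python, I tested my query in the BigQuery web interface. It runs fine there: it returns 1 row, takes 4.0 seconds, and processes 18.2 GB. The underlying table is about 150 GB with 200 million rows.

Here is my code: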

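from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)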
try:
    query_request = bigquery_service.jobs()
    query_data = {
        "allowLargeResults": True,
        'query': (
            'SELECT org_code, item_code FROM [mytable] ',
            "WHERE (time_period='201501') ",
            "AND item_code='0212000AAAAAAAA' ",
            "AND (org_code='B82005') "
            "LIMIT 10;"
        )
    }
    print ' '.join(query_data['query'])
    response = query_request.query(
        projectId=project_id,
        body=query_data).execute()
    job_ref = response['jobReference']
    print 'job_ref', job_ref

except HttpError as err:
    print('Error: {}'.format(err.content))
    raise err

This is the output I get:

SELECT org_code, item_code FROM [mytable]  WHERE (time_period='201501')  AND (item_code='0212000AAAAAAAA')  AND (org_code='B82005') LIMIT 10;
Error: {
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "responseTooLarge",
    "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
   }
  ],
  "code": 403,
  "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
 }
}

Traceback (most recent call last):
  File "query.py", line 93, in <module>
    main(args.project_id)
  File "query.py", line 82, in main
    raise err
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/bigquery/v2/projects/824821804911/queries?alt=json returned "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors">

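A couple of things confuse me:

It says I should use allowLargeResults, even though I already am.
It warns me about large results, even though this is a simple SELECT with no grouping that returns 1 row.

I understand the warning is triggered if any part of the query processing gets too large. But I really don't know how to get around it, given that my query is just a SELECT with no grouping and so on. I'm not even using SELECT *.

Isn't the whole point of BigQuery that it can handle this sort of thing?

How can I fix this?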

Answer 1

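If configuration.query.allowLargeResults is set to true, configuration.query.destinationTable is also required.

You should either add a destinationTable object or (since your output appears to be small) set allowLargeResults to false.

Added a configuration example: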

'query': {
    'query': 'my_query_text',
    'destinationTable': {
        'projectId': 'my_project',
        'datasetId': 'my_dataset',
        'tableId': 'my_table'
    },
    'createDisposition': 'CREATE_IF_NEEDED',
    'writeDisposition': 'WRITE_TRUNCATE',
    'allowLargeResults': True
}
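
For the second option (a small result set, so no allowLargeResults at all), a minimal sketch of the synchronous path might look like the following. The helper names build_query_body and rows_to_tuples are hypothetical, and the timeoutMs value is an arbitrary assumption; the rows/f/v nesting is the layout in which a jobs.query response returns its cells.

```python
def build_query_body(query, timeout_ms=10000):
    """Build a jobs().query() body with no large-result options set.

    timeout_ms is an assumed value; tune it to your own query.
    """
    return {'query': query, 'timeoutMs': timeout_ms}


def rows_to_tuples(response):
    """Flatten the 'rows' of a query response into plain tuples.

    Each row is {'f': [{'v': value}, ...]}; a response without rows
    yields an empty list.
    """
    return [tuple(cell['v'] for cell in row['f'])
            for row in response.get('rows', [])]
```

The body would be passed as bigquery_service.jobs().query(projectId=project_id, body=build_query_body(sql)).execute(), and the response handed to rows_to_tuples. Whether this avoids the error for a given table is not guaranteed; it only removes the option that requires a destination table.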

Answer 2

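Let's clear up a few misconceptions.

Queries that return large results are subject to additional limitations:

You must specify a destination table.
You can't specify a top-level ORDER BY, TOP, or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only when used together with a PARTITION BY clause.

The documentation clearly states: if true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires destinationTable to be set.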
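
As a rough illustration of the second limitation, the sketch below flags ORDER BY, TOP, or LIMIT keywords that appear outside any parentheses in a query string. It is a heuristic, not a SQL parser: the function names are hypothetical, and it will misjudge queries with parentheses inside string literals or columns named after these keywords.

```python
import re

# Keywords the limitation above forbids at the top level of the query.
_CLAUSES = re.compile(r'\b(ORDER\s+BY|LIMIT|TOP)\b', re.IGNORECASE)


def top_level_text(query):
    """Return the query with everything inside parentheses removed."""
    depth = 0
    kept = []
    for ch in query:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(ch)
    return ''.join(kept)


def find_disallowed_clauses(query):
    """List the ORDER BY, TOP, or LIMIT keywords found outside parentheses."""
    matches = _CLAUSES.findall(top_level_text(query))
    # Normalise case and inner whitespace, e.g. 'order   by' -> 'ORDER BY'.
    return [' '.join(m.upper().split()) for m in matches]
```

Applied to the query in the question, this reports the trailing LIMIT 10, which is the clause that would have to go before combining the query with allowLargeResults.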

Answer 3

Could [mytable] be a view rather than a table?

Answer 4

I had the same problem. I solved it by using jobs().insert() instead of jobs().query(). Set allowLargeResults to true, and also provide a destinationTable for the query.

Here is sample code:

# `bigquery` is the service object returned by build('bigquery', 'v2', ...)
job_data = {
    "jobReference": {
        "projectId": "project_id"
    },
    "configuration": {
        "query": {
            "query": "query",
            "allowLargeResults": True,  # a boolean, not the string "True"
            "destinationTable": {
                "projectId": "project_id",
                "tableId": "table_name",
                "datasetId": "dataset_name"
            }
        }
    }
}

response = bigquery.jobs().insert(
    projectId="project_id",
    body=job_data).execute()
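
Unlike jobs().query(), jobs().insert() only starts the job, so the results are not in its response. A minimal sketch of waiting for completion, assuming the job ID is read from the insert response's jobReference; the function name wait_for_job, the polling interval, and the use of RuntimeError are illustrative choices, not part of the API:

```python
import time


def wait_for_job(bigquery, project_id, job_id, poll_seconds=1.0, sleep=time.sleep):
    """Poll jobs().get() until the job state is DONE, then return the job.

    Raises RuntimeError if the finished job reports an errorResult.
    """
    while True:
        job = bigquery.jobs().get(projectId=project_id, jobId=job_id).execute()
        status = job.get('status', {})
        if status.get('state') == 'DONE':
            if 'errorResult' in status:
                raise RuntimeError(status['errorResult'].get('message', 'job failed'))
            return job
        sleep(poll_seconds)
```

Once the job is done, the rows are in the destination table and can be read from there, or paged through with jobs().getQueryResults().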

"Response too large to return" in BigQuery with a simple SELECT, even with allowLargeResults = True?

4 Answers: