Google BigQuery can access a dataset one way but not another

Time: 2019-12-28 05:22:40

Tags: google-bigquery

I am trying to query a Google BigQuery dataset.

When I run the following code:

from google.cloud import bigquery
bqClient = bigquery.Client.from_service_account_json("/..../....json")

query ="""
SELECT
CONCAT(
'https://stackoverflow.com/questions/',
CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
LIMIT 10
"""
#dry_run so no resources are wasted
job_config = bigquery.QueryJobConfig(dry_run=True)
query_job = bqClient.query(query, job_config = job_config)
iterator = query_job.result(timeout=30)

I get this error:

File "test.py", line 40, in <module>
iterator = query_job.result(timeout=30)
File ".../python3.6/site-packages/google/cloud/bigquery/job.py", line 3129, in result
self.job_id, retry, project=self.project, location=self.location
File ".../python3.6/site-packages/google/cloud/bigquery/client.py", line 1112, in _get_query_results
retry, method="GET", path=path, query_params=extra_params
File ".../python3.6/site-packages/google/cloud/bigquery/client.py", line 487, in _call_api
return call()
File ".../python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File ".../python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File ".../python3.6/site-packages/google/cloud/_http.py", line 421, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/projectone/queries/None?maxResults=0&location=US: Not found: Job projectone:US.None

(job ID: None)

           -----Query Job SQL Follows-----                

|    .    |    .    |    .    |    .    |    .    |
1:
2:SELECT
3:  CONCAT(
4:    'https://stackoverflow.com/questions/',
5:    CAST(id as STRING)) as url,
6:  view_count
7:FROM `bigquery-public-data.stackoverflow.posts_questions`
8:WHERE tags like '%google-bigquery%'
9:ORDER BY view_count DESC
10:LIMIT 10

Oddly, though, when I run:

client = bigquery.Client.from_service_account_json("/.../...json")
stf_dataset_ref = client.dataset('stackoverflow', project='bigquery-public-data')
stf_dset = client.get_dataset(stf_dataset_ref)
print([x.table_id for x in client.list_tables(stf_dset)])

it returns:

['badges', 'comments', 'post_history', 'post_links', 'posts_answers', 'posts_moderator_nomination', 'posts_orphaned_tag_wiki', 'posts_privilege_wiki', 'posts_questions', 'posts_tag_wiki', 'posts_tag_wiki_excerpt', 'stackoverflow_posts', 'tags', 'users', 'votes']

which is correct.

Why does the latter find the dataset while the former does not?

1 answer:

Answer 0 (score: 3)

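When you run a query as a dry run, the response contains no jobId (as @Guillem mentioned in his comment), hence the None in the error message:

google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/projectone/queries/None?maxResults=0&location=US: Not found: Job projectone:US.None

The dataset itself is found in both cases. What fails is query_job.result(), which asks the API for the results of a job that was never created.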
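As a rough sketch of how a dry run is usually consumed, the estimate can be read straight off the returned job object instead of calling result(). The helper name dry_run_bytes and the main() wrapper are hypothetical, introduced only for illustration; the credentials path is left elided as in the question, and use_query_cache=False is an optional setting assumed here so the estimate is not affected by cached results.

```python
def dry_run_bytes(client, sql, job_config):
    """Submit `sql` as a dry run and return the estimated bytes processed.

    A dry-run job is validated but never created on the server, so it has
    no job ID and there are no results to fetch.
    """
    query_job = client.query(sql, job_config=job_config)
    # Deliberately no query_job.result() here: that call is what raised the 404.
    return query_job.total_bytes_processed


def main():
    # Imported lazily so the helper above can be exercised without the library.
    from google.cloud import bigquery

    # Path elided exactly as in the question.
    client = bigquery.Client.from_service_account_json("/..../....json")
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    sql = "SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`"
    estimate = dry_run_bytes(client, sql, job_config)
    print("This query would process {} bytes.".format(estimate))
```

Calling main() needs real credentials. To get the actual rows, drop dry_run=True (or set it to False) and then call result() as in the question.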

You can use the BigQuery API playground to inspect the response in a clean environment.

Here is an example of the response:

{
  "kind": "bigquery#queryResponse",
  "jobReference": {
    "projectId": "mydata-1470162410749",
    "location": "US"
  },
  "totalBytesProcessed": "30201595278",
  "jobComplete": true,
  "cacheHit": false
}
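To make the missing field concrete, here is a small sketch that parses the response above with the standard library only. It assumes nothing beyond the JSON shown; the variable names are illustrative.

```python
import json

# The dry-run response shown above, copied verbatim.
response_text = """
{
  "kind": "bigquery#queryResponse",
  "jobReference": {
    "projectId": "mydata-1470162410749",
    "location": "US"
  },
  "totalBytesProcessed": "30201595278",
  "jobComplete": true,
  "cacheHit": false
}
"""

response = json.loads(response_text)

# jobReference carries a project and a location, but no "jobId" key.
job_id = response["jobReference"].get("jobId")

# The API returns the byte count as a string, so convert it explicitly.
bytes_processed = int(response["totalBytesProcessed"])

print("jobId:", job_id)
print("bytes processed:", bytes_processed)
```

The absent jobId is what surfaces as None in the Python client, and that None ends up in the URL of the failing GET request.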

I used a simplified version of the SQL:

SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`
