I am trying to query a Google BigQuery dataset.
When I run the following code:
from google.cloud import bigquery
bqClient = bigquery.Client.from_service_account_json("/..../....json")
query ="""
SELECT
CONCAT(
'https://stackoverflow.com/questions/',
CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
LIMIT 10
"""
#dry_run so no resources are wasted
job_config = bigquery.QueryJobConfig(dry_run=True)
query_job = bqClient.query(query, job_config = job_config)
iterator = query_job.result(timeout=30)
I get this error:
File "test.py", line 40, in <module>
iterator = query_job.result(timeout=30)
File ".../python3.6/site-packages/google/cloud/bigquery/job.py", line 3129, in result
self.job_id, retry, project=self.project, location=self.location
File ".../python3.6/site-packages/google/cloud/bigquery/client.py", line 1112, in _get_query_results
retry, method="GET", path=path, query_params=extra_params
File ".../python3.6/site-packages/google/cloud/bigquery/client.py", line 487, in _call_api
return call()
File ".../python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File ".../python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File ".../python3.6/site-packages/google/cloud/_http.py", line 421, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/projectone/queries/None?maxResults=0&location=US: Not found: Job projectone:US.None
(job ID: None)
-----Query Job SQL Follows-----
| . | . | . | . | . |
1:
2:SELECT
3: CONCAT(
4: 'https://stackoverflow.com/questions/',
5: CAST(id as STRING)) as url,
6: view_count
7:FROM `bigquery-public-data.stackoverflow.posts_questions`
8:WHERE tags like '%google-bigquery%'
9:ORDER BY view_count DESC
10:LIMIT 10
However, when I run something like this:
client = bigquery.Client.from_service_account_json("/.../...json")
stf_dataset_ref = client.dataset('stackoverflow', project='bigquery-public-data')
stf_dset = client.get_dataset(stf_dataset_ref)
print([x.table_id for x in client.list_tables(stf_dset)])
it returns:
['badges', 'comments', 'post_history', 'post_links', 'posts_answers', 'posts_moderator_nomination', 'posts_orphaned_tag_wiki', 'posts_privilege_wiki', 'posts_questions', 'posts_tag_wiki', 'posts_tag_wiki_excerpt', 'stackoverflow_posts', 'tags', 'users', 'votes']
which is correct.
Why does the latter find the dataset while the former does not?
Answer 0 (score: 3)
When you run a dry run, the response contains no jobId (as @Guillem mentioned in his comment), hence the None in the error message:
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/projectone/queries/None?maxResults=0&location=US: Not found: Job projectone:US.None
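Since a dry-run job has nothing to fetch, one way to avoid the 404 is to skip result() and read the estimate straight off the job object. The following is only a sketch under some assumptions: dry_run_bytes and main are hypothetical helper names, the key-file path is left elided as in the question, and it relies on the client library's total_bytes_processed property being populated for dry runs.

```python
def dry_run_bytes(client, sql, job_config):
    """Submit `sql` as a dry run and return the estimated bytes processed."""
    query_job = client.query(sql, job_config=job_config)
    # Deliberately no query_job.result(): a dry-run job has no job ID,
    # so polling for results is what triggers the 404 in the question.
    return query_job.total_bytes_processed


def main():
    # Imported here so dry_run_bytes stays usable without the library installed.
    from google.cloud import bigquery

    # Key-file path left elided, exactly as in the question.
    client = bigquery.Client.from_service_account_json("/..../....json")
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    sql = "SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`"
    print("This query would process", dry_run_bytes(client, sql, job_config), "bytes")
```

Calling main() with a real key file should print the estimate without running the query. To get actual rows, submit the query again without dry_run=True and call result() on that job.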
You can try the BigQuery API playground to check the response in a clean environment.
Here is an example of the response:
{
"kind": "bigquery#queryResponse",
"jobReference": {
"projectId": "mydata-1470162410749",
"location": "US"
},
"totalBytesProcessed": "30201595278",
"jobComplete": true,
"cacheHit": false
}
I used a simplified version of the SQL:
SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`
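To make the missing job ID concrete, the sample response above can be inspected with nothing but the standard library. This is an illustrative sketch only: the JSON is copied from the sample response, and the variable names are my own.

```python
import json

# Sample dry-run response copied from the answer above.
response_text = """
{
  "kind": "bigquery#queryResponse",
  "jobReference": {
    "projectId": "mydata-1470162410749",
    "location": "US"
  },
  "totalBytesProcessed": "30201595278",
  "jobComplete": true,
  "cacheHit": false
}
"""

response = json.loads(response_text)
job_reference = response["jobReference"]

# jobReference carries a project and a location but no "jobId" key,
# which is why the client ends up requesting .../queries/None.
has_job_id = "jobId" in job_reference

# The byte count arrives as a string, so convert it before doing arithmetic.
bytes_processed = int(response["totalBytesProcessed"])

print("jobId present:", has_job_id)
print("bytes processed:", bytes_processed)
```

The useful part of a dry-run response is totalBytesProcessed; there is no job to look up afterwards.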