Question

I am trying to query a Google BigQuery public dataset.

When I run the following code:

from google.cloud import bigquery
bqClient = bigquery.Client.from_service_account_json("/..../....json")

query = """
SELECT
CONCAT(
'https://stackoverflow.com/questions/',
CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
LIMIT 10
"""
# dry_run so no resources are wasted
job_config = bigquery.QueryJobConfig(dry_run=True)
query_job = bqClient.query(query, job_config=job_config)
iterator = query_job.result(timeout=30)

I get this error:

File "test.py", line 40, in <module>
iterator = query_job.result(timeout=30)
File ".../python3.6/site-packages/google/cloud/bigquery/job.py", line 3129, in result
self.job_id, retry, project=self.project, location=self.location
File ".../python3.6/site-packages/google/cloud/bigquery/client.py", line 1112, in _get_query_results
retry, method="GET", path=path, query_params=extra_params
File ".../python3.6/site-packages/google/cloud/bigquery/client.py", line 487, in _call_api
return call()
File ".../python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File ".../python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File ".../python3.6/site-packages/google/cloud/_http.py", line 421, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/projectone/queries/None?maxResults=0&location=US: Not found: Job projectone:US.None

(job ID: None)

           -----Query Job SQL Follows-----                

|    .    |    .    |    .    |    .    |    .    |
1:
2:SELECT
3:  CONCAT(
4:    'https://stackoverflow.com/questions/',
5:    CAST(id as STRING)) as url,
6:  view_count
7:FROM `bigquery-public-data.stackoverflow.posts_questions`
8:WHERE tags like '%google-bigquery%'
9:ORDER BY view_count DESC
10:LIMIT 10

However, when I run something like this:

client = bigquery.Client.from_service_account_json("/.../...json")
stf_dataset_ref = client.dataset('stackoverflow', project='bigquery-public-data')
stf_dset = client.get_dataset(stf_dataset_ref)
print([x.table_id for x in client.list_tables(stf_dset)])

it returns:

['badges', 'comments', 'post_history', 'post_links', 'posts_answers', 'posts_moderator_nomination', 'posts_orphaned_tag_wiki', 'posts_privilege_wiki', 'posts_questions', 'posts_tag_wiki', 'posts_tag_wiki_excerpt', 'stackoverflow_posts', 'tags', 'users', 'votes']

which is correct.

Why does the latter find the dataset when the former does not?

Answer 1

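A dry run does not create a job, so the response contains no jobId (as @Guillem mentioned in his comment). Calling result() then asks the API for the results of job None, which is why None appears in the error message:

google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/projectone/queries/None?maxResults=0&location=US: Not found: Job projectone:US.None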
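
One way to use a dry run without hitting this error is to skip result() and read the estimate from the job object itself. The following is a minimal sketch, assuming the google-cloud-bigquery client from the question. dry_run_bytes and main are hypothetical helper names, the shortened SQL is only for illustration, and the key path is left elided as in the question.

```python
def dry_run_bytes(client, sql, job_config):
    """Return the number of bytes a query would process, using a dry run."""
    if not job_config.dry_run:
        raise ValueError("job_config must have dry_run=True")
    query_job = client.query(sql, job_config=job_config)
    # A dry-run job has no job ID, so do not call query_job.result();
    # the estimate is already available on the job object.
    return query_job.total_bytes_processed


def main():
    # Imported here so dry_run_bytes can be exercised without the library.
    from google.cloud import bigquery

    # Elided key path, exactly as in the question; fill in your own.
    client = bigquery.Client.from_service_account_json("/..../....json")
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    sql = "SELECT id FROM `bigquery-public-data.stackoverflow.posts_questions`"
    estimate = dry_run_bytes(client, sql, job_config)
    print("This query would process", estimate, "bytes")
```

Calling main() with a valid key file should print the estimate without raising NotFound, because no results are ever requested for the nonexistent job.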

You can try the BigQuery API playground to inspect the response in a clean environment.

Here is an example of the response:

{
  "kind": "bigquery#queryResponse",
  "jobReference": {
    "projectId": "mydata-1470162410749",
    "location": "US"
  },
  "totalBytesProcessed": "30201595278",
  "jobComplete": true,
  "cacheHit": false
}
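
To make the missing job ID concrete, the response above can be inspected with a few lines of Python. This is only an illustrative sketch using the standard json module; summarize_dry_run is a hypothetical helper name, and the response text is copied from the example above.

```python
import json

# The dry-run response shown above, copied verbatim.
DRY_RUN_RESPONSE = """
{
  "kind": "bigquery#queryResponse",
  "jobReference": {
    "projectId": "mydata-1470162410749",
    "location": "US"
  },
  "totalBytesProcessed": "30201595278",
  "jobComplete": true,
  "cacheHit": false
}
"""


def summarize_dry_run(response_text):
    """Extract the job ID (if any) and the byte estimate from a response."""
    response = json.loads(response_text)
    job_reference = response.get("jobReference", {})
    return {
        # jobReference has projectId and location but no jobId key here.
        "job_id": job_reference.get("jobId"),
        # The API returns the byte count as a string, so convert it.
        "bytes": int(response["totalBytesProcessed"]),
    }


summary = summarize_dry_run(DRY_RUN_RESPONSE)
print("job ID:", summary["job_id"])
print("bytes that would be processed:", summary["bytes"])
```

The job ID comes back as None, which matches the None that appears in the 404 URL in the question.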

I used a simplified version of the SQL:

SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`
