Google BigQuery - get unique rows based on one column

Time: 2020-01-15 16:37:19

Tags: google-bigquery

I have this BigQuery query:

select
  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.clone_url") AS clone_url,
  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.language") AS language,
  integer(JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.stargazers_count")) as stars
from githubarchive:day.20200115
where JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.language")="C"
group by language, clone_url, stars
order by stars DESC
limit 1000;

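It returns "clone_url" entries with unique "stars" values, so the same clone_url can appear more than once, one row per distinct star count.

How can I show each clone_url only once, with its highest star count?

Can this query be optimized?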

Here are the query results:

(screenshot of the query results)

Thanks

1 answer:

Answer 0 (score: 2)

It looks like you are still using BigQuery Legacy SQL, so below is the Legacy SQL version:

#legacySQL 
SELECT 
  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.clone_url") AS clone_url, 
  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.language") AS language, 
  MAX(INTEGER(JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.stargazers_count"))) AS stars 
FROM [githubarchive:day.20200115]
WHERE  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.language")="C" 
GROUP BY language, clone_url 
ORDER BY stars DESC 
LIMIT 1000  

Note: migrating to BigQuery Standard SQL is strongly recommended. In Standard SQL, the query above looks like this:

#standardSQL
SELECT 
  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.clone_url") AS clone_url, 
  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.language") AS language, 
  MAX(CAST(JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.stargazers_count") AS INT64)) AS stars 
FROM `githubarchive.day.20200115`
WHERE  JSON_EXTRACT_SCALAR(payload, "$.pull_request.base.repo.language")="C" 
GROUP BY language, clone_url 
ORDER BY stars DESC 
LIMIT 1000   

Both queries above return the following:

(screenshot of the query results)
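
To illustrate why moving stars out of GROUP BY and into MAX() leaves one row per clone_url, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in for BigQuery. The events table, its plain columns, and the sample URLs are hypothetical; the real githubarchive table stores these values inside a JSON payload column.

```python
import sqlite3

# Hypothetical sample rows: the same repository seen with different star counts.
rows = [
    ("https://github.com/example/a.git", "C", 100),
    ("https://github.com/example/a.git", "C", 101),
    ("https://github.com/example/b.git", "C", 50),
    ("https://github.com/example/b.git", "C", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (clone_url TEXT, language TEXT, stars INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Grouping by stars as well keeps one row per distinct (clone_url, stars) pair,
# so a repository whose star count changed during the day shows up repeatedly.
per_pair = conn.execute(
    "SELECT clone_url, language, stars FROM events "
    "GROUP BY language, clone_url, stars ORDER BY stars DESC"
).fetchall()

# Aggregating stars with MAX and grouping only by language and clone_url
# collapses each repository to a single row with its highest star count.
per_repo = conn.execute(
    "SELECT clone_url, language, MAX(stars) AS stars FROM events "
    "GROUP BY language, clone_url ORDER BY stars DESC"
).fetchall()

print(len(per_pair), "rows when grouping by stars too")
print(len(per_repo), "rows when taking MAX(stars)")
for row in per_repo:
    print(row)
```

With this sample data the first query yields three rows and the second yields two, one per repository. The same GROUP BY reasoning applies to the BigQuery queries in the answer, although SQLite and BigQuery differ in other details of their SQL dialects.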