How to get the Java repositories with the most stars from GitHub Archive

Time: 2015-06-08 00:52:00

Tags: google-bigquery github-archive

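I am currently trying to use GitHub Archive and BigQuery to get the top 100 Java repositories that have the most stars and fewer than 100 commits. Could you help me come up with a query that returns the top 100 repositories with the most stars?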

The query I ended up with is:

SELECT repository_name
FROM [githubarchive:github.timeline]
WHERE repository_language = 'Java' 
AND PARSE_UTC_USEC(repository_created_at) BETWEEN PARSE_UTC_USEC('1996-01-01 00:00:00') AND PARSE_UTC_USEC('2015-05-30 00:00:00') 
GROUP BY repository_name
HAVING COUNT(*) < 100 
ORDER BY COUNT(*) DESC 
LIMIT 100

1 answer:

Answer 0: (score: 3)

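I think this query will work for you. Your existing query will not run because its ORDER BY clause references an aggregate computation. ORDER BY requires its expressions to reference fields, so moving the COUNT into the SELECT clause under an alias and ordering by that alias fixes that part.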

Also, if you are looking for a count of git commits, you should check that the timeline event is a commit by adding AND payload_commit IS NOT NULL to the WHERE clause.

SELECT
  repository_name,
  COUNT(1) AS CommitCount
FROM
  [githubarchive:github.timeline]
WHERE
  repository_language = 'Java'
  AND PARSE_UTC_USEC(repository_created_at)
    BETWEEN PARSE_UTC_USEC('1996-01-01 00:00:00')
    AND PARSE_UTC_USEC('2015-05-30 00:00:00')
  AND payload_commit IS NOT NULL
GROUP BY
  repository_name
HAVING
  CommitCount < 100
ORDER BY
  CommitCount DESC
LIMIT
  100
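
To try the alias pattern without a BigQuery project, here is a minimal sketch that runs the same shape of query against an in-memory SQLite table. It is only an illustration under stated assumptions: the table name timeline, the toy rows, and the helper run_demo are hypothetical, and SQLite is a stand-in that is more permissive than BigQuery (it would also accept ORDER BY COUNT(*) directly), so it shows the working form of the query rather than reproducing the original error. The language and date filters are left out to keep the toy schema small.

```python
import sqlite3

# Hypothetical toy rows standing in for the GitHub Archive timeline table:
# (repository_name, payload_commit). A NULL payload_commit models a
# non-commit event that the WHERE clause should skip.
ROWS = (
    [("small-repo", "c%d" % i) for i in range(3)]
    + [("tiny-repo", "c0")]
    + [("busy-repo", "c%d" % i) for i in range(120)]
    + [("small-repo", None)]
)

# Same shape as the answer's query: the aggregate is aliased in SELECT,
# and HAVING / ORDER BY refer to the alias instead of repeating COUNT.
QUERY = """
SELECT repository_name, COUNT(1) AS CommitCount
FROM timeline
WHERE payload_commit IS NOT NULL
GROUP BY repository_name
HAVING CommitCount < 100
ORDER BY CommitCount DESC
LIMIT 100
"""


def run_demo():
    """Load the toy rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE timeline (repository_name TEXT, payload_commit TEXT)"
    )
    conn.executemany("INSERT INTO timeline VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, commit_count in run_demo():
        print(name, commit_count)
```

With these toy rows, busy-repo is dropped by the HAVING filter because it has 120 commit rows, and the NULL row for small-repo is not counted.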