BigQuery "quota exceeded" error for SQL on the GitHub PushEvent dataset

Date: 2017-11-22 00:17:44

Tags: sql github google-bigquery bigdata

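I'm new to Google BigQuery and only moderately comfortable with SQL. Could you help me restructure my SQL statement so that it scans less data? My current setup runs into this error: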

  

Error: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

My query is as follows:

SELECT
  language AS name,
  year,
  quarter,
  count
FROM (
  SELECT
    *
  FROM (
    SELECT
      lang AS language,
      y AS year,
      q AS quarter,
      type,
      COUNT(*) AS count
    FROM (
      SELECT
        a.type type,
        b.lang lang,
        a.y y,
        a.q q
      FROM (
        SELECT
          type,
          YEAR(created_at) AS y,
          QUARTER(created_at) AS q,
          STRING(REGEXP_REPLACE(repo.url, r'(https:\/\/api\.github\.com\/repos\/)', '')) AS name
        FROM
          [githubarchive:year.2016] ) a
      JOIN (
        SELECT
          repo_name AS name,
          lang
        FROM (
          SELECT
            *
          FROM (
            SELECT
              *,
              ROW_NUMBER() OVER (PARTITION BY repo_name ORDER BY lang) AS num
            FROM (
              SELECT
                repo_name,
                FIRST_VALUE(language.name) OVER (PARTITION BY repo_name ORDER BY language.bytes DESC) AS lang
              FROM
                [bigquery-public-data:github_repos.languages]))
          WHERE
            num = 1
          ORDER BY
            repo_name)
        WHERE
          lang != 'null') b
      ON
        a.name = b.name)
    GROUP BY
      type,
      language,
      year,
      quarter
    ORDER BY
      year,
      quarter,
      count DESC)
  WHERE
    count >= 1000)
WHERE
  type = 'PushEvent'
LIMIT
  100

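Basically, I'm trying to build a dataset of the top 100 languages on GitHub, ranked by pushes, so that I can display the data with D3. I've used very little data so far, and this one query currently comes to 20 GB, which should be under the limit.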

As a student, I doubt I can afford to pay for this.

1 Answer:

Answer 0: (score: 3)

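The query in question scans only 22.5 GB, which costs about $0.11. The error says that you have exceeded the number of bytes allowed by the free tier, which is 1 TB. So you were able to run your query about 45 times within the month, and now you need to wait for next month.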
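The figures above can be checked with a few lines of arithmetic. This is only a rough sketch: the on-demand price of $5 per TB is an assumption (it is the rate implied by the $0.11 figure, not something stated in the answer), as is the use of binary units (1 TB = 1024 GB).

```python
# Back-of-the-envelope check of the figures quoted in the answer.
# Assumptions (not stated in the answer): an on-demand price of $5 per TB
# and binary units, i.e. 1 TB = 1024 GB.
PRICE_PER_TB_USD = 5.0   # assumed on-demand rate
FREE_TIER_GB = 1024.0    # the 1 TB free tier, expressed in GB
QUERY_SCAN_GB = 22.5     # data scanned by one run, per the answer


def query_cost_usd(scan_gb: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost of a single run at the assumed per-TB rate."""
    return scan_gb / 1024.0 * price_per_tb


def free_runs(scan_gb: float, free_tier_gb: float = FREE_TIER_GB) -> int:
    """Number of complete runs that fit inside the free tier."""
    return int(free_tier_gb // scan_gb)


if __name__ == "__main__":
    print(f"cost per run: ${query_cost_usd(QUERY_SCAN_GB):.2f}")
    print(f"runs within the free tier: {free_runs(QUERY_SCAN_GB)}")
```

Under these assumptions a single run costs roughly eleven cents, and about 45 runs fit inside the free tier, which matches the numbers in the answer.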

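I recommend that you don't run this query every time. Instead, save the result and use it in your experiments, so you don't burn through your 1 TB so quickly!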
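One way to follow this advice is to cache the result in a local file and only go back to BigQuery when that file is missing. The sketch below is a hypothetical illustration, not part of the original answer: `load_or_run` and `fake_query` are made-up names, and `run_query` stands in for whatever client call actually executes the query.

```python
import json
import os
import tempfile
from typing import Callable, List


def load_or_run(cache_path: str, run_query: Callable[[], List[dict]]) -> List[dict]:
    """Return cached rows if the file exists; otherwise run the query once and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    rows = run_query()  # the only step that would scan bytes in BigQuery
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(rows, f)
    return rows


if __name__ == "__main__":
    # Hypothetical stand-in for the real query; the row values are placeholders.
    def fake_query() -> List[dict]:
        return [{"name": "example-language", "year": 2016, "quarter": 1, "count": 1000}]

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "push_events.json")
        first = load_or_run(path, fake_query)   # runs the query, writes the cache
        second = load_or_run(path, fake_query)  # served from the cache file
        print(first == second)
```

Because the cache is plain JSON, the same file could also be loaded directly by the D3 front end while experimenting, without touching the quota again.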