BadRequest: 400 Column name type is ambiguous at [16:3]

Time: 2018-12-02 01:33:58

Tags: python google-bigquery

I'm new to Google BigQuery and want to access the GitHub API data. I have this code:

query_job = client.query("""

SELECT
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count
FROM
`githubarchive:year.2017` as gb17, 
`githubarchive:year.2016` as gb16, 
`githubarchive:year.2015` as gb15, 
`githubarchive:year.2014` as gb14, 
`githubarchive:year.2013` as gb13,
`githubarchive:year.2012` as gb12,
`githubarchive:year.2011` as gb11 

WHERE
  type = 'CommitCommentEvent'
    OR type = 'PushEvent'
    OR type = 'IssueCommentEvent'
    OR type = 'PullRequestEvent'
    OR type = 'PullRequestReviewCommentEvent'
    OR type = 'IssuesEvent'
GROUP BY
  actor_login
ORDER BY
  events_actor_count DESC

  """)

results = query_job.result()

I get this error:

---------------------------------------------------------------------------
BadRequest                                Traceback (most recent call last)
<ipython-input-29-9c0a41bed3c6> in <module>()
     27   """)
     28 
---> 29 results = query_job.result()

/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout, retry)
   2735             not complete in the given timeout.
   2736         """
-> 2737         super(QueryJob, self).result(timeout=timeout)
   2738         # Return an iterator instead of returning the job.
   2739         if not self._query_results:

/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
    697             self._begin()
    698         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 699         return super(_AsyncJob, self).result(timeout=timeout)
    700 
    701     def cancelled(self):

/anaconda3/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    123             # pylint: disable=raising-bad-type
    124             # Pylint doesn't recognize that this is valid in this case.
--> 125             raise self._exception
    126 
    127         return self._result

BadRequest: 400 Column name type is ambiguous at [16:3]

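I think my error is in the SELECT statement. Do I have to prefix the column with the table name? If so, what should I do when I have multiple tables? My suspicion may also be wrong, so I would appreciate any help. Thanks.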

3 Answers:

Answer 0 (score: 1)

Use a wildcard to select from all the years you need. Try the following instead:

SELECT
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count
FROM `githubarchive.year.20*` as gh
WHERE
   _TABLE_SUFFIX BETWEEN '11' AND '18' AND
   type IN (
     'CommitCommentEvent',
     'PushEvent',
     'IssueCommentEvent',
     'PullRequestEvent',
     'PullRequestReviewCommentEvent',
     'IssuesEvent'
   )
GROUP BY
  actor_login
ORDER BY
  events_actor_count DESC

I also used an IN list to simplify the filter.
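The suffix values here are two characters long ('11', '18') because _TABLE_SUFFIX holds only the part of the table name that the `*` matched, and BETWEEN compares those values as strings. The sketch below models that behavior in plain Python. It does not call BigQuery, and `TABLES`, `table_suffix`, and `tables_between` are hypothetical names used only for illustration.

```python
# A plain-Python model of how a wildcard prefix determines _TABLE_SUFFIX.
# Nothing here talks to BigQuery; all names are illustrative assumptions.

TABLES = [str(year) for year in range(2011, 2019)]  # "2011" .. "2018"


def table_suffix(table_name, prefix):
    """Return the part of table_name matched by '*' in a `prefix*` wildcard,
    or None when the table name does not start with the prefix."""
    if not table_name.startswith(prefix):
        return None
    return table_name[len(prefix):]


def tables_between(prefix, low, high):
    """Tables whose suffix lies in [low, high], compared as strings,
    mirroring `_TABLE_SUFFIX BETWEEN low AND high`."""
    selected = []
    for name in TABLES:
        suffix = table_suffix(name, prefix)
        if suffix is not None and low <= suffix <= high:
            selected.append(name)
    return selected


# With the prefix "20" (as in `year.20*`), suffixes are two characters long.
print(tables_between("20", "11", "17"))
# With an empty prefix (as in `year.*`), suffixes are the full table names.
print(tables_between("", "2011", "2017"))
```

Both calls select the same seven tables. Mixing the two conventions, for example a `year.*` wildcard with the bounds '11' and '17', would select nothing in this model, because the string "2011" sorts after "17".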

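Answer 1 (score: 0)

It looks like you are doing a CROSS JOIN (in BigQuery standard SQL, a comma between tables is a cross join) rather than a UNION ALL, so your reference to the column type is ambiguous: every table in the FROM clause has a column with that name.

So try using an explicit UNION ALL in your select statement instead.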
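To see the difference between the two forms without a BigQuery project, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite is not BigQuery, and the tables `ev2016` and `ev2017`, their flat `login` column, and the sample rows are all made up for illustration. The comma join fails on the bare `type` reference, while UNION ALL stacks the rows into one relation with a single `type` column.

```python
import sqlite3

# SQLite stands in for BigQuery here; table names, columns, and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ev2016 (login TEXT, type TEXT);
    CREATE TABLE ev2017 (login TEXT, type TEXT);
    INSERT INTO ev2016 VALUES ('alice', 'PushEvent'), ('bob', 'IssuesEvent');
    INSERT INTO ev2017 VALUES ('alice', 'PushEvent'), ('alice', 'WatchEvent');
""")

# Comma join: both tables expose a column named "type", so the bare
# reference in WHERE cannot be resolved and the statement is rejected.
try:
    conn.execute("SELECT COUNT(*) FROM ev2016, ev2017 WHERE type = 'PushEvent'")
    comma_join_error = None
except sqlite3.OperationalError as exc:
    comma_join_error = str(exc)
print(comma_join_error)

# UNION ALL: the rows of both tables are stacked, so "type" is unambiguous.
union_rows = conn.execute("""
    SELECT login, COUNT(*) AS events_count
    FROM (
        SELECT login, type FROM ev2016
        UNION ALL
        SELECT login, type FROM ev2017
    )
    WHERE type IN ('PushEvent', 'IssuesEvent')
    GROUP BY login
    ORDER BY events_count DESC
""").fetchall()
print(union_rows)  # [('alice', 2), ('bob', 1)]
```

In BigQuery the query would presumably take the same shape, with each branch selecting actor.login and type from one yearly githubarchive table.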

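Answer 2 (score: 0)

You can use a wildcard, and you can also filter on the _TABLE_SUFFIX pseudo column to further reduce the number of bytes the query scans (a bare wildcard scans every matching table). It also lets you restrict the query to certain years.

Something like this: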

select 
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count 
from `githubarchive.year.*` 
WHERE
  -- parentheses are needed: AND binds tighter than OR
  (type = 'CommitCommentEvent'
    OR type = 'PushEvent'
    OR type = 'IssueCommentEvent'
    OR type = 'PullRequestEvent'
    OR type = 'PullRequestReviewCommentEvent'
    OR type = 'IssuesEvent')
  AND _TABLE_SUFFIX in ('2011', '2012', '2013', '2014', '2015', '2016', '2017')
group by actor.login
order by events_actor_count DESC
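One detail worth checking in any filter that mixes OR and AND: in SQL, AND binds more tightly than OR, so the chain of type comparisons needs its own parentheses for the suffix test to apply to every event type. The sketch below shows the effect using Python's built-in sqlite3 module as a stand-in for BigQuery. The `events` table, its `suffix` column, and the rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for BigQuery; the table and its rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (type TEXT, suffix TEXT);
    INSERT INTO events VALUES
        ('PushEvent',   '2010'),
        ('PushEvent',   '2017'),
        ('IssuesEvent', '2010'),
        ('IssuesEvent', '2017');
""")


def count_rows(where_clause):
    """Count the rows of the sample table that satisfy where_clause."""
    sql = "SELECT COUNT(*) FROM events WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]


# Parsed as: type = 'PushEvent' OR (type = 'IssuesEvent' AND suffix IN (...)),
# so the 2010 PushEvent row slips through the year filter.
without_parens = count_rows(
    "type = 'PushEvent' OR type = 'IssuesEvent' AND suffix IN ('2017')"
)

# With explicit grouping, the year filter applies to both event types.
with_parens = count_rows(
    "(type = 'PushEvent' OR type = 'IssuesEvent') AND suffix IN ('2017')"
)

print(without_parens, with_parens)  # 3 2
```

An IN list for the event types sidesteps the problem, since it leaves only AND conditions in the WHERE clause.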