I'm new to Google BigQuery and I want to query the GitHub data. I have this code:
query_job = client.query("""
SELECT
actor.login AS actor_login,
COUNT(1) AS events_actor_count
FROM
`githubarchive:year.2017` as gb17,
`githubarchive:year.2016` as gb16,
`githubarchive:year.2015` as gb15,
`githubarchive:year.2014` as gb14,
`githubarchive:year.2013` as gb13,
`githubarchive:year.2012` as gb12,
`githubarchive:year.2011` as gb11
WHERE
type = 'CommitCommentEvent'
OR type = 'PushEvent'
OR type = 'IssueCommentEvent'
OR type = 'PullRequestEvent'
OR type = 'PullRequestReviewCommentEvent'
OR type = 'IssuesEvent'
GROUP BY
actor_login
ORDER BY
events_actor_count DESC
""")
results = query_job.result()
I get this error:
---------------------------------------------------------------------------
BadRequest Traceback (most recent call last)
<ipython-input-29-9c0a41bed3c6> in <module>()
27 """)
28
---> 29 results = query_job.result()
/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout, retry)
2735 not complete in the given timeout.
2736 """
-> 2737 super(QueryJob, self).result(timeout=timeout)
2738 # Return an iterator instead of returning the job.
2739 if not self._query_results:
/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
697 self._begin()
698 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 699 return super(_AsyncJob, self).result(timeout=timeout)
700
701 def cancelled(self):
/anaconda3/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
123 # pylint: disable=raising-bad-type
124 # Pylint doesn't recognize that this is valid in this case.
--> 125 raise self._exception
126
127 return self._result
BadRequest: 400 Column name type is ambiguous at [16:3]
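I think my mistake is in the SELECT statement. Do I have to qualify the column with a table name? If so, what should I do when I have multiple tables? My suspicion may also be wrong, so I would appreciate any help. Thanks.
Answer 0: (score: 1)
Use a wildcard to select from all the desired years. Try the following instead: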
SELECT
actor.login AS actor_login,
COUNT(1) AS events_actor_count
FROM `githubarchive.year.20*` as gh
WHERE
_TABLE_SUFFIX BETWEEN '11' AND '18' AND
type IN (
'CommitCommentEvent',
'PushEvent',
'IssueCommentEvent',
'PullRequestEvent',
'PullRequestReviewCommentEvent',
'IssuesEvent'
)
GROUP BY
actor_login
ORDER BY
events_actor_count DESC
I also used an IN list to simplify the filter.
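In case it helps, here is a minimal sketch of sending a wildcard query like this through the Python client the question uses. The helper names `build_wildcard_query` and `top_actors` are hypothetical, the dotted table path `githubarchive.year.20*` is the standard SQL spelling I am assuming, and `bigquery.Client()` assumes default credentials and a default project are configured.

```python
import itertools

# Event types taken from the question's WHERE clause.
EVENT_TYPES = (
    "CommitCommentEvent",
    "PushEvent",
    "IssueCommentEvent",
    "PullRequestEvent",
    "PullRequestReviewCommentEvent",
    "IssuesEvent",
)


def build_wildcard_query(first_year, last_year, event_types=EVENT_TYPES):
    """Return standard SQL counting events per actor for tables 20YY.

    first_year and last_year are two-digit suffixes (e.g. 11 and 18) that
    complete the `20*` wildcard. Hypothetical helper for illustration.
    """
    type_list = ", ".join(f"'{t}'" for t in event_types)
    return f"""
SELECT
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count
FROM `githubarchive.year.20*`
WHERE
  _TABLE_SUFFIX BETWEEN '{first_year:02d}' AND '{last_year:02d}'
  AND type IN ({type_list})
GROUP BY actor_login
ORDER BY events_actor_count DESC
""".strip()


def top_actors(sql, limit=10):
    """Run the query and return the first `limit` (login, count) pairs."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumption: default credentials and project
    rows = client.query(sql).result()
    return [
        (row.actor_login, row.events_actor_count)
        for row in itertools.islice(rows, limit)
    ]
```

Calling `top_actors(build_wildcard_query(11, 17))` would then cover the same years as the question (2011 through 2017).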
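Answer 1: (score: 0)
It looks like you are doing a CROSS JOIN (in BigQuery standard SQL, a comma between tables is a cross join) rather than a UNION ALL, so your reference to the column `type` is ambiguous: every joined table has a column with that name.
Try using an explicit UNION ALL in your select statement instead.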
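A rough sketch of what the explicit UNION ALL could look like, generated from Python so the per-year subqueries do not have to be typed by hand. The helper name `build_union_all_query` is hypothetical, and the sketch assumes that every yearly table exposes `actor.login` and `type` with compatible types.

```python
# Event types taken from the question's WHERE clause.
EVENT_TYPES = (
    "CommitCommentEvent",
    "PushEvent",
    "IssueCommentEvent",
    "PullRequestEvent",
    "PullRequestReviewCommentEvent",
    "IssuesEvent",
)


def build_union_all_query(years, event_types=EVENT_TYPES):
    """Stack the yearly tables with UNION ALL, then aggregate once.

    Hypothetical helper; assumes each table has `actor.login` and `type`.
    """
    type_list = ", ".join(f"'{t}'" for t in event_types)
    # One subquery per year; only the two needed columns are selected so the
    # UNION ALL branches have identical shapes.
    subqueries = "\n  UNION ALL\n".join(
        f"  SELECT actor.login AS actor_login, type FROM `githubarchive.year.{year}`"
        for year in years
    )
    return (
        "SELECT actor_login, COUNT(1) AS events_actor_count\n"
        f"FROM (\n{subqueries}\n)\n"
        f"WHERE type IN ({type_list})\n"
        "GROUP BY actor_login\n"
        "ORDER BY events_actor_count DESC"
    )


# Same years as the question; pass the string to client.query(...) as before.
sql = build_union_all_query(range(2011, 2018))
```

Because the rows are stacked rather than joined, there is only one `type` column in scope and the ambiguity goes away.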
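Answer 2: (score: 0)
You can use a wildcard, and you can also use the _TABLE_SUFFIX pseudo column to further reduce the number of bytes scanned by the query (a wildcard on its own scans every matching table). It also lets you filter to specific years.
Something like this: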
select
actor.login AS actor_login,
COUNT(1) AS events_actor_count
from `githubarchive.year.*`
WHERE
-- AND binds tighter than OR, so the type checks must be parenthesized;
-- otherwise the year filter would apply only to 'IssuesEvent'.
(type = 'CommitCommentEvent'
OR type = 'PushEvent'
OR type = 'IssueCommentEvent'
OR type = 'PullRequestEvent'
OR type = 'PullRequestReviewCommentEvent'
OR type = 'IssuesEvent')
AND _TABLE_SUFFIX in ('2011', '2012', '2013', '2014', '2015', '2016', '2017')
group by actor.login
order by events_actor_count
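To check how much the _TABLE_SUFFIX filter actually saves, one option is a dry run, which estimates the bytes a query would process without executing it. The sketch below is only an illustration: `suffix_filter` and `estimate_scanned_bytes` are hypothetical helper names, and the code assumes the `google-cloud-bigquery` package is installed and a configured `client` is passed in.

```python
def suffix_filter(years):
    """Return a `_TABLE_SUFFIX IN (...)` predicate for the given years."""
    return "_TABLE_SUFFIX IN ({})".format(", ".join(f"'{y}'" for y in years))


def estimate_scanned_bytes(client, sql):
    """Dry-run `sql` and return the estimated bytes it would process."""
    # Imported lazily so suffix_filter can be used without the client library.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig()
    job_config.dry_run = True           # estimate only; the query is not run
    job_config.use_query_cache = False  # avoid a cache hit hiding the real size
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


# Usage (assumes `client = bigquery.Client()` as in the question):
# base = "SELECT type FROM `githubarchive.year.*`"
# print(estimate_scanned_bytes(client, base))
# print(estimate_scanned_bytes(
#     client, base + " WHERE " + suffix_filter(range(2011, 2018))))
```

Comparing the two printed numbers shows whether restricting the suffix reduces the scan for your query.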