I'm experimenting with BigQuery and ran into a problem. My query is:
SELECT * FROM (
SELECT a.title, a.counter , MAX(b.num_characters) as max
FROM (
SELECT title, count(*) as counter FROM publicdata:samples.wikipedia
GROUP EACH BY title
ORDER BY counter DESC
LIMIT 10
) a JOIN
(SELECT title,num_characters FROM publicdata:samples.wikipedia
) b ON a.title = b.title
GROUP BY a.title, a.counter)
LIMIT 1;
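The query is valid, but it fails with "Response too large to return". The first subquery runs fine on its own. What I want is to get more columns for those top titles, but I haven't managed to.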
Answer (score: 2)
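Don't worry about the "LIMIT 1": the response becomes too large before the query ever reaches that stage.
Try dropping the second subquery, since it only selects 2 columns from the large dataset without filtering it. A working alternative is: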
SELECT
a.title, a.counter, MAX(b.num_characters) AS max
FROM
publicdata:samples.wikipedia b JOIN(
SELECT
title, COUNT(*) AS counter
FROM
publicdata:samples.wikipedia
GROUP EACH BY title
ORDER BY
counter DESC
LIMIT 10) a
ON a.title = b.title
GROUP BY
a.title,
a.counter
This runs in 15.4 seconds.
We can make it faster with TOP():
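To make the logic of the rewritten query concrete, here is a minimal in-memory Python sketch of what it computes. It assumes the table is just a list of (title, num_characters) tuples; the function name and sample rows are hypothetical and this is not the BigQuery API. It only mirrors the result shape, not how BigQuery executes or scales the join.

```python
def top_titles_with_max_chars(rows, n=10):
    """Mimic the rewritten query on a list of (title, num_characters) rows."""
    # Subquery `a`: SELECT title, COUNT(*) AS counter ... GROUP EACH BY title
    counts = {}
    for title, _ in rows:
        counts[title] = counts.get(title, 0) + 1

    # ORDER BY counter DESC LIMIT n (sorted() is stable, so ties keep first-seen order;
    # BigQuery makes no such tie-breaking guarantee)
    top = dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n])

    # JOIN the full table on title: only rows whose title is in the top-n set survive,
    # then MAX(num_characters) is taken per title.
    max_chars = {}
    for title, num_characters in rows:
        if title in top:
            max_chars[title] = max(max_chars.get(title, num_characters), num_characters)

    # GROUP BY a.title, a.counter: one row per (title, counter, max)
    return [(title, counter, max_chars[title]) for title, counter in top.items()]


rows = [("A", 100), ("A", 250), ("B", 80), ("A", 120), ("B", 90), ("C", 10)]
print(top_titles_with_max_chars(rows, n=2))  # [('A', 3, 250), ('B', 2, 90)]
```

The point of the rewrite is visible in the sketch: the filter to the top-n titles happens as part of the join against the full table, rather than first building a separate unfiltered two-column copy of it.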
SELECT
a.title title, counter, MAX(num_characters) max
FROM
publicdata:samples.wikipedia b
JOIN
(
SELECT
TOP(title, 10) AS title, COUNT(*) AS counter
FROM
publicdata:samples.wikipedia
) a
ON a.title=b.title
GROUP BY
title, counter
TOP() does the same job as SELECT COUNT(*) / GROUP BY / ORDER BY / LIMIT, with simpler syntax and faster execution.
https://developers.google.com/bigquery/docs/query-reference#top-function
Now it runs in only 6.5 seconds and processes 15.9 GB.
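As a rough analogy for what TOP(title, 10) with COUNT(*) returns, the Python sketch below compares a long-hand group/sort/limit with the one-call shorthand `collections.Counter.most_common`. The function names and sample data are hypothetical; the sketch only illustrates that both forms yield the same (value, count) pairs, and says nothing about why TOP() is faster inside BigQuery.

```python
from collections import Counter


def group_order_limit(values, n):
    """Long-hand: COUNT(*) ... GROUP BY value ORDER BY count DESC LIMIT n."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_by_frequency(values, n):
    """Shorthand analogue of SELECT TOP(value, n), COUNT(*)."""
    return Counter(values).most_common(n)


titles = ["A", "B", "A", "C", "A", "B"]
print(top_by_frequency(titles, 2))  # [('A', 3), ('B', 2)]
assert top_by_frequency(titles, 2) == group_order_limit(titles, 2)
```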