Response too large to return with LIMIT 1;

Time: 2013-05-02 10:20:33

Tags: google-bigquery

I'm playing with BigQuery and ran into a problem. My query is:

SELECT * FROM (
SELECT a.title,  a.counter , MAX(b.num_characters) as max
FROM (
  SELECT title, count(*) as counter FROM publicdata:samples.wikipedia
  GROUP EACH BY title
  ORDER BY counter DESC
  LIMIT 10
) a JOIN
(SELECT title,num_characters FROM publicdata:samples.wikipedia
) b ON a.title = b.title
GROUP BY a.title, a.counter)
LIMIT 1;

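The query is valid, but I get "Response too large to return". The first subquery runs fine on its own; what I want is to fetch a few more columns for those rows, and that is where it fails.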

1 answer:

Answer 0: (score: 2)

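Don't worry about the "LIMIT 1": the response becomes too large before the query ever reaches that stage.

Try skipping the second subquery, since it only selects 2 columns from the large dataset without filtering it. A working alternative is: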

SELECT
  a.title, a.counter, MAX(b.num_characters) AS max
FROM
  publicdata:samples.wikipedia b JOIN(
  SELECT
    title, COUNT(*) AS counter
  FROM
    publicdata:samples.wikipedia
    GROUP EACH BY title
  ORDER BY
    counter DESC
  LIMIT 10) a
  ON a.title = b.title
GROUP BY
  a.title,
  a.counter

This runs in 15.4 seconds.
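
To see what the rewritten query computes, here is a minimal sketch that runs the same join shape against a tiny in-memory SQLite table from Python. Everything in it is an assumption for illustration: the rows are invented, the table is simply named `wikipedia`, and the top-N cutoff is 2 instead of 10. SQLite has no `GROUP EACH BY`, so plain `GROUP BY` is used, and the sketch says nothing about BigQuery's response-size limits or timings.

```python
import sqlite3

# Toy stand-in for publicdata:samples.wikipedia (rows are invented).
rows = [
    ("Alpha", 100), ("Alpha", 250), ("Alpha", 180),
    ("Beta", 40), ("Beta", 90),
    ("Gamma", 500),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wikipedia (title TEXT, num_characters INTEGER)")
conn.executemany("INSERT INTO wikipedia VALUES (?, ?)", rows)

# Same shape as the answer's query: join the full table directly against
# the small top-N subquery, then aggregate per title.
# The outer ORDER BY is added here only to make the output deterministic.
query = """
SELECT a.title, a.counter, MAX(b.num_characters) AS max_chars
FROM wikipedia b
JOIN (
  SELECT title, COUNT(*) AS counter
  FROM wikipedia
  GROUP BY title
  ORDER BY counter DESC
  LIMIT 2
) a ON a.title = b.title
GROUP BY a.title, a.counter
ORDER BY a.counter DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [('Alpha', 3, 250), ('Beta', 2, 90)]
```

"Gamma" has the longest revision but only one row, so it drops out at the top-N step before the join, which is the behaviour the original query was after.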

We can make it faster by using TOP():

SELECT
  a.title title, counter, MAX(num_characters) max
FROM
  publicdata:samples.wikipedia b
JOIN
  (
  SELECT
    TOP(title, 10) AS title, COUNT(*) AS counter
  FROM
    publicdata:samples.wikipedia
    ) a
  ON a.title=b.title
GROUP BY
  title, counter

TOP() works as a simpler and faster replacement for the SELECT COUNT(*) / GROUP / LIMIT pattern.

https://developers.google.com/bigquery/docs/query-reference#top-function

Now it runs in only 6.5 seconds, processing 15.9 GB.
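
As a rough illustration of what `TOP(title, 10)` paired with `COUNT(*)` returns, the sketch below computes the most frequent values and their counts in plain Python. The sample titles and the helper name `top_titles` are hypothetical. This version counts exactly; whether BigQuery's TOP() does the same is something to confirm in the reference linked above.

```python
from collections import Counter

# Invented sample of the "title" column.
titles = ["Alpha", "Beta", "Alpha", "Gamma", "Alpha", "Beta"]

def top_titles(values, n):
    """Return the n most frequent values with their counts, most frequent first."""
    return Counter(values).most_common(n)

top2 = top_titles(titles, 2)
print(top2)  # [('Alpha', 3), ('Beta', 2)]
```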