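I keep running into the same task: summarize data by the top X values of a categorical variable, then roll everything else up into "Other".
So far I have been using this trick: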
SELECT
  year,
  if(tt.state is null, "other", t.state) as state_filtered,
  count(1) as children
FROM [publicdata:samples.natality] as t
LEFT OUTER JOIN (
  SELECT state, count(1) as children FROM [publicdata:samples.natality]
  WHERE state is not null
  GROUP BY state
  ORDER BY children DESC
  LIMIT 5
) as tt ON tt.state=t.state
GROUP BY year, state_filtered
ORDER BY year, state_filtered
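But it isn't very clean, because I query the same table twice, and in real life the code gets too complicated. I have been looking for a solution using ROLLUP or TOP, but haven't found anything better.
Does anyone know a better way?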
Answer 0: (score: 3)
You can use ROW_NUMBER in a subquery.
SELECT
  IF (RNB<=5, state, "Other") AS state,
  SUM(children) AS Children
FROM (
  SELECT
    state,
    children,
    ROW_NUMBER() OVER (ORDER BY children DESC) AS RNB
  FROM (
    SELECT
      state,
      COUNT(1) AS children,
    FROM
      [publicdata:samples.natality]
    WHERE
      state IS NOT NULL
    GROUP BY
      state))
GROUP EACH BY
  state
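To see the pattern in isolation, here is a minimal sketch that runs the same ROW_NUMBER approach against an in-memory SQLite table from Python. SQLite (3.25 or newer, needed for window functions) is only a stand-in for BigQuery, `CASE WHEN` replaces the legacy `IF`, and the table contents, `TOP_N`, and the `state_filtered` / `total` column names are made up for illustration. Like the query above, it groups by state only and does not keep the year.

```python
import sqlite3

# Toy stand-in for the natality table: one row per birth (hypothetical data).
births = ([("CA",)] * 5 + [("TX",)] * 4 + [("NY",)] * 3
          + [("FL",)] * 2 + [("WA",)] * 1)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natality (state TEXT)")
conn.executemany("INSERT INTO natality VALUES (?)", births)

TOP_N = 2  # hypothetical cutoff: keep the two largest states

query = """
SELECT CASE WHEN rnb <= ? THEN state ELSE 'Other' END AS state_filtered,
       SUM(children) AS total
FROM (
    -- rank the per-state counts, largest first
    SELECT state, children,
           ROW_NUMBER() OVER (ORDER BY children DESC) AS rnb
    FROM (
        SELECT state, COUNT(*) AS children
        FROM natality
        WHERE state IS NOT NULL
        GROUP BY state
    )
)
GROUP BY state_filtered
ORDER BY state_filtered
"""
result = conn.execute(query, (TOP_N,)).fetchall()
print(result)
```

On this toy data the two largest states (CA and TX) keep their own rows, and the remaining three are summed into a single 'Other' row.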
Answer 1: (score: 3)
I think a single subselect is enough:
SELECT
  year,
  IF (pos <= 5, state, "other") AS state,
  SUM(children) AS children
FROM (
  SELECT
    year,
    state,
    ROW_NUMBER() OVER (PARTITION BY year ORDER BY children DESC) AS pos,
    COUNT(1) AS children,
  FROM
    [publicdata:samples.natality]
  WHERE
    state IS NOT NULL
  GROUP BY
    year, state
)
GROUP BY year, state
ORDER BY year, state
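One thing to be aware of: `PARTITION BY year` ranks states within each year, so the states that survive can differ from year to year, whereas the query in the question picks one global top 5. The sketch below illustrates the difference with made-up counts, using Python's sqlite3 as a stand-in for BigQuery (it assumes SQLite 3.25+ for window functions; the `counts` table, its values, and the cutoff of 2 are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Pre-aggregated toy data: births per (year, state), values are invented.
conn.execute("CREATE TABLE counts (year INTEGER, state TEXT, children INTEGER)")
conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", [
    (2000, "CA", 10), (2000, "TX", 8), (2000, "NY", 1),
    (2001, "CA", 2), (2001, "TX", 9), (2001, "NY", 7),
])

# Top 2 states within each year (the PARTITION BY year behaviour).
per_year = conn.execute("""
    SELECT year, state
    FROM (
        SELECT year, state,
               ROW_NUMBER() OVER (PARTITION BY year ORDER BY children DESC) AS pos
        FROM counts
    )
    WHERE pos <= 2
    ORDER BY year, state
""").fetchall()

# Top 2 states over all years combined (the behaviour of the question's join).
overall = conn.execute("""
    SELECT state
    FROM counts
    GROUP BY state
    ORDER BY SUM(children) DESC
    LIMIT 2
""").fetchall()

print(per_year)
print(overall)
```

Here the per-year ranking keeps NY in 2001 and drops CA, while the global ranking keeps TX and CA in every year. Which of the two is wanted depends on the report.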
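Answer 2: (score: 2)
I think there is a shortcut that gets you the global top 5 states without a join, so, at least code-wise, it does only one scan! Compared with the original code currently in use, it is twice as fast. Not sure whether you will like it; that depends on your real scenario.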
SELECT
  year,
  state,
  SUM(children) as children
FROM (
  SELECT
    state,
    REGEXP_EXTRACT(year_info, r'^(\w+)') as year,
    INTEGER(REGEXP_EXTRACT(year_info, r'(\w+)$')) as children,
  FROM (
    SELECT
      CASE WHEN pos < 6 THEN state ELSE 'other' END state,
      SPLIT(years_list) as year_info
    FROM (
      SELECT
        state,
        GROUP_CONCAT(STRING(year) + '|' + STRING(rows)) as years_list,
        ROW_NUMBER() OVER(ORDER BY children DESC) as pos,
        SUM(rows) as children
      FROM (
        SELECT year, state, COUNT(1) AS rows
        FROM [publicdata:samples.natality]
        WHERE state IS NOT NULL
        GROUP BY year, state
      )
      GROUP BY state
    )
  )
)
GROUP BY year, state
ORDER BY year, state
I feel there should be a better way to handle the "group_concat / split" trick.
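One possible alternative to the concat-and-split step, sketched here as an assumption rather than a tested BigQuery query: keep the (year, state) rows as they are, attach each state's overall total with a windowed `SUM(...) OVER (PARTITION BY state)`, and rank states on that total with `DENSE_RANK`. The sketch runs on Python's sqlite3 (SQLite 3.25+ assumed) with invented rows and a cutoff of 2. Whether legacy BigQuery SQL accepts the same construction is not verified here, and `state_total`, `state_filtered`, and `total` are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natality (year INTEGER, state TEXT)")
# One row per birth; the data is invented for the example.
rows = ([(2000, "CA")] * 3 + [(2000, "TX")] * 2 + [(2000, "NY")] * 1
        + [(2001, "CA")] * 1 + [(2001, "TX")] * 3 + [(2001, "NY")] * 2)
conn.executemany("INSERT INTO natality VALUES (?, ?)", rows)

query = """
SELECT year,
       CASE WHEN pos <= 2 THEN state ELSE 'other' END AS state_filtered,
       SUM(children) AS total
FROM (
    -- rank states by their total over all years
    SELECT year, state, children,
           DENSE_RANK() OVER (ORDER BY state_total DESC) AS pos
    FROM (
        -- attach the per-state total to every (year, state) row
        SELECT year, state, children,
               SUM(children) OVER (PARTITION BY state) AS state_total
        FROM (
            -- the only reference to the base table
            SELECT year, state, COUNT(*) AS children
            FROM natality
            WHERE state IS NOT NULL
            GROUP BY year, state
        )
    )
)
GROUP BY year, state_filtered
ORDER BY year, state_filtered
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The base table appears only once, and no string packing or regular expressions are needed. One caveat: `DENSE_RANK` gives tied states the same rank, so more than the intended number of states can be kept when totals are equal.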