I'm running into an Amazon Redshift aggregation error when I run the following query:
select case when frequency between (avg(frequency) + stddev(frequency)) and (avg(frequency) - stddev(frequency)) then round(avg(frequency) - stddev(frequency))||'-'||round(avg(frequency) + stddev(frequency))
when frequency between (avg(frequency) + 2*stddev(frequency)) and (avg(frequency) - 2*stddev(frequency)) then round(avg(frequency) - 2*stddev(frequency))||'-'||round(avg(frequency) + 2*stddev(frequency))
when frequency between (avg(frequency) + 3*stddev(frequency)) and (avg(frequency) - 3*stddev(frequency)) then round(avg(frequency) - 3*stddev(frequency))||'-'||round(avg(frequency) + 3*stddev(frequency))
else null
end as deviation
from schema.table
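The error tells me that I need to include frequency in the GROUP BY clause. If I do that, I get "aggregates not allowed in GROUP BY" instead. Does anyone know why this happens? My initial guess was that it might be a data type issue, but experimenting with that didn't help.
Thanks!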
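Answer 0 (score: 0)
These queries can be confusing. The error comes up because the query compares a per-row column (frequency) with aggregates over the whole table in the same SELECT. You can either compute the aggregates separately in a subquery and make them available on every row via a cross join, or use analytic functions, which give you the aggregate values without a GROUP BY: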
-- BETWEEN takes the lower bound first, then the upper bound
SELECT case when frequency between (avg_Freq - dev_Freq) and (avg_Freq + dev_Freq) then round(avg_Freq - dev_Freq)||'-'||round(avg_Freq + dev_Freq)
when frequency between (avg_Freq - 2*dev_Freq) and (avg_Freq + 2*dev_Freq) then round(avg_Freq - 2*dev_Freq)||'-'||round(avg_Freq + 2*dev_Freq)
when frequency between (avg_Freq - 3*dev_Freq) and (avg_Freq + 3*dev_Freq) then round(avg_Freq - 3*dev_Freq)||'-'||round(avg_Freq + 3*dev_Freq)
else null
end as deviation
FROM schema.table
-- one-row subquery holding the table-wide mean and standard deviation
CROSS JOIN (SELECT avg(frequency) AS avg_Freq
,stddev(frequency) AS dev_Freq
FROM schema.table
)sub
Alternatively, you can add OVER() to each aggregate in your existing query:
-- BETWEEN takes the lower bound first, then the upper bound
select case when frequency between (avg(frequency) OVER() - stddev(frequency) OVER()) and (avg(frequency) OVER() + stddev(frequency) OVER()) then round(avg(frequency) OVER() - stddev(frequency) OVER())||'-'||round(avg(frequency) OVER() + stddev(frequency) OVER())
when frequency between (avg(frequency) OVER() - 2*stddev(frequency) OVER()) and (avg(frequency) OVER() + 2*stddev(frequency) OVER()) then round(avg(frequency) OVER() - 2*stddev(frequency) OVER())||'-'||round(avg(frequency) OVER() + 2*stddev(frequency) OVER())
when frequency between (avg(frequency) OVER() - 3*stddev(frequency) OVER()) and (avg(frequency) OVER() + 3*stddev(frequency) OVER()) then round(avg(frequency) OVER() - 3*stddev(frequency) OVER())||'-'||round(avg(frequency) OVER() + 3*stddev(frequency) OVER())
else null
end as deviation
from schema.table
I'm not 100% sure about the Redshift syntax, but I believe both should work.
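Repeating the aggregates in every branch makes the CASE hard to read, so one option is to compute the mean and standard deviation once in a derived table with OVER () and reuse them by name. The following is a minimal sketch under stated assumptions, not a tested Redshift statement: the temp table freq_demo and its sample values are hypothetical, and the explicit casts (to DOUBLE PRECISION inside AVG, and to INT and VARCHAR for the label) are precautions I added rather than anything from the original query.

```sql
-- Hypothetical sample data: freq_demo and its values are made up for illustration.
CREATE TEMP TABLE freq_demo (frequency INT);
INSERT INTO freq_demo VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (50);

SELECT frequency,
       CASE
           -- BETWEEN expects the lower bound first, then the upper bound
           WHEN frequency BETWEEN avg_freq - dev_freq AND avg_freq + dev_freq
               THEN CAST(CAST(ROUND(avg_freq - dev_freq) AS INT) AS VARCHAR)
                    || '-' ||
                    CAST(CAST(ROUND(avg_freq + dev_freq) AS INT) AS VARCHAR)
           WHEN frequency BETWEEN avg_freq - 2 * dev_freq AND avg_freq + 2 * dev_freq
               THEN CAST(CAST(ROUND(avg_freq - 2 * dev_freq) AS INT) AS VARCHAR)
                    || '-' ||
                    CAST(CAST(ROUND(avg_freq + 2 * dev_freq) AS INT) AS VARCHAR)
           WHEN frequency BETWEEN avg_freq - 3 * dev_freq AND avg_freq + 3 * dev_freq
               THEN CAST(CAST(ROUND(avg_freq - 3 * dev_freq) AS INT) AS VARCHAR)
                    || '-' ||
                    CAST(CAST(ROUND(avg_freq + 3 * dev_freq) AS INT) AS VARCHAR)
           ELSE NULL
       END AS deviation
FROM (
    SELECT frequency,
           -- Cast so the mean is not truncated if AVG returns an integer type for integer input
           AVG(CAST(frequency AS DOUBLE PRECISION)) OVER () AS avg_freq,
           -- Empty OVER () makes the whole table one window, so every row sees the same value
           STDDEV(frequency) OVER () AS dev_freq
    FROM freq_demo
) AS stats;
```

Because the window is empty, no GROUP BY is needed and each input row is kept; rows more than three standard deviations from the mean fall through to the ELSE branch and get NULL.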
Answer 1 (score: 0)
You can break it down like this:
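-- Compute the aggregates once in a CTE, then join them onto every row
WITH stats AS (
SELECT avg(frequency) AS avg_freq, stddev(frequency) AS dev_freq
FROM schema.table
)
SELECT case when frequency between stats.avg_freq - stats.dev_freq and stats.avg_freq + stats.dev_freq etc.
FROM schema.table
CROSS JOIN stats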
You'll have to check the exact statement; I wrote this off the top of my head.