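I am building a query that returns the distinct counts of two columns and the sum of a third column, grouped by month on a date column, for the most recent 13 months. This is my query: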
SELECT TO_CHAR(colDate,'yyyy_MM') as month ,
COUNT(DISTINCT col1) AS col1,
COUNT(DISTINCT col2) as col2,
SUM(col3) as col3
FROM myTable
WHERE TO_CHAR(colDate,'yyyy_MM') IN (select distinct TO_CHAR(colDate,'yyyy_MM')
from myTable
order by 1 desc
limit 13)
GROUP BY 1
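The problem is that, for each month, I also need the average over the previous 3 months of each of these values:
COUNT(DISTINCT col1) AS col1, COUNT(DISTINCT col2) AS col2, SUM(col3) AS col3
So my query needs to look something like: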
SELECT TO_CHAR(colDate,'yyyy_MM') as month ,
COUNT(DISTINCT col1) AS col1,
COUNT(DISTINCT col2) as col2,
SUM(col3) as col3,
... as PreviousMonthsAvgCol1,
... as PreviousMonthsAvgCol2,
... as PreviousMonthsAvgCol3
FROM myTable
WHERE TO_CHAR(colDate,'yyyy_MM') IN (select distinct TO_CHAR(colDate,'yyyy_MM')
from myTable
order by 1 desc
limit 13)
GROUP BY 1
The months before the first of the 13 months still need to be included when calculating the average for that first month.
Answer 0 (score: 1)
If you don't need data from before the 13 months, use lag():
SELECT . . .,
LAG(COUNT(DISTINCT col1)) OVER (ORDER BY MIN(colDate)) as prev_col1,
. . .
FROM myTable . . .;
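Note that lag() only looks at the single preceding row. For an average over the three preceding months, a window frame may be a closer fit. The following is a minimal sketch rather than a tested solution; it assumes every month has at least one row (so that three preceding rows really are three preceding months) and casts to float on the assumption that Redshift would otherwise return an integer average:

```sql
SELECT TO_CHAR(colDate, 'yyyy_MM') AS month,
       COUNT(DISTINCT col1) AS col1,
       -- average of up to three preceding monthly counts;
       -- NULL for the first row, because its frame is empty
       AVG(COUNT(DISTINCT col1)::float) OVER (
           ORDER BY MIN(colDate)
           ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
       ) AS PreviousMonthsAvgCol1
FROM myTable
GROUP BY 1;
```

If the question's 13-month WHERE filter is added to this query, the earliest months in the result are averaged over fewer than three months, since the older rows are removed before the window is evaluated.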
If you do need the earlier data, do the full aggregation first and then select the 13 months.
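One way to read "aggregate first, then select" is sketched below. The names monthly, with_avg and month_start are made up for illustration, and the sketch assumes col3 is numeric and that no month is missing from the table:

```sql
WITH monthly AS (
    -- aggregate every month in the table, not only the latest 13
    SELECT DATE_TRUNC('month', colDate) AS month_start,
           COUNT(DISTINCT col1) AS col1,
           COUNT(DISTINCT col2) AS col2,
           SUM(col3) AS col3
    FROM myTable
    GROUP BY 1
),
with_avg AS (
    -- averages are computed while the older months are still present
    SELECT month_start, col1, col2, col3,
           AVG(col1::float) OVER (ORDER BY month_start
               ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS PreviousMonthsAvgCol1,
           AVG(col2::float) OVER (ORDER BY month_start
               ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS PreviousMonthsAvgCol2,
           AVG(col3::float) OVER (ORDER BY month_start
               ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS PreviousMonthsAvgCol3
    FROM monthly
)
-- only now cut the result down to the latest 13 months
SELECT TO_CHAR(month_start, 'yyyy_MM') AS month,
       col1, col2, col3,
       PreviousMonthsAvgCol1, PreviousMonthsAvgCol2, PreviousMonthsAvgCol3
FROM with_avg
ORDER BY month_start DESC
LIMIT 13;
```

Because the LIMIT is applied after the window functions, the oldest of the 13 months can still draw on the three months before it, which is what the question asks for.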
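Answer 1 (score: 0)
Agree with Gordon Lindoff's answer.
However, I would advise against using TO_CHAR() in the date range predicate. It will force Redshift to scan more data than necessary.
If you have to round the dates to whole months, try colDate BETWEEN '2017-01-01' and '2018-01-31', or use DATE_TRUNC().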
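As a rough illustration of that advice, the filter below compares the raw column against date bounds and uses DATE_TRUNC() only for grouping. The dates are the example values from this answer. The half-open upper bound is a variation on BETWEEN, chosen on the assumption that colDate might be a timestamp, in which case BETWEEN ... '2018-01-31' would cut off most of the last day:

```sql
SELECT DATE_TRUNC('month', colDate) AS month_start,
       COUNT(DISTINCT col1) AS col1,
       COUNT(DISTINCT col2) AS col2,
       SUM(col3) AS col3
FROM myTable
-- the column itself is compared, with no function wrapped around it
WHERE colDate >= '2017-01-01'
  AND colDate <  '2018-02-01'
GROUP BY 1
ORDER BY 1;
```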