I have a table `my_db` like this:
| company | timestamp | value |
| ------- | ---------- | ----- |
| google | 2020-09-01 | 5 |
| google | 2020-08-01 | 4 |
| amazon | 2020-09-02 | 3 |
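If there are at least 20 data points, I want each company's average `value` over the past year. If there are fewer than 20 data points, I want the average over the entire time period. I know I can run two separate queries and get the average for each scenario. My problem is how to combine them back into one table based on my criteria.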
select company, avg(value) from my_db GROUP BY company;
select company, avg(value) from my_db
where timestamp > (CURRENT_DATE - INTERVAL '12 months')
GROUP BY company;
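As a reference point for checking the SQL answers, the rule can be written in plain Python. This is a minimal sketch, not part of the question: the helper name `company_averages`, the `(company, date, value)` tuple layout, and passing `cutoff` explicitly (instead of computing `CURRENT_DATE - INTERVAL '12 months'`) are all assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def company_averages(rows, cutoff, min_points=20):
    """Average `value` per company following the question's rule.

    rows: iterable of (company, date, value) tuples (hypothetical layout).
    Rows dated strictly after `cutoff` count as "past year" (assumption).
    """
    all_values = defaultdict(list)
    recent_values = defaultdict(list)
    for company, day, value in rows:
        all_values[company].append(value)
        if day > cutoff:
            recent_values[company].append(value)

    result = {}
    for company, values in all_values.items():
        recent = recent_values[company]
        # Enough recent points: average the past year only; otherwise use everything.
        result[company] = mean(recent) if len(recent) >= min_points else mean(values)
    return result


rows = [
    ("google", date(2020, 9, 1), 5),
    ("google", date(2020, 8, 1), 4),
    ("amazon", date(2020, 9, 2), 3),
]
print(company_averages(rows, cutoff=date(2019, 9, 1)))
```

With only the three sample rows, no company reaches 20 points, so every company falls back to its full-history average.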
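Answer 0 (score: 1)
Use conditional aggregation. As written, this first version compares the sum of `value` over the last 12 months with 20: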
select company,
       case
         when sum(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end) >= 20 then
           avg(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end)
         else avg(value)
       end as avg_value
from my_db
group by company;
If by "20 data points" you mean 20 rows per company in the last 12 months, then:
select company,
       case
         when count(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end) >= 20 then
           avg(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end)
         else avg(value)
       end as avg_value
from my_db
group by company;
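To try the count-based query on sample data without a PostgreSQL server, here is a minimal sketch that runs the same conditional aggregation through Python's built-in `sqlite3`. Assumptions not taken from the answer: the cutoff date is bound as a parameter because SQLite has no `INTERVAL` syntax, dates are stored as ISO `YYYY-MM-DD` strings so that string comparison orders them correctly, and `conditional_averages` plus the `acme` rows are hypothetical.

```python
import sqlite3

# Count-based conditional aggregation from the answer above, ported to SQLite.
# Assumption: :cutoff replaces CURRENT_DATE - INTERVAL '12 months', and dates are
# ISO 'YYYY-MM-DD' strings, which compare correctly as text.
QUERY = """
SELECT company,
       CASE
         WHEN count(CASE WHEN "timestamp" > :cutoff THEN value END) >= :min_points
           THEN avg(CASE WHEN "timestamp" > :cutoff THEN value END)
         ELSE avg(value)
       END AS avg_value
FROM my_db
GROUP BY company
"""


def conditional_averages(rows, cutoff, min_points=20):
    """Load (company, date_string, value) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE my_db (company TEXT, "timestamp" TEXT, value REAL)')
        conn.executemany("INSERT INTO my_db VALUES (?, ?, ?)", rows)
        params = {"cutoff": cutoff, "min_points": min_points}
        return dict(conn.execute(QUERY, params).fetchall())
    finally:
        conn.close()


# Hypothetical data: "acme" has 20 rows after the cutoff plus one old row.
rows = [("google", "2020-09-01", 5), ("google", "2020-08-01", 4), ("amazon", "2020-09-02", 3)]
rows += [("acme", f"2020-01-{d:02d}", 10) for d in range(1, 21)] + [("acme", "2015-01-01", 0)]
print(conditional_averages(rows, cutoff="2019-09-01"))
```

Dropping one of the recent `acme` rows leaves 19 points in the window, so that company switches to its full-history average.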
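Answer 1 (score: 1)
You can use a window function to provide the filtering information. Here `t` and `datecol` stand for your table and its date column, and "this year" means the current calendar year rather than the last 12 months: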
select company, avg(value),
       (count(*) = cnt_this_year) as only_this_year
from (select t.*,
             count(*) filter (where date_trunc('year', datecol) = date_trunc('year', now()))
               over (partition by company) as cnt_this_year
      from t
     ) t
where (cnt_this_year >= 20 and date_trunc('year', datecol) = date_trunc('year', now())) or
      cnt_this_year < 20
group by company, cnt_this_year;
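The third column indicates whether all of the averaged rows come from this year. Because the filtering is done in the `where` clause, other aggregates (such as `min()`, `max()`, etc.) are easy to add.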
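A minimal sketch of the same window-function idea in Python's `sqlite3`, adapted to the question's rolling 12-month window. Assumptions: the `:cutoff` parameter replaces the calendar-year test, `cnt_recent`/`only_recent` are renamed versions of `cnt_this_year`/`only_this_year`, `window_averages` is a hypothetical helper, and `count(*) filter (...)` is written as `sum(CASE ...)` so it also runs on SQLite builds without `FILTER` (window functions themselves still need SQLite 3.25 or later).

```python
import sqlite3

# Window-function approach from the answer above, ported to SQLite.
# Assumptions: "recent" means "timestamp" > :cutoff (the question's rolling window)
# instead of the current calendar year, and count(*) FILTER (...) is written as
# sum(CASE ...) for portability.
QUERY = """
SELECT company,
       avg(value) AS avg_value,
       count(*) = cnt_recent AS only_recent
FROM (SELECT m.*,
             sum(CASE WHEN "timestamp" > :cutoff THEN 1 ELSE 0 END)
               OVER (PARTITION BY company) AS cnt_recent
      FROM my_db AS m) AS x
WHERE (cnt_recent >= :min_points AND "timestamp" > :cutoff)
   OR cnt_recent < :min_points
GROUP BY company, cnt_recent
"""


def window_averages(rows, cutoff, min_points=20):
    """Return {company: (average, only_recent)} for (company, date_string, value) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE my_db (company TEXT, "timestamp" TEXT, value REAL)')
        conn.executemany("INSERT INTO my_db VALUES (?, ?, ?)", rows)
        params = {"cutoff": cutoff, "min_points": min_points}
        return {
            company: (avg, bool(only_recent))
            for company, avg, only_recent in conn.execute(QUERY, params)
        }
    finally:
        conn.close()


rows = [("google", "2020-09-01", 5), ("google", "2020-08-01", 4), ("amazon", "2020-09-02", 3)]
print(window_averages(rows, cutoff="2020-08-15"))
```

With that cutoff, `google` has one recent row, so both of its rows are averaged and `only_recent` is false; `amazon`'s single row is recent, so its flag is true.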
Answer 2 (score: 1)
WITH last_year AS (
   SELECT company, avg(value), 'year' AS range  -- optional tag
   FROM   tbl
   WHERE  timestamp >= now() - interval '1 year'
   GROUP  BY 1
   HAVING count(*) >= 20  -- 20+ rows in range
   )
SELECT company, avg(value), 'all' AS range
FROM   tbl t
WHERE  NOT EXISTS (SELECT FROM last_year WHERE company = t.company)
GROUP  BY 1
UNION  ALL TABLE last_year;
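An index on `(timestamp)` is only worthwhile if your table is big and spans many years.
If most companies have 20+ rows in range, an index on `(company)` can be used by the second `SELECT` to retrieve the few outliers.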
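The CTE approach can be sketched the same way in `sqlite3`. Assumptions: the table is the question's `my_db`, the cutoff is a bound parameter instead of `now() - interval '1 year'`, the PostgreSQL shorthands `SELECT FROM ...` (empty select list) and `TABLE last_year` are written out as `SELECT 1 FROM ...` and `SELECT * FROM last_year`, the tag column is named `span` instead of `range` (since `RANGE` is a keyword in SQLite's window syntax), and `cte_averages` plus the index names are hypothetical. The two indexes mirror the suggestion above; on a toy dataset the planner has no reason to use them.

```python
import sqlite3

# CTE + UNION ALL approach from the answer above, ported to SQLite.
# Assumptions: :cutoff replaces now() - interval '1 year'; "span" replaces "range";
# PostgreSQL's SELECT FROM / TABLE shorthands are spelled out.
QUERY = """
WITH last_year AS (
    SELECT company, avg(value) AS avg_value, 'year' AS span  -- optional tag
    FROM my_db
    WHERE "timestamp" >= :cutoff
    GROUP BY company
    HAVING count(*) >= :min_points  -- enough rows in range
)
SELECT company, avg(value) AS avg_value, 'all' AS span
FROM my_db AS t
WHERE NOT EXISTS (SELECT 1 FROM last_year AS l WHERE l.company = t.company)
GROUP BY company
UNION ALL
SELECT * FROM last_year
"""


def cte_averages(rows, cutoff, min_points=20):
    """Return {company: (average, span)} for (company, date_string, value) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE my_db (company TEXT, "timestamp" TEXT, value REAL)')
        # Indexes suggested in the answer (hypothetical names).
        conn.execute('CREATE INDEX my_db_timestamp_idx ON my_db ("timestamp")')
        conn.execute("CREATE INDEX my_db_company_idx ON my_db (company)")
        conn.executemany("INSERT INTO my_db VALUES (?, ?, ?)", rows)
        params = {"cutoff": cutoff, "min_points": min_points}
        return {company: (avg, span) for company, avg, span in conn.execute(QUERY, params)}
    finally:
        conn.close()


# Hypothetical data: "acme" has 20 rows in range plus one old row.
rows = [("google", "2020-09-01", 5), ("google", "2020-08-01", 4), ("amazon", "2020-09-02", 3)]
rows += [("acme", f"2020-01-{d:02d}", 10) for d in range(1, 21)] + [("acme", "2015-01-01", 0)]
print(cte_averages(rows, cutoff="2019-09-01"))
```

Each company appears exactly once: either in the `last_year` branch (tagged `year`) or, through `NOT EXISTS`, in the full-history branch (tagged `all`).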