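I have a Postgres table with a customer ID, a date, and an integer. For each customer ID, I need to find the average of the top three records whose dates fall within the last year. I can do this for a single ID with the SQL below (id is the customer ID, weekending is the date, and maxattached is the integer).
One caveat: the maximum is per month, meaning we only look at the highest value in a given month to build the data set, which is why we extract the month from the date.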
SELECT
    id,
    round(avg(max), 0)
FROM
    (
        select
            id,
            extract(month from weekending) as month,
            extract(year from weekending) as year,
            max(maxattached) as max
        FROM
            myTable
        WHERE
            weekending >= now() - interval '1 year' AND
            id = 110070
        group by id, month, year
        ORDER BY
            max desc
        limit 3
    ) AS t
GROUP BY id;
How can I extend this query to cover all IDs and return a single average for each ID?
Here is some sample data:
ID | MaxAttached | Weekending
110070 | 5 | 2011-11-10
110070 | 6 | 2011-11-17
110071 | 4 | 2011-11-10
110071 | 7 | 2011-11-17
110070 | 3 | 2011-12-01
110071 | 8 | 2011-12-01
110070 | 5 | 2012-01-01
110071 | 9 | 2012-01-01
So for this sample table, I would expect to get the following result:
ID | MaxAttached
110070 | 5
110071 | 8
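These are the rounded averages of each ID's highest value per month (6, 3 and 5 for 110070, which average about 4.67 and round to 5; 7, 8 and 9 for 110071, which average 8).
Note: the Postgres version is 8.1.15.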
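For anyone who wants to reproduce this, the sample data above could be loaded with something like the script below. It is only a sketch: the column types and NOT NULL constraints are my assumptions, since all that is known is that id and maxattached are integers and weekending is a date. The rows are inserted one statement at a time because multi-row VALUES lists are not available before PostgreSQL 8.2.

```sql
-- Assumed definition: only the column names and rough types are known.
CREATE TABLE myTable (
    id          integer NOT NULL,
    maxattached integer NOT NULL,
    weekending  date    NOT NULL
);

-- One INSERT per row so the script also runs on 8.1.
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 6, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 4, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 7, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 3, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 8, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2012-01-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 9, '2012-01-01');
```

Because the sample dates are from 2011 and 2012, a filter of now() - interval '1 year' will exclude every row when run today; widen the interval or shift the dates when testing against this data.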
Answer 0 (score: 3)
First, get max(maxattached) for each customer and month:
SELECT id,
max(maxattached) as max_att
FROM myTable
WHERE weekending >= now() - interval '1 year'
GROUP BY id, date_trunc('month',weekending);
Next, rank each customer's monthly values:
SELECT id,
max_att,
row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;
Next, take the top three for each customer:
SELECT id,
max_att
FROM <previous select here>
WHERE max_att_rank <= 3;
Next, get the avg value for each customer:
SELECT id,
avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;
Finally, put all the queries together and rewrite or simplify them for your case.
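Put together, the four steps might look like the query below. Treat it as a sketch rather than a tested solution. It assumes PostgreSQL 8.4 or later, because row_number() OVER (...) is a window function and window functions do not exist in 8.1. The subquery aliases monthly and ranked are illustrative names, and round(..., 0) is added only to match the rounding used in the question.

```sql
SELECT id,
       round(avg(max_att), 0) AS avg_att
FROM (
    -- Step 2: rank each customer's monthly maximums, highest first.
    SELECT id,
           max_att,
           row_number() OVER (PARTITION BY id ORDER BY max_att DESC) AS max_att_rank
    FROM (
        -- Step 1: highest value per customer and calendar month in the last year.
        SELECT id,
               max(maxattached) AS max_att
        FROM myTable
        WHERE weekending >= now() - interval '1 year'
        GROUP BY id, date_trunc('month', weekending)
    ) AS monthly
) AS ranked
-- Step 3: keep the top three months per customer; step 4: average them.
WHERE max_att_rank <= 3
GROUP BY id;
```

When two months share the same maximum, row_number() picks between them arbitrarily, which does not affect the average because the tied values are equal.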
Update: here is a SQLFiddle with test data and the queries: SQLFiddle.
UPDATE2: here is a query that should work on 8.1:
SELECT customer_id,
       (SELECT round(avg(max_att), 0)
        FROM (SELECT max(maxattached) as max_att
              FROM table1
              WHERE weekending >= now() - interval '2 year'
                AND id = ct.customer_id
              GROUP BY date_trunc('month', weekending)
              ORDER BY max_att DESC
              LIMIT 3) sub
       ) as avg_att
FROM customer_table ct;
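The idea: take the initial query and run it for every customer (customer_table is a table holding all the unique customer ids).
Here is a SQLFiddle with this query: SQLFiddle.
Tested only on version 8.3 (8.1 is too old to be available on SQLFiddle).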
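If there is no separate customer table, the list of ids can be derived from the data table itself. The variant below is a sketch under that assumption. It uses the table name myTable and the one-year window from the question instead of table1 and '2 year', and like the query above it has not been run on 8.1.

```sql
SELECT ct.customer_id,
       (SELECT round(avg(sub.max_att), 0)
        FROM (SELECT max(maxattached) AS max_att
              FROM myTable
              WHERE weekending >= now() - interval '1 year'
                AND id = ct.customer_id          -- correlate with the outer row
              GROUP BY date_trunc('month', weekending)
              ORDER BY max_att DESC
              LIMIT 3) AS sub
       ) AS avg_att
-- Assumption: every customer of interest has at least one row in myTable.
FROM (SELECT DISTINCT id AS customer_id FROM myTable) AS ct;
```

A customer whose rows all fall outside the one-year window still appears in the result, with a NULL avg_att.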
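Answer 1 (score: 0)
Version 8.3
8.3 is the oldest version I have access to, so I can't guarantee this will work on 8.1.
I use a temporary table to work out the best three records.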
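-- One row per customer and calendar month, holding that month's highest value.
CREATE TABLE temp_highest_per_month as
select
    id,
    extract(month from weekending) as month,
    extract(year from weekending) as year,
    max(maxattached) as max_in_month,
    0 as priority
FROM
    myTable
WHERE
    weekending >= now() - interval '1 year'
group by id, month, year;

-- priority = number of the same customer's months that rank ahead of this one
-- (higher maximum, or equal maximum in an earlier month), so the best month gets 0.
UPDATE temp_highest_per_month t
SET priority =
    (select count(*) from temp_highest_per_month t2
     where t2.id = t.id and
           (t.max_in_month < t2.max_in_month or
            (t.max_in_month = t2.max_in_month and
             t.year * 12 + t.month > t2.year * 12 + t2.month)));

-- Priorities start at 0, so the best three months are 0, 1 and 2.
select id, round(avg(max_in_month), 0)
from temp_highest_per_month
where priority < 3
group by id;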
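The year and month are included in the priority calculation so that if two months have the same maximum, they still get distinct positions in the numbering.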
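To check how the priorities came out before trusting the average, the intermediate table can be inspected directly. This is a small sketch that assumes temp_highest_per_month from the statements above still exists. The DROP TABLE at the end is my own addition: the table is created with a plain CREATE TABLE, so it stays around until it is dropped.

```sql
-- List each customer's months from best to worst;
-- the best month for each customer should show priority 0.
SELECT id, year, month, max_in_month, priority
FROM temp_highest_per_month
ORDER BY id, priority;

-- Clean up once the results have been checked.
DROP TABLE temp_highest_per_month;
```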
Version 9.1
Similar to Igor's answer, but I use a WITH clause to separate the steps.
with highest_per_month as
  ( select
        id,
        extract(month from weekending) as month,
        extract(year from weekending) as year,
        max(maxattached) as max_in_month
    FROM
        myTable
    WHERE
        weekending >= now() - interval '1 year'
    group by id, month, year),
prioritised as
  ( select id, month, year, max_in_month,
           -- Partition by id only: highest_per_month has one row per id, month
           -- and year, so partitioning by all three would give every row priority 1.
           row_number() over (partition by id
                              order by max_in_month desc)
               as priority
    from highest_per_month
  )
select id, round(avg(max_in_month), 0)
from prioritised
where priority <= 3
group by id;