Need to find the average of the top 3 records per ID in SQL

Time: 2013-01-12 17:39:41

Tags: sql postgresql greatest-n-per-group average

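I have a Postgres table with a customer ID, a date, and an integer. For each customer ID, I need to find the average of the top three records dated within the last year. The SQL below does this for a single ID (id is the customer ID, weekending is the date, and maxattached is the integer).

One caveat: the top values are taken per month, meaning only the highest value in each month goes into the data set, which is why the month is extracted from the date.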

SELECT
  id,
  round(avg(max), 0)
FROM
  (
   SELECT
     id,
     extract(month from weekending) AS month,
     extract(year from weekending) AS year,
     max(maxattached) AS max
   FROM
     myTable
   WHERE
     weekending >= now() - interval '1 year' AND
     id = 110070
   GROUP BY id, month, year
   ORDER BY max DESC
   LIMIT 3
  ) AS t
GROUP BY id;

How can I extend this query to cover all IDs and return a single average for each ID?

Here is some sample data:

ID     | MaxAttached | Weekending
110070 | 5           | 2011-11-10
110070 | 6           | 2011-11-17
110071 | 4           | 2011-11-10
110071 | 7           | 2011-11-17
110070 | 3           | 2011-12-01
110071 | 8           | 2011-12-01
110070 | 5           | 2012-01-01
110071 | 9           | 2012-01-01

So, for this sample table, I would expect to receive the following result:

ID     | MaxAttached
110070 | 5
110071 | 8

This is the average of each ID's highest value in a given month (6, 3, 5 for 110070 and 7, 8, 9 for 110071).
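
To try the queries against the sample rows above, a minimal fixture could look like the following. The column types are assumptions, since the question only names the columns, and single-row INSERT statements are used because multi-row VALUES lists are only available from PostgreSQL 8.2 onward.

```sql
-- Assumed column types; the question only says "customer ID, date, integer".
CREATE TABLE myTable (
    id          integer,
    maxattached integer,
    weekending  date
);

INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 6, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 4, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 7, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 3, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 8, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2012-01-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 9, '2012-01-01');

-- Monthly maxima for 110070: 6 (Nov), 3 (Dec), 5 (Jan); (6 + 3 + 5) / 3 = 4.67, which rounds to 5.
-- Monthly maxima for 110071: 7 (Nov), 8 (Dec), 9 (Jan); (7 + 8 + 9) / 3 = 8.
```

Because these dates fall in 2011 and 2012, the `weekending >= now() - interval '1 year'` filter will exclude them when run today; widen the interval or shift the dates before testing.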

Note: the Postgres version is 8.1.15.

2 answers:

Answer 0 (score: 3)

First, get max(maxattached) for each customer and month:

SELECT id,
       max(maxattached) as max_att         
FROM myTable 
WHERE weekending >= now() - interval '1 year' 
GROUP BY id, date_trunc('month',weekending);

Next, rank all of those values for each customer:

SELECT id,
       max_att,
       row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;

Next, keep the top three for each customer:

SELECT id,
       max_att
FROM <previous select here>
WHERE max_att_rank <= 3;

Next, get the average value for each customer:

SELECT id,
       avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;

Finally, put all the queries together and rewrite or simplify them to suit your case.
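
One way the four steps might be combined is sketched below, with each step nested as a derived table. The aliases `monthly` and `ranked` are illustrative names that do not appear in the answer, and `round(..., 0)` is carried over from the question. Note that `row_number() OVER (...)` is a window function, which PostgreSQL supports only from version 8.4, so this form would not run on the asker's 8.1 server.

```sql
SELECT id,
       round(avg(max_att), 0) AS avg_att          -- step 4: average per customer
FROM (
    SELECT id,
           max_att,
           -- step 2: rank each customer's monthly maxima, highest first
           row_number() OVER (PARTITION BY id ORDER BY max_att DESC) AS max_att_rank
    FROM (
        -- step 1: highest value per customer and calendar month
        SELECT id,
               max(maxattached) AS max_att
        FROM myTable
        WHERE weekending >= now() - interval '1 year'
        GROUP BY id, date_trunc('month', weekending)
    ) AS monthly
) AS ranked
WHERE max_att_rank <= 3                           -- step 3: keep the top three
GROUP BY id;
```

A customer with fewer than three months of data in the window is averaged over however many months exist.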

Update: here is a SQLFiddle with test data and the queries: SQLFiddle

UPDATE2: here is a query that should work on 8.1:

SELECT customer_id,
       (SELECT round(avg(max_att),0)
        FROM (SELECT max(maxattached) as max_att         
              FROM table1
              WHERE weekending >= now() - interval '2 year' 
                AND id = ct.customer_id
              GROUP BY date_trunc('month',weekending)
              ORDER BY max_att DESC
              LIMIT 3) sub 
        ) as avg_att
FROM customer_table ct;

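The idea is to take the initial query and run it once for each customer (customer_table is a table holding every unique customer id).

Here is a SQLFiddle with this query: SQLFiddle

Tested only on version 8.3 (8.1 is too old to be available on SQLFiddle).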
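
If no separate customer table exists, one possible adaptation is to derive the list of ids from the data table itself. The sketch below is an assumption rather than part of the answer: it uses the question's `myTable` and one-year window in place of `table1` and the two-year interval, and builds `ct` from a `SELECT DISTINCT`. Like the query above, it has not been tested on 8.1.

```sql
SELECT ct.customer_id,
       (SELECT round(avg(sub.max_att), 0)
        FROM (SELECT max(maxattached) AS max_att
              FROM myTable
              WHERE weekending >= now() - interval '1 year'
                AND id = ct.customer_id              -- correlate with the outer customer
              GROUP BY date_trunc('month', weekending)
              ORDER BY max_att DESC
              LIMIT 3) AS sub
       ) AS avg_att
FROM (SELECT DISTINCT id AS customer_id             -- assumed stand-in for customer_table
      FROM myTable) AS ct;
```

Customers with no rows inside the one-year window still appear in the result, with a NULL `avg_att`.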

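Answer 1 (score: 0)

Version 8.3

8.3 is the oldest version I have access to, so I can't guarantee this works on 8.1.

I use a temporary table to work out the top three records.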

CREATE TABLE temp_highest_per_month as
   select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max_in_month,
     0 as priority
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' 
   group by id,month,year;

UPDATE temp_highest_per_month t
SET priority = 
 (select count(*) from temp_highest_per_month t2
  where t2.id = t.id and 
   (t.max_in_month < t2.max_in_month or
     (t.max_in_month= t2.max_in_month and
      t.year * 12 + t.month > t2.year * 12 + t2.month)));

-- priority is zero-based (the number of months ranked ahead), so the top three are 0, 1 and 2
select id,round(avg(max_in_month),0)
from temp_highest_per_month
where priority < 3
group by id;

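The year and month are included in the priority calculation so that, if two months have the same maximum, they are still numbered correctly.

Version 9.1

Similar to Igor's answer, but I use a WITH clause to split up the steps.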

with highest_per_month as
  ( select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max_in_month
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' 
   group by id,month,year),
  prioritised as
  ( select id, month, year, max_in_month,
    -- partition by id only: each (id, month, year) group is already a single row
    row_number() over (partition by id
                       order by max_in_month desc)
    as priority
    from highest_per_month
   )
select id, round(avg(max_in_month),0)
from prioritised
where priority <= 3
group by id;