Query to find the second-highest value in each group

Time: 2015-05-07 11:31:28

Tags: sql postgresql postgresql-9.3

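I have three tables:

  1. project: project_id, project_name
  2. milestone: milestone_id, milestone_name
  3. project_milestone: id, project_id, milestone_id, completed_date

I want to get the second-highest completed_date, along with its milestone_id, from project_milestone, grouped by project_id. In other words, for each project I want the milestone with the second-highest completed_date. What would be the right query for this?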

3 answers:

Answer 0: (score: 7)

I think you can do what you want using just the project_milestone table and row_number():

select pm.*
from (select pm.*,
             row_number() over (partition by project_id order by completed_date desc) as seqnum
      from project_milestone pm
      where pm.completed_date is not null
     ) pm
where seqnum = 2;
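
One caveat: row_number() breaks ties arbitrarily, so if two milestones in a project share the same completed_date, which of them receives seqnum = 2 is not deterministic. If tied dates should count as a single rank instead, a minimal sketch using dense_rank() might look like this (the alias rnk is illustrative and not part of the original answer):

```sql
-- Sketch: dense_rank() gives tied completed_date values the same rank,
-- so rnk = 2 means "the second-highest distinct date" in each project.
select pm.*
from (select pm.*,
             dense_rank() over (partition by project_id order by completed_date desc) as rnk
      from project_milestone pm
      where pm.completed_date is not null
     ) pm
where rnk = 2;
```

Note that this variant can return more than one row per project when several milestones share the second-highest date.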

If you need to include all projects, even those that do not have two milestones, you can use a left join:

select p.project_id, pm.milestone_id, pm.completed_date
from project p left join
     (select pm.*,
             row_number() over (partition by project_id order by completed_date desc) as seqnum
      from project_milestone pm
      where pm.completed_date is not null
     ) pm
     on p.project_id = pm.project_id and pm.seqnum = 2;

Answer 1: (score: 0)

Using LATERAL (PG 9.3+) can perform better than the window-function version.
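
Whether the LATERAL form is actually faster depends on the data and on the available indexes. It tends to help when each per-project subquery can be answered from an index rather than by sorting. A sketch of such an index, assuming none exists yet (the index name is hypothetical):

```sql
-- Hypothetical index name. With this index, each per-project lookup can
-- read rows already ordered by completed_date and stop after two of them.
CREATE INDEX project_milestone_project_completed_idx
    ON project_milestone (project_id, completed_date DESC);
```

Comparing the EXPLAIN ANALYZE output of both queries on realistic data is the reliable way to check.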

SELECT * FROM project;
 project_id | project_name 
------------+--------------
          1 | Project A
          2 | Project B

SELECT * FROM project_milestone;
 id | project_id | milestone_id |     completed_date     
----+------------+--------------+------------------------
  1 |          1 |            1 | 2000-01-01 00:00:00+01
  2 |          1 |            2 | 2000-01-02 00:00:00+01
  3 |          1 |            5 | 2000-01-03 00:00:00+01
  4 |          1 |            6 | 2000-01-04 00:00:00+01
  5 |          2 |            3 | 2000-02-01 00:00:00+01
  6 |          2 |            4 | 2000-02-02 00:00:00+01
  7 |          2 |            7 | 2000-02-03 00:00:00+01
  8 |          2 |            8 | 2000-02-04 00:00:00+01
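
To reproduce the sample data above, a setup script along these lines should work. The original post does not show the DDL, so the column types and constraints here are assumptions inferred from the psql output:

```sql
-- Assumed schema: types and constraints are guesses, not from the original post.
CREATE TABLE project (
    project_id   integer PRIMARY KEY,
    project_name text NOT NULL
);

CREATE TABLE project_milestone (
    id             integer PRIMARY KEY,
    project_id     integer NOT NULL REFERENCES project (project_id),
    milestone_id   integer NOT NULL,
    completed_date timestamptz
);

INSERT INTO project (project_id, project_name) VALUES
    (1, 'Project A'),
    (2, 'Project B');

INSERT INTO project_milestone (id, project_id, milestone_id, completed_date) VALUES
    (1, 1, 1, '2000-01-01 00:00:00+01'),
    (2, 1, 2, '2000-01-02 00:00:00+01'),
    (3, 1, 5, '2000-01-03 00:00:00+01'),
    (4, 1, 6, '2000-01-04 00:00:00+01'),
    (5, 2, 3, '2000-02-01 00:00:00+01'),
    (6, 2, 4, '2000-02-02 00:00:00+01'),
    (7, 2, 7, '2000-02-03 00:00:00+01'),
    (8, 2, 8, '2000-02-04 00:00:00+01');
```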


SELECT *
FROM project p
CROSS JOIN LATERAL (
    SELECT milestone_id, completed_date
    FROM project_milestone pm
    WHERE pm.project_id = p.project_id
      AND pm.completed_date IS NOT NULL  -- DESC would otherwise sort NULLs first
    ORDER BY completed_date DESC         -- newest first, so OFFSET 1 is the second highest
    LIMIT 1
    OFFSET 1
) second_highest;
 project_id | project_name | milestone_id |     completed_date     
------------+--------------+--------------+------------------------
          1 | Project A    |            5 | 2000-01-03 00:00:00+01
          2 | Project B    |            7 | 2000-02-03 00:00:00+01
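
CROSS JOIN LATERAL drops any project whose subquery returns no row, that is, any project with fewer than two milestones. If those projects should still appear, a sketch using LEFT JOIN LATERAL with the same table and column names might look like this:

```sql
-- LEFT JOIN LATERAL ... ON true keeps every project row; the subquery
-- columns are NULL when a project has fewer than two completed milestones.
SELECT p.project_id, p.project_name,
       second_highest.milestone_id, second_highest.completed_date
FROM project p
LEFT JOIN LATERAL (
    SELECT pm.milestone_id, pm.completed_date
    FROM project_milestone pm
    WHERE pm.project_id = p.project_id
      AND pm.completed_date IS NOT NULL  -- DESC sorts NULLs first by default
    ORDER BY pm.completed_date DESC      -- newest first
    LIMIT 1
    OFFSET 1                             -- skip the highest, keep the second
) second_highest ON true;
```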

Answer 2: (score: 0)

The simplest way is to use a window function.

SELECT *, nth_value(completed_date,2)
OVER (
    PARTITION BY project_id ORDER BY completed_date DESC
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
AS date2
FROM project_milestone;
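
As written, this query returns every row of project_milestone with the second-highest date repeated in the date2 column, and it does not identify the matching milestone. A sketch that collapses the result to one row per project and also picks up the milestone_id, assuming that is the desired output (the aliases milestone_id2 and w are illustrative):

```sql
-- Sketch: one row per project with the second-highest date and its milestone.
-- Both nth_value() calls share the window w, so they refer to the same row.
SELECT DISTINCT
       project_id,
       nth_value(milestone_id, 2)   OVER w AS milestone_id2,
       nth_value(completed_date, 2) OVER w AS date2
FROM project_milestone
WHERE completed_date IS NOT NULL  -- DESC would otherwise rank NULL dates first
WINDOW w AS (
    PARTITION BY project_id ORDER BY completed_date DESC
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

Projects with only one completed milestone get NULL in both columns, since nth_value() returns NULL when the frame has no second row.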