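Trying to find the second largest value in a column (PostgreSQL)

Time: 2011-02-06 01:33:08

Tags: sql postgresql greatest-n-per-group

I'm trying to find the second largest value in a column, and only the second largest value.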

select a.name, max(a.word) as word
from apple a
where a.word < (select max(a.word) from apple a)
group by a.name;

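For some reason, what I have now returns the second largest value and all the lower values as well, though fortunately it avoids the largest value.

Is there a way to fix this?

7 answers:

Answer 0 (score: 14)

Here is another conceptually simple solution which, according to EXPLAIN ANALYZE, runs in 0.1 ms on a table of 2.1 million rows. It returns nothing in the case where there is only one value.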

SELECT a.name, 
(SELECT word FROM apple ap WHERE ap.name=a.name ORDER BY word DESC OFFSET 1 LIMIT 1) -- DESC: OFFSET 1 skips the largest value; use ASC for the second smallest
FROM apple a

Note that my table already has existing indexes on name, word, and (name, word), which is what allows me to use ORDER BY like this.
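
As written, the query above emits one output row for every row of apple, and the scalar subquery yields NULL for a name that has only one value. A minimal sketch, assuming one row per name and the second largest value are what is wanted; the index name apple_name_word_idx and the alias second_word are hypothetical and not part of the original answer:

```sql
-- Hypothetical index name; the answer only states that an index on (name, word) exists.
CREATE INDEX apple_name_word_idx ON apple (name, word);

-- One row per distinct name; second_word is NULL when a name has a single row.
SELECT n.name,
       (SELECT ap.word
          FROM apple ap
         WHERE ap.name = n.name
         ORDER BY ap.word DESC      -- largest first
        OFFSET 1 LIMIT 1) AS second_word
  FROM (SELECT DISTINCT name FROM apple) n;
```

Note that OFFSET 1 skips one row, not one distinct value: if the largest word occurs twice for a name, this sketch returns the largest word again.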

Answer 1 (score: 5)

The simplest, although inefficient (the array can exhaust memory):

select student, (array_agg(grade order by grade desc))[2]
from 
student_grades
group by student

The efficient one:

-- The state function must exist before the aggregate that references it.
create or replace function array_limit_two(anyarray, anyelement) returns anyarray
as 
$$
begin
    -- keep only the first two elements seen; ignore the rest
    if array_upper($1,1) = 2 then
        return $1;
    else
        return array_append($1, $2);
    end if;
end;
$$ language plpgsql;

create aggregate two_elements(anyelement)
(
sfunc = array_limit_two,
stype = anyarray,
initcond = '{}'
);

Test data:

create table student_grades
(
student text,
grade int
);



insert into student_grades values 
('john',70),
('john',80),
('john',90),
('john',100);


insert into student_grades values
('paul',20),
('paul',10),
('paul',50),
('paul',30);


insert into student_grades values
('george',40);

Test code:

-- second largest
select student, coalesce( (two_elements(grade order by grade desc))[2], max(grade) /* min would do too, since it's one element only */ )
from 
student_grades
group by student;


-- second smallest
select student, coalesce( (two_elements(grade order by grade))[2], max(grade) /* min would do too, since it's one element only */ )
from 
student_grades
group by student;

Output:

q_and_a=# -- second largest
q_and_a=# select student, coalesce( (two_elements(grade order by grade desc))[2], max(grade) /* min would do too, since it's one element only */ )
q_and_a-# from
q_and_a-# student_grades
q_and_a-# group by student;
 student | coalesce
---------+----------
 george  |       40
 john    |       90
 paul    |       30
(3 rows)


q_and_a=#
q_and_a=# -- second smallest
q_and_a=# select student, coalesce( (two_elements(grade order by grade))[2], max(grade) /* min would do too, since it's one element only */ )
q_and_a-# from
q_and_a-# student_grades
q_and_a-# group by student;
 student | coalesce
---------+----------
 george  |       40
 john    |       80
 paul    |       20
(3 rows)

Edit: @diesel, the simplest (and also efficient):

-- second largest
select student, array_min(two_elements(grade order by grade desc))
from 
student_grades
group by student;

-- second smallest
select student, array_max(two_elements(grade order by grade))
from 
student_grades
group by student;

The array_min and array_max functions:

create or replace function array_min(anyarray) returns anyelement
as
$$
select min(unnested) from( select unnest($1) unnested ) as x
$$ language sql;

create or replace function array_max(anyarray) returns anyelement
as
$$
select max(unnested) from( select unnest($1) unnested ) as x
$$ language sql;

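Edit

This would probably be the simplest and most efficient, if only PostgreSQL made array_max a built-in function and supported a LIMIT clause on aggregates :-) A LIMIT clause on aggregates is my dream feature for PostgreSQL.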

select student, array_max( array_agg(grade order by grade limit 2) )
from 
student_grades
group by student;

While LIMIT on aggregates is not yet available, use this instead:

-- second largest
select student, 

    array_min
    (

        array ( 
               select grade from student_grades 
               where student = x.student order by grade desc limit 2 )

    )

from 
student_grades x
group by student;


-- second smallest
select student, 

    array_max
    (

        array ( 
               select grade from student_grades 
               where student = x.student order by grade limit 2 )

    )

from 
student_grades x
group by student;

Answer 2 (score: 3)

This is also brute force, but it is guaranteed to make only one full pass over the table:

select name,word
  from (
         select name,word
              , row_number() over (partition by name 
                                       order by word desc)
                as rowNum
           from apple
       ) x
 where rowNum = 2

If you have a covering index on (name, word) and the number of word values per name is high, this version may perform better:

with recursive myCte as
(
 select name,max(word) as word
      , 1 as rowNum
   from apple
  group by name
  union all
 select par.name
      , (select max(word) as word
           from apple 
          where name = par.name
            AND word < par.word
        ) as word
      , 2 as rowNum
   from myCte par
  where par.rowNum = 1
)
select * from myCte where rownum = 2

Answer 3 (score: 1)

SELECT *
FROM (
  SELECT name, word,  -- word is selected so the value itself is returned
        dense_rank() over (partition by name order by word desc) as word_rank,
        count(*) over (partition by name) as name_count
  FROM apple
) t
WHERE (word_rank = 2 OR name_count = 1)

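Edit

name_count = 1 takes care of the case where a particular name has only one row.

Using dense_rank() instead of rank() makes sure there is a row with word_rank = 2, because dense_rank guarantees there are no gaps in the numbering.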
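
To make the no-gaps point concrete, here is a small self-contained sketch comparing the three ranking functions when the largest value is tied. The inline VALUES list and the aliases rn, rnk and drnk are made up for illustration:

```sql
-- The largest value (30) appears twice.
SELECT word,
       row_number() OVER (ORDER BY word DESC) AS rn,
       rank()       OVER (ORDER BY word DESC) AS rnk,
       dense_rank() OVER (ORDER BY word DESC) AS drnk
  FROM (VALUES (30), (30), (20), (10)) AS t(word)
 ORDER BY word DESC, rn;

--  word | rn | rnk | drnk
-- ------+----+-----+------
--    30 |  1 |   1 |    1
--    30 |  2 |   1 |    1
--    20 |  3 |   3 |    2
--    10 |  4 |   4 |    3
```

With rank(), no row gets the value 2 here, so a filter on rank = 2 would return nothing; with dense_rank(), the row for 20 is ranked 2. If the second largest value itself is tied, dense_rank() = 2 matches every one of those rows.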

Answer 4 (score: 0)

A very brute-force query, but it works:

select a.name, a.word
from apple a
where (select count(distinct b.word) from apple b
    where b.word > a.word) = 1
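
The query above compares against every row in the table, so it finds the second largest word overall. If the second largest word per name is what is wanted, a sketch under that assumption (not part of the original answer) correlates the subquery on name as well:

```sql
-- Assumption: "second largest" is evaluated within each name.
SELECT DISTINCT a.name, a.word
  FROM apple a
 WHERE (SELECT count(DISTINCT b.word)
          FROM apple b
         WHERE b.name = a.name       -- only compare rows that share the name
           AND b.word > a.word) = 1; -- exactly one distinct larger word exists
```

DISTINCT in the outer SELECT collapses duplicate rows when the second largest word occurs more than once for a name. Names with only one distinct word produce no row.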

Answer 5 (score: 0)

Another approach, using RANK:

with ranking as
(
    select student, grade, rank() over(partition by student order by grade desc) as place
    from 
    student_grades
)
select * 
from
ranking
where 
    (student, place) 
    in 
    (
        select student, max(place)
        from ranking
        where place <= 2
        group by student
    )

For the second smallest:

with ranking as
(
    select student, grade, 
        rank() 
        -- just change DESC to ASC
        over(partition by student order by grade ASC ) as place
    from 
    student_grades
)
select * 
from
ranking
where 
    (student, place) 
    in 
    (                
        select student, max(place) -- still max
        from ranking
        where place <= 2
        group by student
    )

Answer 6 (score: 0)

Um, don't you just mean:

select a.name, max(a.word) as word
from apple a
where a.word < (select max(b.word) from apple b WHERE a.name = b.name)
group by a.name;
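do you? This returns one row per name with the second highest value for each name (or no row if there is no second highest value).

If that is what you want, your query was just missing a constraint, although I suspect the above amounts to two table scans if PostgreSQL decides to convert it into a JOIN.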