In PostgreSQL, how do I select the top n% of rows by a column?

Asked: 2019-03-02 10:01:41

Tags: sql postgresql

In PostgreSQL (version 10), the following SQL selects all rows ordered by avg_grade.

-- query - students list, order by average grade,
select s.student_id, s.student_name, avg(ce.grade) as avg_grade
from students as s
       left join course_enrollment as ce on s.student_id = ce.student_id
group by s.student_id
order by avg_grade desc NULLS LAST;

Related tables

students:

create table students (
  student_id   bigserial                           not null primary key,
  student_name varchar(200)                        not null,
  created      timestamp default CURRENT_TIMESTAMP not null
);

course_enrollment:

-- create table,
create table course_enrollment
(
  course_id  bigint                              not null,
  student_id bigint                              not null,
  grade      float                               not null,
  created    timestamp default CURRENT_TIMESTAMP not null,
  unique (course_id, student_id)
);

Question:

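  • How can I retrieve only the top n% (e.g. 10%) of rows, i.e. those with the highest avg_grade?
    Is there a window function that can do this, or is a subquery required?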

2 answers:

Answer 0 (score: 2)

I would use a subquery:

select student_id, student_name, avg_grade, rank() over (order by avg_grade desc)
from (select s.student_id,
             s.student_name,
             avg(ce.grade)                                        as avg_grade,
             rank() over (order by avg(ce.grade) desc nulls last) as seqnum,
             count(*) over ()                                     as cnt
      from students s
             left join
           course_enrollment ce
           on s.student_id = ce.student_id
      group by s.student_id
     ) as ce_avg
where seqnum <= cnt * 0.1;

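You could use other functions instead, such as the window function NTILE() or the ordered-set aggregate PERCENTILE_DISC(). I prefer the direct calculation because it gives more control over how ties are handled.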
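
As a rough sketch of those alternatives, reusing the tables from the question (the names ce_avg and bucket and the fixed 10% are only illustrative): NTILE(10) splits the rows into ten near-equal buckets, so students tied at the boundary can land in different buckets, while PERCENTILE_DISC() (an ordered-set aggregate in PostgreSQL) yields a cutoff grade, so all students tied at the cutoff are kept. PERCENTILE_DISC() ignores NULL averages, so in the second query the percentage is taken over students who have at least one grade, which differs from the count(*) over () calculation above.

```sql
-- Alternative 1: NTILE(10). Bucket 1 holds roughly the top 10% of rows;
-- ties at the bucket boundary are split arbitrarily.
select student_id, student_name, avg_grade
from (select s.student_id,
             s.student_name,
             avg(ce.grade)                                           as avg_grade,
             ntile(10) over (order by avg(ce.grade) desc nulls last) as bucket
      from students s
             left join course_enrollment ce on s.student_id = ce.student_id
      group by s.student_id
     ) as ce_avg
where bucket = 1;

-- Alternative 2: PERCENTILE_DISC(). Compute the grade at the 10% position
-- (counting from the top) and keep every student at or above it.
with ce_avg as (
  select s.student_id, s.student_name, avg(ce.grade) as avg_grade
  from students s
         left join course_enrollment ce on s.student_id = ce.student_id
  group by s.student_id
)
select student_id, student_name, avg_grade
from ce_avg
where avg_grade >= (
  select percentile_disc(0.1) within group (order by avg_grade desc)
  from ce_avg
)
order by avg_grade desc;
```

Note that the second query always returns at least the top-graded student(s) once any grades exist, whereas the rank-based query returns nothing when 10% of the row count is below 1.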

Answer 1 (score: 0)

After trying for a while, I came up with an ugly but working solution myself.

select *, rank() over (order by avg_grade desc)
from (
       select s.student_id, s.student_name, avg(ce.grade) as avg_grade
       from students as s
              left join course_enrollment as ce on s.student_id = ce.student_id
       group by s.student_id
       order by avg_grade desc nulls last
     ) as ce_avg
where avg_grade >= (
  select ce_avg.avg_grade
  from (
         select s.student_id, s.student_name, avg(ce.grade) as avg_grade
         from students as s
                left join course_enrollment as ce on s.student_id = ce.student_id
         group by s.student_id
         order by avg_grade desc nulls last
       ) as ce_avg
  limit 1 offset (select (count(*) * 0.1)::int from students) - 1
);
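
One possible way to avoid writing the aggregation twice is a common table expression. The following is only a sketch of the same cutoff logic, not a tuned query; the greatest(..., 0) guard is an addition that is not in the query above (an assumption about the desired behaviour), so that a student count whose 10% rounds to 0 does not produce a negative OFFSET.

```sql
-- Same idea as above, with the per-student averages defined only once.
with ce_avg as (
  select s.student_id, s.student_name, avg(ce.grade) as avg_grade
  from students as s
         left join course_enrollment as ce on s.student_id = ce.student_id
  group by s.student_id
)
select *, rank() over (order by avg_grade desc)
from ce_avg
where avg_grade >= (
  -- Grade of the student sitting at the 10% position; everyone with
  -- the same or a higher average passes the filter, ties included.
  select avg_grade
  from ce_avg
  order by avg_grade desc nulls last
  limit 1 offset greatest((select (count(*) * 0.1)::int from students) - 1, 0)
);
```

With the guard, a very small table returns the top-graded student(s) instead of raising an error; drop it if an error or an empty result is preferred in that case.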

Tips:

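  • You cannot simply use (limit n% * total) or (top n percent), because students whose avg_grade equals the lowest avg_grade that makes the cut might then be only partially included, which is unfair.
    The ugly SQL above handles this case, but at a cost in performance.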

    Below is an example showing how the results differ when duplicates (ties) are handled and when they are not:

    • Duplicates handled: fairer. [image: Duplication handled]

    • Duplicates not handled: unfair. [image: Duplication unhandled]
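
For contrast, the "unhandled" variant might look like the first query below, which cuts at a fixed row count and can therefore split students who share the cutoff grade. The second query is a sketch of a tie-aware form using FETCH FIRST ... WITH TIES; that clause needs PostgreSQL 13 or later, so it is not available on the version 10 mentioned in the question.

```sql
-- Ties NOT handled: a plain LIMIT keeps exactly 10% of the rows,
-- even if the next student has the same average as the last one kept.
select s.student_id, s.student_name, avg(ce.grade) as avg_grade
from students as s
       left join course_enrollment as ce on s.student_id = ce.student_id
group by s.student_id
order by avg_grade desc nulls last
limit (select (count(*) * 0.1)::int from students);

-- Ties handled (PostgreSQL 13+ only): WITH TIES also returns every row
-- that ties with the last row on the ORDER BY expression.
select s.student_id, s.student_name, avg(ce.grade) as avg_grade
from students as s
       left join course_enrollment as ce on s.student_id = ce.student_id
group by s.student_id
order by avg_grade desc nulls last
fetch first (select (count(*) * 0.1)::int from students) rows with ties;
```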