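I have a dataset containing a list of exams, the qualification/unit each exam belongs to, and whether the exam was passed. The data looks like this: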
candidate | qualification | unit | exam | exam_status
-----------------------------------------------------
C1 | Q1 | U1 | E1 | Passed
C1 | Q1 | U2 | E2 | NULL
C1 | Q1 | U2 | E3 | Passed
C1 | Q1 | U3 | E4 | Passed
C1 | Q1 | U3 | E5 | Passed
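From this, I need to work out the total number of units in each qualification and how many of those units the candidate has passed.
In theory there should be one exam per unit (although there may be several records if a candidate failed the first attempt), so I should be able to get the data I need with the following query: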
select
    candidate,
    qualification,
    count(distinct unit),
    count(
        case when exam_status = 'Passed' then 1 else null end
    )
from example_table
group by candidate, qualification
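However, for some reason a few candidates have passed the same exam more than once, which means my count of passed units is sometimes greater than the total number of units.
I would like to do something like: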
count(distinct exam case when exam_status = 'Passed' then 1 else null end)
to select only the distinct exams that have been passed, but this fails.
Does anyone know how I can achieve this? Thanks in advance.
Answer 0 (score: 2)
You want the number of distinct exams, so I think it is:
select candidate, qualification,
       count(distinct unit) as total_units,
       count(distinct case when exam_status = 'Passed' then exam end) as passed_exams
from example_table
group by candidate, qualification;
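To check how this behaves, here is a minimal sketch that runs the query against an in-memory SQLite table from Python. The duplicate `E1` row is hypothetical (added to reproduce the repeated pass described in the question), and the `passed_rows` and `passed_units` columns are not part of the answer; they are included only for comparison.

```python
import sqlite3

# Sample rows from the question, plus one hypothetical duplicate pass of E1.
rows = [
    ("C1", "Q1", "U1", "E1", "Passed"),
    ("C1", "Q1", "U1", "E1", "Passed"),  # hypothetical repeated pass
    ("C1", "Q1", "U2", "E2", None),
    ("C1", "Q1", "U2", "E3", "Passed"),
    ("C1", "Q1", "U3", "E4", "Passed"),
    ("C1", "Q1", "U3", "E5", "Passed"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table example_table ("
    "candidate text, qualification text, unit text, exam text, exam_status text)"
)
conn.executemany("insert into example_table values (?, ?, ?, ?, ?)", rows)

query = """
select candidate, qualification,
       count(distinct unit) as total_units,
       -- the question's original count: one per passed row
       count(case when exam_status = 'Passed' then 1 end) as passed_rows,
       -- the answer's count: one per distinct passed exam
       count(distinct case when exam_status = 'Passed' then exam end) as passed_exams,
       -- comparison only (not in the answer): one per distinct passed unit
       count(distinct case when exam_status = 'Passed' then unit end) as passed_units
from example_table
group by candidate, qualification
"""
result = conn.execute(query).fetchall()
print(result)
```

On this data the row is `('C1', 'Q1', 3, 5, 4, 3)`. Counting distinct exams removes the repeated pass of `E1`, but `U3` still has two different passed exams (`E4` and `E5`), so `passed_exams` can still exceed `total_units`. If the goal is the number of passed units, putting `unit` inside the `case` may be the closer fit.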
If you want to sum the units for the passed exams, this gets trickier. I would recommend window functions:
select candidate, qualification, count(distinct unit),
       -- assumes unit holds a numeric value that can be summed
       sum(case when exam_status = 'Passed' then unit end) as total_units,
       count(distinct case when exam_status = 'Passed' then exam end)
from (select et.*,
             -- number the rows of each exam, passed rows first
             row_number() over (partition by candidate, qualification, exam
                                order by (case when exam_status = 'Passed' then 1 else 2 end)
                               ) as seqnum
      from example_table et
     ) et
where seqnum = 1  -- keep a single row per exam
group by candidate, qualification;
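The following sketch runs the window-function approach from Python against an in-memory SQLite table. It assumes a SQLite build with window function support (3.25 or later). Because `unit` in the sample data is a label such as `U1`, the sketch sums a hypothetical numeric `credits` column instead; that column, its values, and the duplicate `E1` row are illustrative assumptions, not part of the original data.

```python
import sqlite3

# Sample rows plus a hypothetical numeric credits column and one duplicate pass.
rows = [
    ("C1", "Q1", "U1", "E1", "Passed", 10),
    ("C1", "Q1", "U1", "E1", "Passed", 10),  # hypothetical repeated pass
    ("C1", "Q1", "U2", "E2", None, 20),
    ("C1", "Q1", "U2", "E3", "Passed", 20),
    ("C1", "Q1", "U3", "E4", "Passed", 30),
    ("C1", "Q1", "U3", "E5", "Passed", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table example_table (candidate text, qualification text, "
    "unit text, exam text, exam_status text, credits integer)"
)
conn.executemany("insert into example_table values (?, ?, ?, ?, ?, ?)", rows)

deduped_query = """
select candidate, qualification,
       count(distinct unit) as total_units,
       sum(case when exam_status = 'Passed' then credits else 0 end) as passed_credits,
       count(distinct case when exam_status = 'Passed' then exam end) as passed_exams
from (select et.*,
             -- one sequence per exam, passed rows numbered first
             row_number() over (partition by candidate, qualification, exam
                                order by case when exam_status = 'Passed' then 1 else 2 end
                               ) as seqnum
      from example_table et) et
where seqnum = 1  -- keep a single row per exam
group by candidate, qualification
"""
deduped = conn.execute(deduped_query).fetchall()

# Same sum without the de-duplication step, for comparison.
naive = conn.execute(
    "select sum(case when exam_status = 'Passed' then credits else 0 end) "
    "from example_table"
).fetchall()
print(deduped, naive)
```

Here the de-duplicated sum is 90, against 100 when the repeated `E1` row is left in. Since the partition is per exam, `U3` still contributes twice (once for `E4`, once for `E5`); partitioning by `unit` instead of `exam` would be one way to count each unit once, if that is what is wanted.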