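I have a table with 3 columns. For each diagnosis, I need to generate combinations of the values in the tests column, always taken 3 at a time. A diagnosis may, however, have only 1 or 2 tests, and in that case the logic should still output a combination from the values that exist. Referring to the table below: for each pat_id there is a diagnosis column, and the tests are performed based on that diagnosis. For each diagnosis group I need to generate the unique combinations of the corresponding values in the tests column. Note that a combination should always have 3 values when the group has 3 or more tests; when a diagnosis has fewer than 3 values (1 or 2), the combination should still be output, using the 1 or 2 available values and NULL in place of the missing ones.
Table patient: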
pat_id | diagnosis | tests
1001 | Thyroid | CAT
1001 | Thyroid | MRI
1001 | Thyroid | Blood
1001 | Tonsil | CAT
1001 | Tonsil | MRI
1001 | Tonsil | Blood
1001 | Tonsil | RAPID
1002 | Pneumonia | MRI
1002 | Pneumonia | Eliza
1003 | Bronchitis | X-Ray
So for pat_id = '1001' and diagnosis = 'Thyroid', tests has 3 distinct values, so there is only 1 unique combination: {CAT, MRI, Blood}. Similarly, for pat_id = '1001' and diagnosis = 'Tonsil', tests has 4 distinct values, so there are 4 combinations: {CAT, MRI, Blood}, {CAT, MRI, RAPID}, {MRI, Blood, RAPID} and {CAT, Blood, RAPID}. For pat_id = '1002' there are only two distinct values, so there is just 1 combination: {MRI, Eliza}. Likewise, pat_id = '1003' has only 1 value, X-Ray, so the output for '1003' should be {X-Ray}.
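In this way I need to generate the combinations for every diagnosis group and then determine which unique combination occurs most often across the table. The final output should be the combination with the highest number of occurrences.
So far, the SQL below returns all combinations for groups with 3 or more values, but it outputs nothing for groups with fewer than 3. That is, 1002 and 1003 have fewer than 3 values, so they produce no rows, although they should. The solution needs to handle those cases as well.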
select p1.pat_id, p1.diagnosis, p1.tests, p2.tests, p3.tests
from patient p1 join
patient p2
on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis and
p1.tests < p2.tests join
patient p3
on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis and
p2.tests < p3.tests ;
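To reproduce the problem, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the query above. SQLite, the column types and the helper name run_inner_join are assumptions made for illustration; the actual database engine is not stated here.

```python
import sqlite3

# Sample rows copied from the patient table in the question.
SAMPLE_ROWS = [
    (1001, "Thyroid", "CAT"),
    (1001, "Thyroid", "MRI"),
    (1001, "Thyroid", "Blood"),
    (1001, "Tonsil", "CAT"),
    (1001, "Tonsil", "MRI"),
    (1001, "Tonsil", "Blood"),
    (1001, "Tonsil", "RAPID"),
    (1002, "Pneumonia", "MRI"),
    (1002, "Pneumonia", "Eliza"),
    (1003, "Bronchitis", "X-Ray"),
]

# The inner-join query from the question, unchanged apart from layout.
INNER_JOIN_SQL = """
select p1.pat_id, p1.diagnosis, p1.tests, p2.tests, p3.tests
from patient p1
join patient p2
  on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis
 and p1.tests < p2.tests
join patient p3
  on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis
 and p2.tests < p3.tests
"""


def run_inner_join():
    """Load the sample data (hypothetical schema) and run the inner-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table patient (pat_id integer, diagnosis text, tests text)"
    )
    conn.executemany("insert into patient values (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(INNER_JOIN_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_inner_join():
        print(row)
```

On this data only rows for pat_id 1001 come back, because an inner join needs a matching second and third test to emit a row at all.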
Please also explain how to determine which combination occurs most often. Thanks.
Answer 0 (score: 1)
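You can use your query with left joins, which allows the second and third test to be NULL. But then you need to remove the rows containing NULLs for groups that have two or more tests, since those rows are only partial combinations. You can do that with a correlated COUNT(*) subquery: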
select
p1.pat_id,
p1.diagnosis,
p1.tests as test1,
p2.tests as test2,
p3.tests as test3
from patient p1
left join patient p2
on p2.diagnosis = p1.diagnosis
and p2.pat_id = p1.pat_id
and p2.tests > p1.tests
left join patient p3
on p3.diagnosis = p1.diagnosis
and p3.pat_id = p1.pat_id
and p3.tests > p2.tests
where
case (
select count(*)
from patient p
where p.diagnosis = p1.diagnosis
and p.pat_id = p1.pat_id
)
when 1 then true
when 2 then p2.tests is not null
else p3.tests is not null
end
order by p1.pat_id, p1.diagnosis
Result:
| pat_id | diagnosis | test1 | test2 | test3 |
| ------ | ---------- | ----- | ----- | ----- |
| 1001 | Thyroid | Blood | CAT | MRI |
| 1001 | Tonsil | CAT | MRI | RAPID |
| 1001 | Tonsil | Blood | MRI | RAPID |
| 1001 | Tonsil | Blood | CAT | MRI |
| 1001 | Tonsil | Blood | CAT | RAPID |
| 1002 | Pneumonia | Eliza | MRI | |
| 1003 | Bronchitis | X-Ray | | |
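To rank the distinct combinations by how often they occur, just turn it into a GROUP BY ... ORDER BY COUNT(*) query; the first row is then the most frequent combination: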
select
p1.tests as test1,
p2.tests as test2,
p3.tests as test3,
count(*) as cnt
from patient p1
left join patient p2
on p2.diagnosis = p1.diagnosis
and p2.pat_id = p1.pat_id
and p2.tests > p1.tests
left join patient p3
on p3.diagnosis = p1.diagnosis
and p3.pat_id = p1.pat_id
and p3.tests > p2.tests
where
case (
select count(*)
from patient p
where p.diagnosis = p1.diagnosis
and p.pat_id = p1.pat_id
)
when 1 then true
when 2 then p2.tests is not null
else p3.tests is not null
end
group by p1.tests, p2.tests, p3.tests
order by cnt desc
Result:
| test1 | test2 | test3 | cnt |
| ----- | ----- | ----- | --- |
| Blood | CAT | MRI | 2 |
| CAT | MRI | RAPID | 1 |
| Blood | MRI | RAPID | 1 |
| Eliza | MRI | | 1 |
| X-Ray | | | 1 |
| Blood | CAT | RAPID | 1 |
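As a rough way to try this end to end, the sketch below runs the grouped query against the sample data in an in-memory SQLite database and returns the top row. SQLite, the schema and the function name most_common_combination are assumptions for illustration. The literal 1 stands in for true so that older SQLite versions accept the CASE expression.

```python
import sqlite3

# Sample rows copied from the patient table in the question.
SAMPLE_ROWS = [
    (1001, "Thyroid", "CAT"),
    (1001, "Thyroid", "MRI"),
    (1001, "Thyroid", "Blood"),
    (1001, "Tonsil", "CAT"),
    (1001, "Tonsil", "MRI"),
    (1001, "Tonsil", "Blood"),
    (1001, "Tonsil", "RAPID"),
    (1002, "Pneumonia", "MRI"),
    (1002, "Pneumonia", "Eliza"),
    (1003, "Bronchitis", "X-Ray"),
]

# Grouped query from this answer; "when 1 then 1" replaces "then true".
COMBO_COUNTS_SQL = """
select p1.tests as test1, p2.tests as test2, p3.tests as test3,
       count(*) as cnt
from patient p1
left join patient p2
  on p2.diagnosis = p1.diagnosis and p2.pat_id = p1.pat_id
 and p2.tests > p1.tests
left join patient p3
  on p3.diagnosis = p1.diagnosis and p3.pat_id = p1.pat_id
 and p3.tests > p2.tests
where
  case (
    select count(*) from patient p
    where p.diagnosis = p1.diagnosis and p.pat_id = p1.pat_id
  )
    when 1 then 1
    when 2 then p2.tests is not null
    else p3.tests is not null
  end
group by p1.tests, p2.tests, p3.tests
order by cnt desc
"""


def combination_counts():
    """Return (test1, test2, test3, cnt) rows, most frequent first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table patient (pat_id integer, diagnosis text, tests text)"
    )
    conn.executemany("insert into patient values (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(COMBO_COUNTS_SQL).fetchall()
    conn.close()
    return rows


def most_common_combination():
    """Return the first row of the ranking, i.e. the top combination."""
    return combination_counts()[0]


if __name__ == "__main__":
    print(most_common_combination())
```

If several combinations tie for the highest count, taking the first row picks one of them arbitrarily; comparing cnt against the maximum would return all of them.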
Answer 1 (score: 0)
I think you want left join and group by:
select p1.tests, p2.tests, p3.tests, count(*)
from patient p1 left join
patient p2
on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis and
p1.tests < p2.tests left join
patient p3
on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis and
p2.tests < p3.tests
group by p1.tests, p2.tests, p3.tests
order by count(*) desc;
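I'm not sure whether diagnosis should also be included in the result set. From your description of the results it seems not, but it would make sense to me.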
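To see what the left join with group by yields on the sample data, here is a small sketch using an in-memory SQLite database (the engine, the schema and the helper name unfiltered_counts are assumptions for illustration). Because nothing filters the NULL-padded rows, partial combinations from groups with 2 or more tests, such as (MRI, NULL, NULL), are counted alongside the full ones.

```python
import sqlite3

# Sample rows copied from the patient table in the question.
SAMPLE_ROWS = [
    (1001, "Thyroid", "CAT"),
    (1001, "Thyroid", "MRI"),
    (1001, "Thyroid", "Blood"),
    (1001, "Tonsil", "CAT"),
    (1001, "Tonsil", "MRI"),
    (1001, "Tonsil", "Blood"),
    (1001, "Tonsil", "RAPID"),
    (1002, "Pneumonia", "MRI"),
    (1002, "Pneumonia", "Eliza"),
    (1003, "Bronchitis", "X-Ray"),
]

# The left join + group by query from this answer, unchanged apart from layout.
UNFILTERED_SQL = """
select p1.tests, p2.tests, p3.tests, count(*)
from patient p1
left join patient p2
  on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis
 and p1.tests < p2.tests
left join patient p3
  on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis
 and p2.tests < p3.tests
group by p1.tests, p2.tests, p3.tests
order by count(*) desc
"""


def unfiltered_counts():
    """Return a dict mapping (test1, test2, test3) to its row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table patient (pat_id integer, diagnosis text, tests text)"
    )
    conn.executemany("insert into patient values (?, ?, ?)", SAMPLE_ROWS)
    counts = {
        (t1, t2, t3): cnt
        for t1, t2, t3, cnt in conn.execute(UNFILTERED_SQL)
    }
    conn.close()
    return counts


if __name__ == "__main__":
    for combo, cnt in unfiltered_counts().items():
        print(combo, cnt)
```

If only complete combinations should be counted, the partial rows need to be filtered out per group before grouping.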