Select only the IDs whose events are limited to three types

Date: 2018-08-31 16:25:51

Tags: sql postgresql

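I have an events table. It contains several kinds of events for a number of subject_id and hadm_id values. For example, it contains "heart rhythm" events under the event label HRy.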

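I want to select the subject_id and hadm_id pairs whose values for the event label HRy are only 'SR', 'SB' or 'ST'. Subjects that have any other value for the HRy event label should be discarded. In other words, if a patient has an HRy event whose value is not 'SR', 'SB' or 'ST', that patient is dropped. If the values are all 'SR', all 'SB' or all 'ST', that is fine. A mix of the three is fine too. There are other event types as well (for example BP), but they don't matter.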

Here is a sample table, followed by the expected output:

drop table if exists testevents cascade;
create table testevents(
hadm_id int not null,
subject_id int not null,
eventtype int not null,
eventlabel char(30) not null,
value char(360) not null,
valuenum int
);

insert into testevents(hadm_id, subject_id, eventtype, eventlabel, value, valuenum)
values
    (1, 1, 220048, 'HRy', 'SR', null),
    (1, 1, 220048, 'HRy', 'SR', null),
    (1, 1, 220048, 'HRy', 'SR', null),
    (1, 1, 220048, 'HRy', 'SR', null),
    (1, 1, 220048, 'HRy', 'SR', null),
    (1, 1, 220048, 'HRy', 'SR', null),   --all good here: SR all the time

    (2, 2, 220048, 'HRy', 'SR', null),
    (2, 2, 220048, 'HRy', 'SR', null),
    (2, 2, 220048, 'HRy', 'SR', null),
    (2, 2, 220048, 'HRy', 'ST', null),
    (2, 2, 220048, 'HRy', 'SR', null),
    (2, 2, 220048, 'HRy', 'ST', null),  --all good here: either SR or ST, both allowed

    (3, 3, 220048, 'HRy', 'ST', null),
    (3, 3, 220048, 'HRy', 'ST', null),
    (3, 3, 220048, 'HRy', 'ST', null),
    (3, 3, 220048, 'HRy', 'ST', null),
    (3, 3, 220048, 'HRy', 'ST', null),
    (3, 3, 220048, 'HRy', 'ST', null),   --all good here: ST all the time
    (3, 3, 4053, 'BP', '87', 87),        --it contains another type of event, which doesn't matter

    (4, 4, 220048, 'HRy', 'ST', null),
    (4, 4, 220048, 'HRy', 'ST', null),
    (4, 4, 220048, 'HRy', 'AF', null),  --Here we have AF, which is not allowed. 
    (4, 4, 220048, 'HRy', 'ST', null),
    (4, 4, 220048, 'HRy', 'ST', null),
    (4, 4, 220048, 'HRy', 'SR', null),   
    (4, 4, 4053, 'BP', '87', 87),        

    (5, 5, 220048, 'HRy', 'SB', null),
    (5, 5, 220048, 'HRy', 'ST', null),
    (5, 5, 220048, 'HRy', 'SR', null),  --Here we have the 3 different types, all allowed.
    (5, 5, 220048, 'HRy', 'SB', null),
    (5, 5, 220048, 'HRy', 'SR', null),
    (5, 5, 220048, 'HRy', 'SR', null),   
    (5, 5, 4053, 'BP', '87', 87),        

    (6, 6, 220048, 'HRy', 'SR', null), -- allowed
    (6, 6, 211, 'HRa2', '134', 134), -- doesn't matter
    (6, 6, 211, 'HRa2', '187', 187), -- doesn't matter
    (6, 6, 220048, 'HRy', 'AF', null), -- NOT allowed
    (6, 6, 220048, 'HRy', 'SR', null) -- allowed
;


output:
hadm_id, subject_id
1            1
2            2
3            3
5            5

How can I achieve this?

Many thanks!

1 answer:

Answer 0: (score: 1)

Here is one way:

SELECT hadm_id, subject_id
FROM testevents
WHERE eventlabel = 'HRy'
GROUP BY hadm_id, subject_id
HAVING ARRAY_AGG(DISTINCT TRIM(value)) <@ ARRAY['SR', 'SB', 'ST']
ORDER BY hadm_id, subject_id;

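This collects all distinct values for each hadm_id and subject_id and checks that they are all contained in the array of allowed values. The TRIM, by the way, is there because of the char(360) type, which pads the values with spaces.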
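
The same "every HRy value must be in the allowed set" check can also be written without arrays. The following is only a sketch of an alternative, not part of the original answer. It assumes the testevents table defined in the question and uses PostgreSQL's bool_and aggregate, which is true only when the condition holds for every row in the group:

```sql
-- Alternative sketch: keep a (hadm_id, subject_id) group only if
-- every one of its HRy rows has an allowed value.
SELECT hadm_id, subject_id
FROM testevents
WHERE eventlabel = 'HRy'            -- other event types (BP, HRa2, ...) are ignored
GROUP BY hadm_id, subject_id
HAVING bool_and(TRIM(value) IN ('SR', 'SB', 'ST'))  -- false as soon as one row is e.g. 'AF'
ORDER BY hadm_id, subject_id;
```

Like the array version, this filters on eventlabel first, so a group with no HRy rows at all does not appear in the result.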