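I want to use proc sql in SAS to identify whether cases or records are missing certain information. I have two datasets. The first is a record of the entire data collection, showing which forms were collected at each visit. The second is a specification of which forms should be collected at each visit. I have tried many options, including DATA steps and SQL code using not in, but to no avail.
Sample data is below: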
***** dataset crf is a listing of all forms that have been filled out at each visit ;
***** cid is an identifier for a study center ;
***** pid is an identifier for a participant ;
data crf;
input visit cid pid form ;
cards;
1 10 101 10
1 10 101 11
1 10 101 12
1 10 102 10
1 10 102 11
2 10 101 11
2 10 101 13
2 10 102 11
2 10 102 12
2 10 102 13
;
run;
***** dataset crfrule is a listing of all forms that should be filled out at each visit ;
***** so, visit 1 needs to have forms 10, 11, and 12 filled out ;
***** likewise, visit 2 needs to have forms 11 - 14 filled out ;
data crfrule;
input visit form ;
cards;
1 10
1 11
1 12
2 11
2 12
2 13
2 14
;
run;
***** We can see from the two tables that participant 101 has a complete set of records for visit 1 ;
***** However, participant 102 is missing form 12 for visit 1 ;
***** For visit 2, 101 is missing forms 12 and 14, whereas 102 is missing form 14 ;
***** I want to be able to know which forms were **NOT** filled out by each person at each visit (i.e., which forms are missing for each visit) ;
***** extracting unique cases from crf ;
proc sql;
create table visit_rec as
select distinct cid, pid, visit
from crf;
quit;
***** building the list of expected forms by visit number ;
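proc sql;
create table expected as
/* name each column once so that visit is not duplicated in the output */
select x.cid, x.pid,
y.visit, y.form
from visit_rec as x right join crfrule as y
on x.visit = y.visit
order by y.visit, x.cid, x.pid, y.form;
quit;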
***** so now I have a list of which forms **SHOULD** have been filled out by each person ;
***** now, I just need to know if they were filled out or not... ;
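The strategy I have been trying is to merge expected back onto the crf table and flag which forms are missing at each visit.
Ideally, I would like to produce a table containing: visit, cid, pid, missing_form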
Any guidance is much appreciated.
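The merge-back strategy described in the question can be sketched as a DATA step MERGE with IN= flags: sort both tables by the same key, merge them, and keep only rows that come from expected but not from crf. This is a minimal sketch, assuming the crf and expected datasets built above already exist in WORK; the names expected_s, crf_s, and missing_merge are hypothetical.

```sas
/* Sketch only: assumes WORK.EXPECTED and WORK.CRF from the question exist. */
/* EXPECTED_S, CRF_S and MISSING_MERGE are hypothetical dataset names.      */
proc sort data=expected out=expected_s;
  by cid pid visit form;
run;

proc sort data=crf out=crf_s;
  by cid pid visit form;
run;

data missing_merge(keep=visit cid pid missing_form);
  merge expected_s(in=in_exp) crf_s(in=in_crf);
  by cid pid visit form;
  /* keep expected forms that have no matching filled-out record */
  if in_exp and not in_crf;
  missing_form = form;
run;
```

With the sample data this should leave one row per missing form: form 12 for participant 102 at visit 1, and forms 12 and 14 (participant 101) and 14 (participant 102) at visit 2.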
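Answer 0 (score: 2)
EXCEPT is what you want. I would not claim this is the most efficient solution in general (in SAS it almost certainly isn't), but given what you have done so far, it does work: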
proc sql;
create table want as
select cid,pid,visit,form from expected
except select cid,pid,visit,form from crf
;
quit;
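Be careful with EXCEPT, though: it is very picky. Note that select * will not work here, because the columns of your two tables are in a different order and EXCEPT matches columns by position.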
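PROC SQL set operators also accept the CORR (CORRESPONDING) keyword, which matches columns by name rather than by position and keeps only the columns the two queries have in common. A minimal sketch, assuming expected and crf from the question exist in WORK; want_corr is a hypothetical table name:

```sas
/* Sketch only: assumes WORK.EXPECTED and WORK.CRF exist.              */
/* CORR matches columns by name, so differing column order is harmless. */
proc sql;
  create table want_corr as
  select * from expected
  except corr
  select * from crf;
quit;
```

Because the match is by name, this keeps working if either table's column order changes, but any column present in only one of the tables is silently dropped from the result.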
Answer 1 (score: 2)
I would suggest a nested query (it could also be done in two steps). How about this:
proc sql;
create table temp as
select distinct c.*
/* 1 = no matching record in crf, 0 = form was filled out */
, (d.visit is null and d.form is null and d.pid is null) as missing_form
from (
/* every participant paired with every expected visit/form */
select distinct a.cid, a.pid, b.* from
crf a, crfrule b
) c
left join crf d
on c.cid = d.cid
and c.pid = d.pid
and c.form = d.form
and c.visit = d.visit
order by c.pid, c.visit, c.form
;
quit;
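This gives you a list of all possible (i.e., expected) participant, visit, and form combinations, together with a flag, missing_form, that is 1 when no matching record exists in crf and 0 otherwise. Note that every participant is paired with every visit in crfrule, so a visit a participant never attended would show all of its forms as missing.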
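To reduce temp to the list the question asks for (one row per missing form), a follow-up step can keep only the flagged rows and drop the flag. A minimal sketch, assuming temp from the query above exists in WORK; missing_list is a hypothetical table name:

```sas
/* Sketch only: assumes WORK.TEMP from the nested query above exists. */
/* MISSING_LIST is a hypothetical output name.                         */
proc sql;
  /* keep only combinations with no matching crf record, then drop the flag */
  create table missing_list(drop=missing_form) as
  select *
  from temp
  where missing_form = 1
  order by pid, visit, form;
quit;
```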
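Answer 2 (score: 0)
You can use a left join and then a where clause to keep only the expected records that have no matching record in the right-hand table (crf).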
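proc sql;
create table missing_forms as
select
e.*
from
expected e left join
crf c on
e.visit = c.visit and
e.cid = c.cid and
e.pid = c.pid and
e.form = c.form
/* no match in crf means the form was not filled out */
where c.visit is missing
;
quit;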
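The same anti-join can also be written with a correlated NOT EXISTS subquery. This is relevant to the not in attempt in the question, because a PROC SQL IN subquery compares a single column, whereas here a match needs visit, cid, pid, and form together. A minimal sketch, assuming expected and crf from the question exist in WORK; missing_nx is a hypothetical table name:

```sas
/* Sketch only: assumes WORK.EXPECTED and WORK.CRF exist.             */
/* MISSING_NX is a hypothetical output name.                          */
proc sql;
  create table missing_nx as
  select e.visit, e.cid, e.pid, e.form as missing_form
  from expected as e
  /* keep expected rows for which no filled-out crf record exists */
  where not exists
    (select *
     from crf as c
     where c.visit = e.visit
       and c.cid   = e.cid
       and c.pid   = e.pid
       and c.form  = e.form)
  order by visit, cid, pid, missing_form;
quit;
```

This produces exactly the visit, cid, pid, missing_form layout the question asks for.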