SAS proc sql: disqualifying already-merged observations from later matches

Date: 2015-05-22 21:32:09

Tags: sql sas

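I am merging two datasets using 5 criteria sets, with the condition that if a match is created under criteria set number n, those observations are disqualified from matching under any other criteria set x > n. For example, if the merge succeeds under the first criteria set between dataset 1: observation 10 and dataset 2: observation 15, then those two observations are not eligible to be merged under any subsequent criteria set (second, third, fourth, fifth).

My approach so far has been to add a flag variable to the table created by the merge, merge that table back onto both parent datasets, and then, for the next criteria set, require the flag variable to be missing. However, I now have new, large datasets, and this method fails with an "insufficient resources" error. This is probably simple, so my apologies. Thanks in advance for reading.

Sample of my current code for the first two criteria sets: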

* Initialize parent datasets;
data work.parentdata1;
set lib.parentdata1;
run;

data work.parentdata2;
set lib.parentdata2;
run;

***************;
*Criteria set 1;
***************;
proc sql;
create table match_1 as
select *
from parentdata1 o, parentdata2 t
    where o.variable_A = t.variable_a
    and o.variable_B= t.variable_b
;
quit;

* Results dataset (to be used for later analysis);
data work.match_1;
    set match_1;
    match_quality = 1;
run;

* Dataset for merge with parent dataset 1;
data work.mergematched_1;
    set match_1;
    match_dummy = 1;
run;

* sort matched table by parent dataset 1 id to prepare for parent merge;
proc sort data = work.mergematched_1;
    by id1;
run;

* merge matched observations back to parent dataset 1 to disqualify from future criteria sets;
data work.parentdata1_a;
    merge work.parentdata1 work.mergematched_1;
    by id1;
run;

*sort matched table by parent dataset 2 id to prepare for parent merge;
proc sort data = work.mergematched_1;
    by id2;
run;

*merge matched observations back to parent dataset 2 to disqualify from future criteria sets;
data work.parentdata2_a;
    merge work.parentdata2 work.mergematched_1;
    by id2;
run;
***************;
*Criteria set 2;
***************;
proc sql;
create table match_2 as
select *
from parentdata1_a o, parentdata2_a t
where o.match_dummy = . and t.match_dummy = .
and o.variable_X = t.variable_x
and o.variable_Y= t.variable_y
;
quit;

* Results dataset (to be used for later analysis);
data work.match_2;
set match_2;
match_quality = 2;
run;

* Dataset for merge with parent dataset 1a;
data work.mergematched_2;
set match_2;
match_dummy = 1;
run;

* sort matched table by parent dataset 1a id to prepare for parent merge;
proc sort data = work.mergematched_2;
by id1;
run;

* merge matched observations back to parent dataset 1a to disqualify from future criteria sets;
data work.parentdata1_b;
merge work.parentdata1_a work.mergematched_2;
by id1;
run;

*sort matched table by parent dataset 2a id to prepare for parent merge;
proc sort data = work.mergematched_2;
by id2;
run;

*merge matched observations back to parent dataset 2a to disqualify from future criteria sets;
data work.parentdata2_b;
merge work.parentdata2_a work.mergematched_2;
by id2;
run;

1 Answer:

Answer 0 (score: 0)

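Since you are only using 3 or 4 values in each merge, drop all the other variables and keep only the ones used for matching, then merge in everything else at the very end.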
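As a rough sketch of that idea (not tested against your data), the keep= dataset option can be applied directly in the FROM clause so that only the ID and the matching variables are read. This assumes id1 and id2 uniquely identify rows in each parent dataset, as your BY statements suggest. The table name match_1_full is hypothetical, and reading straight from lib instead of a work copy is my own suggestion.

```sas
* Sketch: match on a slim copy of each parent (ID + matching variables only);
proc sql;
    create table match_1 as
    select o.id1, t.id2, 1 as match_quality
    from lib.parentdata1 (keep = id1 variable_A variable_B) as o
    inner join lib.parentdata2 (keep = id2 variable_a variable_b) as t
        on  o.variable_A = t.variable_a
        and o.variable_B = t.variable_b;
quit;

* Sketch: bring the remaining variables back only once, at the very end.
  match_1_full is a hypothetical name. Columns that share a name in both
  parents trigger a warning and only the first copy is kept;
proc sql;
    create table match_1_full as
    select m.match_quality, p1.*, p2.*
    from match_1 as m
    inner join lib.parentdata1 as p1 on m.id1 = p1.id1
    inner join lib.parentdata2 as p2 on m.id2 = p2.id2;
quit;
```

The match table then carries just three columns per matched pair, which should cut down the intermediate storage, though I can't promise it is enough to avoid the resource error.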

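I'm also not sure you need both match_1 and mergematched_1, since the second one doesn't seem to change the file apart from adding one more variable. If you are running out of space, try to avoid creating temporary datasets.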

I think you can also combine several of the steps into one. The following replaces the first four proc/data steps under criteria set 1.

proc sql;
create table match_1 as
select *, 1 as match_quality, 1 as match_dummy
from parentdata1 o
inner join parentdata2 t
on o.variable_A = t.variable_a
and o.variable_B= t.variable_b
order by id1
;
quit;
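
For the later criteria sets, one option might be to skip the merge back onto the parents entirely and exclude already-matched IDs with a subquery. The following is only a sketch under the same assumption that id1 and id2 are unique keys. The matched_ids table is a hypothetical name, and I haven't checked how this performs on data of your size.

```sas
* Sketch: criteria set 2, skipping anything already matched under set 1;
proc sql;
    create table match_2 as
    select o.id1, t.id2, 2 as match_quality
    from lib.parentdata1 (keep = id1 variable_X variable_Y) as o
    inner join lib.parentdata2 (keep = id2 variable_x variable_y) as t
        on  o.variable_X = t.variable_x
        and o.variable_Y = t.variable_y
    where o.id1 not in (select id1 from match_1)
      and t.id2 not in (select id2 from match_1);
quit;

* Sketch: stack the matched ID pairs so that criteria set 3 can exclude
  both earlier sets with one subquery against matched_ids (hypothetical name);
data matched_ids;
    set match_1 (keep = id1 id2)
        match_2 (keep = id1 id2);
run;
```

With this layout the parent datasets are never rewritten, and each later criteria set would filter against matched_ids instead of a flag variable.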