Question

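I have two datasets in SAS. Both contain the same variable x. From the first dataset, I want to delete the observations whose x value also appears among the x values of the second dataset.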

For example:

data set1;
    input x y z;
    datalines;
    1 1.5 2.2
    1 2.1 9.0
    2 4.2 4.4
    3 4.5 2.4
    ;
run;

data set2;
    input x y;
    datalines;
    1 15
    2 44
    ;
run;

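In set1, I want to delete the observations where x=1 or x=2, because 1 and 2 are the x values found in the second dataset. I only want to keep the last row of set1.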

Answer 1

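So your final result should contain only the row with x=3? There are several ways to do this, but I find this one the easiest to understand.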

proc sql;
create table want as
select * 
from set1
where x not in (select x from set2);
quit;

Answer 2

Data step version:

data want;
  merge set1(in = _1) 
        set2(in = _2 keep = x);
  by x;
  if _1 and not(_2);
run;

This assumes that set1 and set2 are both already sorted by x or have an index on x.
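
If the inputs are not sorted, one way to satisfy that assumption is to sort copies of them first. This is only a sketch: the output names set1_sorted and set2_keys are made up for illustration, and nodupkey is an optional extra that keeps one row per x value in the lookup table.

/* Sort a copy of set1 by the merge key */
proc sort data=set1 out=set1_sorted;
  by x;
run;

/* Keep only x from set2, sorted, one row per distinct x */
proc sort data=set2(keep=x) out=set2_keys nodupkey;
  by x;
run;

/* Keep rows of set1 whose x has no match in set2 */
data want;
  merge set1_sorted(in = _1)
        set2_keys(in = _2);
  by x;
  if _1 and not(_2);
run;

Sorting into new datasets leaves set1 and set2 untouched. The sample data in the question is already in x order, so this step is only needed for data that is not.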
