Question

I have a fairly complex scenario.

ID  cola   colb
1    10     0
1    11     1
2    12     0
2    13     1
2    15     2
3    11     0
4    12     0
5    12     0
5    15     1
6    10     0

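Now I want to delete every ID that has a row where cola eq 12 and colb eq 0.

So I need to delete all rows with id = 2, the single row with id = 4, and all rows with id = 5.

Basically, as soon as any row for an id meets the criteria cola eq 12 and colb eq 0, all instances of that id need to be deleted.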

Answer 1

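If you don't have too many observations, you can do this with a pair of data steps. The first keeps a running list of the ID numbers that meet the condition and stores it in a macro variable; the second references that macro variable in a subsetting if statement. Two steps are needed because &keepMe is resolved when the step that uses it is compiled, so it has to exist before that step starts. The list must fit in the 200-character flag variable, and at least one ID has to match, otherwise the in () list is empty and the second step fails:

* Step 1: collect the IDs to remove (despite the name keepMe);
data _null_;
  length flag $200;
  set have end = eof;

  retain flag;

  if cola = 12 and colb = 0 then flag = cat(strip(put(ID,8.))," ",flag);

  if eof then call symputx("keepMe", flag);
run;

* Step 2: keep only the rows whose ID is not in that list;
data want;
  set have;
  if ID not in (&keepMe);
run;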
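
If a genuinely single data step is preferred, one possible sketch (not part of the original answer) is a double DOW loop: the first loop scans one ID group to see whether any row matches, and the second loop re-reads the same group and outputs it only if nothing matched. It assumes have is already sorted or grouped by ID, as in the sample data; drop_id is a hypothetical helper variable introduced here.

```sas
data want (drop = drop_id);
  /* reset at the start of every ID group */
  drop_id = 0;

  /* pass 1: look for a matching row anywhere in this ID group */
  do until (last.ID);
    set have;
    by ID;
    if cola = 12 and colb = 0 then drop_id = 1;
  end;

  /* pass 2: re-read the same group and output it only if it is clean */
  do until (last.ID);
    set have;
    by ID;
    if not drop_id then output;
  end;
run;
```

Each set statement keeps its own read position, so the second loop always lands on the group the first loop just scanned. No macro variable is involved, so the list-length and empty-list concerns do not arise.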

The same can be done for numeric data of any size by using a few more data steps:

data want;
  set have;

  if cola = 12 and colb = 0 then flag = 1;
  else flag = 0;

run;

proc sort data = want;
  by ID descending flag;
run;

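data want_final (drop = sum flag);
  set want;
  by ID;
  * the sum statement retains sum, which is reset at the start of each ID;
  sum + flag;
  if first.id then sum = flag;

  * flagged rows sort first within an ID, so sum stays 0 only for clean IDs;
  if sum = 0;

run;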

Answer 2

I suggest you also try this approach, which uses proc sql and may be faster for long data sets.

Following your example:

Step to create the data:

data work.input;
    length id 3 cola 3 colb 3;
    infile datalines dsd;
    input id cola colb;
        datalines;
            1,    10,     0
            1,    11,    1
            2,    12,     0
            2,    13,     1
            2,   15,     2
            3,    11,     0
            4,    12,     0
            5,    12,     0
            5,    15,     1
            6,   10,     0
        ;
run;

The step that actually does what you asked for:

proc sql noprint;

    select id into :id_list_to_remove separated by ','
    from work.input
    where cola=12 and colb=0
    ;

    create table output as
    select * 
    from work.input
    where id not in (&id_list_to_remove.)
    ;
quit;
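
A variant that skips the macro variable, offered as a sketch rather than as part of the original answer, expresses the same filter as a subquery. It reuses the work.input and output names from above, and it should still run when no row matches the condition, since an empty subquery result simply excludes nothing.

```sas
proc sql;
    create table output as
    select *
    from work.input
    /* drop every id that has at least one row with cola=12 and colb=0 */
    where id not in (
        select id
        from work.input
        where cola = 12 and colb = 0
    );
quit;
```

With the sample data this should leave only the rows for id 1, 3 and 6.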
