I have a fairly complex scenario.
ID cola colb
1 10 0
1 11 1
2 12 0
2 13 1
2 15 2
3 11 0
4 12 0
5 12 0
5 15 1
6 10 0
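Now I want to delete every row for an ID if that ID has a row where cola eq 12 and colb eq 0.
So I need to delete all rows with id = 2, the single row with id = 4, and all rows with id = 5.
Basically, as soon as any row for an ID meets the criterion cola eq 12 and colb eq 0, all instances of that ID need to be deleted.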
Answer 0: (score: 0)
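If you don't have too many observations, you can do this with a macro variable. Keep a running list of the ID numbers that meet the condition, store it in a macro variable, and reference that macro variable in an IF statement. The list has to be built in its own step first, because a macro reference is resolved when the data step is compiled, before CALL SYMPUTX has run:
/* Step 1: collect the IDs that have a row with cola = 12 and colb = 0 */
data _null_;
length flag $200;
retain flag;
set have end = eof;
if cola = 12 and colb = 0 then flag = catx(" ", flag, ID);
if eof then call symputx("dropMe", flag);
run;
/* Step 2: the macro variable now exists, so it can be used to filter */
data want;
set have;
if ID not in (&dropMe);
run;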
It can also be done for data of any size, at the cost of a few more data steps:
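data want;
set have;
/* flag = 1 on rows that meet the condition, 0 otherwise */
if cola = 12 and colb = 0 then flag = 1;
else flag = 0;
run;
/* within each ID, put any flagged row first */
proc sort data = want;
by ID descending flag;
run;
data want_final (drop = sum flag);
set want;
by ID;
sum + flag;
/* restart the running total at the first row of each ID */
if first.id then sum = flag;
/* keep a row only if its ID has no flagged row */
if sum = 0;
run;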
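As a possible alternative to the flag-and-sort steps, the same per-ID check can be sketched with two DO UNTIL loops in a single data step (often called a double DOW loop). This is only a sketch under assumptions: it assumes `have` is already sorted by `ID`, and the variable name `hit` is made up for illustration.

```sas
data want_final (drop = hit);
    /* first pass over one ID group: look for a matching row */
    do until (last.id);
        set have;
        by id;
        if cola = 12 and colb = 0 then hit = 1;
    end;
    /* second pass over the same group: write its rows only if no match was found */
    do until (last.id);
        set have;
        by id;
        if not hit then output;
    end;
run;
```

Because `hit` is not read from a data set and is not retained, it is reset to missing at the start of each data step iteration, so every ID group starts with a clean flag.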
Answer 1: (score: 0)
I suggest you also try this approach, which uses proc sql and may be faster for long datasets.
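Following your example, here is the step that creates the data:
data work.input;
length id 3 cola 3 colb 3;
/* INFILE must run before INPUT so that the comma delimiter (dsd) applies */
infile datalines dsd;
input id cola colb;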
datalines;
1, 10, 0
1, 11, 1
2, 12, 0
2, 13, 1
2, 15, 2
3, 11, 0
4, 12, 0
5, 12, 0
5, 15, 1
6, 10, 0
;
run;
And here is the step that actually does what you asked:
proc sql noprint;
select id into :id_list_to_remove separated by ','
from work.input
where cola=12 and colb=0
;
create table output as
select *
from work.input
where id not in (&id_list_to_remove.)
;
quit;
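The macro variable in the step above is not created when no row matches the condition, and a macro variable value is limited to 65,534 characters, so a very long ID list may not fit. A minimal sketch of a variant that avoids the macro variable by using a subquery instead (the output table name `output_subquery` is made up for illustration):

```sas
proc sql;
    create table output_subquery as
    select *
    from work.input
    /* the subquery returns every id that has a row with cola = 12 and colb = 0 */
    where id not in (
        select id
        from work.input
        where cola = 12 and colb = 0
    );
quit;
```

With the sample data this should leave only the rows for id 1, 3 and 6, the same result the macro-variable version is aiming for.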