I want to delete every group in which no observation has NUM = 14.
For example, given this original data:
ID NUM
1 14
1 12
1 10
2 13
2 11
2 10
3 14
3 10
Since none of the observations with ID = 2 has NUM = 14, group 2 should be deleted. The result should look like this:
ID NUM
1 14
1 12
1 10
3 14
3 10
This is what I have done so far, but it does not seem to work.
data originaldat;
set newdat;
by ID;
If first.ID then do;
IF NUM EQ 14 then Score = 100;
Else Score = 10;
end;
else SCORE+1;
run;
data newdat;
set newdat;
If score LT 50 then delete;
run;
Answer 0 (score: 3)
One way to do this with proc sql is:
proc sql;
  create table newdat as
  select *
  from originaldat
  where ID in (
    select ID
    from originaldat
    where NUM = 14
  );
quit;
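The subquery selects the ID of every group that contains an observation with NUM = 14. The where clause then restricts the selected data to only those groups.
The equivalent data step approach is: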
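As a hedged alternative sketch (not part of the original answer), the same filter can usually be written without a subquery by grouping on ID and testing the condition in a having clause. This relies on PROC SQL evaluating the comparison NUM = 14 to 1 or 0 and on SAS remerging the summary statistic back onto the detail rows, which produces a NOTE in the log. The table name newdat_having is hypothetical, and the row order of the result is not guaranteed to match the input.

```sas
proc sql;
  /* newdat_having is a hypothetical output name used only for this sketch */
  create table newdat_having as
  select *
  from originaldat
  group by ID
  /* (NUM = 14) evaluates to 1 or 0; the max is 1 if any row in the group matches */
  having max(NUM = 14) = 1;
quit;
```

The subquery form above is more portable across SQL dialects; this version depends on SAS-specific remerging behavior.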
/* Get all the groups that contain an observation where NUM = 14 */
data keepGroups;
  set originaldat;
  if NUM = 14;
  keep ID;
run;
/* Sort both data sets to ensure the data step merge works as expected */
proc sort data = originaldat;
  by ID;
run;
/* Make sure there are no duplicate values in the groups to be kept */
proc sort data = keepGroups nodupkey;
  by ID;
run;
/*
Merge the original data with the groups to keep and only keep records
where an observation exists in the groups to keep dataset
*/
data newdat;
  merge
    originaldat
    keepGroups (in = k);
  by ID;
  if k;
run;
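In both data steps, the subsetting if statement outputs an observation only when its condition is met. In the second case, k is a temporary variable whose value is 1 (true) when the observation has a contribution from keepGroups and 0 (false) otherwise.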
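To see what the in= flags look like, here is a minimal sketch under the assumption that originaldat and keepGroups already exist and are sorted by ID as in the steps above. Because in= variables are temporary and are not written to the output, the sketch copies them into ordinary variables. The names inspectFlags, fromOriginal and fromKeep are hypothetical.

```sas
/* inspectFlags, fromOriginal and fromKeep are hypothetical names for illustration */
data inspectFlags;
  merge
    originaldat (in = o)
    keepGroups  (in = k);
  by ID;
  /* Copy the temporary in= flags so they are stored in the output data set */
  fromOriginal = o;
  fromKeep = k;
run;

proc print data = inspectFlags;
run;
```

With the sample data from the question, rows for ID 2 would be expected to show fromKeep = 0, which is why the subsetting if k; drops them.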
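Answer 1 (score: 2)
You have something close to a DoW loop here, but it is not quite right. The problem (assuming the DATA/SET names are just mistyped in the post and not actually wrong in your program) is that the first data step does not attach the 100 to every row, only to the row where NUM is 14. What you need is one row per ID value that carries the keep/do-not-keep decision.
You can get that by running your first data step, but with a RETAINed score, and outputting only one row per ID. If you only fixed the DATA/SET mix-up, your code would actually work for this sample, because 14 happens to be the first row of each group; but it works only when 14 is the first row.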
data originaldat;
input ID NUM ;
datalines;
1 14
1 12
1 10
2 13
2 11
2 10
3 14
3 10
;;;;
run;
data has_fourteen;
  set originaldat;
  by ID;
  retain keep;
  If first.ID then keep=0;
  if num=14 then keep=1;
  if last.id then output;
run;
data newdata;
  merge originaldat has_fourteen;
  by id;
  if keep=1;
run;
This works by merging the one-row-per-ID value back onto the whole dataset.
A double DoW loop also works.
data newdata;
  keep=0;
  do _n_=1 by 1 until (last.id);
    set originaldat;
    by id;
    if num=14 then keep=1;
  end;
  do _n_=1 by 1 until (last.id);
    set originaldat;
    by id;
    if keep=1 then output;
  end;
run;
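This works because it goes through the dataset twice. For each ID, it first reads all of that ID's records once, looking for a 14, and sets keep to 1 if it finds one. It then reads all the records for that ID a second time and outputs them if keep=1. After that it moves on to the next group of records by ID.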
Answer 2 (score: 1)
data in;
input id num;
cards;
1 14
1 12
1 10
2 16
2 13
3 14
3 67
;
/* To find the list of groups that contain num=14, use the SQL below */
proc sql;
select distinct id into :lst separated by ','
from in
where num = 14;
quit;
/* If you want to create a new data set with only groups containing num=14 then use following data step */
data out;
set in;
where id in (&lst.);
run;