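Deleting groups whose observations do not contain a specific value in SAS

Time: 2015-03-04 19:52:08

Tags: sas

I want to delete an entire group when none of its observations has NUM = 14.

For example, the original data: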

ID  NUM 
1  14
1  12
1  10
2  13
2  11
2  10
3  14
3  10

Since none of the observations with ID = 2 has NUM = 14, group 2 is deleted. The result should look like this:

ID  NUM 
1  14
1  12
1  10
3  14
3  10

This is what I have done so far, but it doesn't seem to work.

data originaldat;
set newdat;
by ID;
If first.ID then do;
        IF NUM EQ 14 then Score = 100;
        Else Score = 10;
    end;
else SCORE+1;
run; 

data newdat;
set newdat;
   If score LT 50 then delete;
run;

3 Answers:

Answer 0: (score: 3)

One way to do this with proc sql is:

proc sql;
    create table newdat as
    select * 
    from originaldat
    where ID in (
        select ID 
        from originaldat
        where NUM = 14
    );
quit;

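The subquery selects the IDs of the groups that contain an observation with NUM = 14. The where clause then restricts the selected data to only those groups.


The equivalent data step approach is: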

/* Get all the groups that contain an observation where NUM = 14 */
data keepGroups;
    set originaldat;
    if NUM = 14;
    keep ID;
run;
/* Sort both data sets to ensure the data step merge works as expected */
proc sort data = originaldat;
    by ID;
run;
/* Make sure there are no duplicate values in the groups to be kept */
proc sort data = keepGroups nodupkey;
    by ID;
run;
/* 
    Merge the original data with the groups to keep and only keep records
    where an observation exists in the groups to keep dataset
*/
data newdat;
    merge 
        originaldat 
        keepGroups (in = k);
    by ID;
    if k;
run;

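In both data steps, the subsetting if statement outputs an observation only when its condition is met. In the second case, k is a temporary variable with the value 1 (true) when a value is read from keepGroups and 0 (false) otherwise.

Answer 1: (score: 2)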
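As a variation on the proc sql approach above, the same filter can be sketched with GROUP BY and HAVING instead of a subquery. This is only an illustrative alternative, not part of the original answer: the output name `newdat_having` is hypothetical, and the sketch relies on SAS evaluating the comparison `NUM = 14` to 1 or 0 and on proc sql remerging the summary statistic back onto the detail rows (SAS writes a NOTE about remerging to the log).

```sas
proc sql;
    /* newdat_having is a hypothetical output name used for illustration */
    create table newdat_having as
    select *
    from originaldat
    group by ID
    /* (NUM = 14) is 1 when true and 0 when false, so the group maximum */
    /* is 1 exactly when at least one row in the group has NUM = 14     */
    having max(NUM = 14) = 1;
quit;
```

Row order within each group is not guaranteed after the remerge, so add an order by clause if the order matters.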

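You are partway to a DoW loop here, but it is not quite right. The problem (assuming the DATA/SET names are just mistyped in the question and are not actually wrong in your program) is that the first data step does not attach the 100 to every row of the group, only to the row with 14. What you need is one row per ID value that carries a keep/don't keep decision.

You can get that by running your first data step, but with a RETAINed score and only one row output per ID. If you had just fixed the DATA/SET mistype, your code would actually have worked, given that 14 is the first row of each group; but it only works when 14 is the first row.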

data originaldat;   
input ID  NUM ;
datalines;
1  14
1  12
1  10
2  13
2  11
2  10
3  14
3  10
;;;;
run;

data has_fourteen;
set originaldat;
by ID;
retain keep;
If first.ID then keep=0;
if num=14 then keep=1;
if last.id then output;
run; 

data newdata;
  merge originaldat has_fourteen;
  by id;
  if keep=1;
run;

This works by merging the per-ID value back onto the whole data set.

A double DoW loop also works.

data newdata;
  keep=0;
  do _n_=1 by 1 until (last.id);
    set originaldat;
    by id;
    if num=14 then keep=1;
  end;
  do _n_=1 by 1 until (last.id);
    set originaldat;
    by id;
    if keep=1 then output;
  end;
run;

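This works because it passes over the data set twice. For each ID, it first loops through all of that ID's records looking for a 14, and sets keep to 1 if it finds one. It then reads all of the records for that ID again and outputs them if keep=1. After that it moves on to the next group of records by ID.

Answer 2: (score: 1)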
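One way to sanity-check the result is to count, per remaining ID, how many rows have NUM = 14. The following is a minimal sketch, assuming the output data set is named `newdata` as in the code above; the column names `n_rows` and `n_14` are made up for illustration.

```sas
proc sql;
    /* n_rows and n_14 are hypothetical column names */
    select ID,
           count(*)      as n_rows,
           sum(NUM = 14) as n_14   /* (NUM = 14) evaluates to 1 or 0 in SAS */
    from newdata
    group by ID;
quit;
```

Every ID that appears in the output should show `n_14` of at least 1; an ID with 0 would mean a group was kept that should have been dropped.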

data in;
input id num;
cards;
1 14
1 12
1 10
2 16
2 13
3 14
3 67
;

/* To find out the list of groups which contains num=14, use below SQL */

proc sql;
  select distinct id into :lst separated by ','
  from in
  where num = 14;
quit;

/* If you want to create a new data set with only groups containing num=14 then use following data step */

data out;
 set in;
 where id in (&lst.);
run;
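The macro-variable approach above assumes at least one row has num = 14. If no row matches, the select ... into statement does not assign `lst`, and the where clause in the data step fails or reuses a stale value. A hedged sketch of one way to guard against that follows; the macro name `keep_groups_with_14` is hypothetical, and the sketch relies on the automatic macro variable `SQLOBS`, which proc sql sets to the number of rows the last statement processed.

```sas
/* keep_groups_with_14 is a hypothetical macro name used for illustration */
%macro keep_groups_with_14;
    proc sql noprint;
        select distinct id into :lst separated by ','
        from in
        where num = 14;
    quit;

    %if &sqlobs > 0 %then %do;
        /* At least one group qualifies: filter on the list of IDs */
        data out;
            set in;
            where id in (&lst.);
        run;
    %end;
    %else %do;
        /* No group qualifies: create an empty data set with the same columns */
        data out;
            set in (obs = 0);
        run;
    %end;
%mend keep_groups_with_14;

%keep_groups_with_14
```

Note also that this list-in-a-macro-variable technique assumes numeric IDs; character IDs would need to be quoted in the list, and a very long list can exceed the maximum length of a macro variable.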