SAS: Setting values to missing

Time: 2012-04-12 18:43:42

Tags: sas

I am trying to set some existing values to missing (without deleting the rows). Here is the basic structure of my dataset.

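When A is less than B, I want to set AGE and GENDER to missing. For example, when A = 1 and B = 3, I want the AGE and GENDER values in the last two rows of each ID to be missing (as shown in the AFTER dataset).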

In my data, A and B both range from 1 to 4, and every combination of the two occurs.

Asterisks mean there is more data in between. Thanks in advance!

BEFORE
    ID A B AGE GENDER
    --------------
    1  1 1 35  M
    *  * * *   *
    *  * * *   *
    5  1 2 23  F
    5  1 2 21  M
    6  1 2 42  F
    6  1 2 43  M
    *  * * *   *
    *  * * *   *
    20 1 3 43  F
    20 1 3 39  M
    20 1 3 23  M
    21 1 3 32  F
    21 1 3 39  M
    21 1 3 23  F
    *  * * *   *
    *  * * *   *
    55 2 4 32  M
    55 2 4 12  M
    55 2 4 31  F
    55 2 4 43  M
    *  * * *   *
    *  * * *   *

AFTER
     ID A B AGE GENDER
     --------------
     1  1 1 35  M
     *  * * *   *
     *  * * *   *
     5  1 2 23  F
     5  1 2 .   .
     6  1 2 42  F
     6  1 2 .   .
     *  * * *   *
     *  * * *   *
     20 1 3 43  F
     20 1 3 .   .
     20 1 3 .   .
     21 1 3 32  F
     21 1 3 .   .
     21 1 3 .   .
     *  * * *   *
     *  * * *   *
     55 2 4 32  M
     55 2 4 12  M
     55 2 4 .   .
     55 2 4 .   . 
     *  * * *   *
     *  * * *   *

1 answer:

Answer 0 (score: 5)

How about now?
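In case it helps to try this out, here is a small test input. It is only a sketch: the rows are the ones visible in the BEFORE listing (the elided `*` rows are left out), `olddata` is the input name the code below reads, and treating ID, A, B and AGE as numeric and GENDER as character is an assumption on my part.

```sas
/* Test input built from the visible BEFORE rows only.           */
/* Assumption: ID, A, B, AGE are numeric and GENDER is character. */
data olddata;
    input id a b age gender $;
    datalines;
1 1 1 35 M
5 1 2 23 F
5 1 2 21 M
6 1 2 42 F
6 1 2 43 M
20 1 3 43 F
20 1 3 39 M
20 1 3 23 M
21 1 3 32 F
21 1 3 39 M
21 1 3 23 F
55 2 4 32 M
55 2 4 12 M
55 2 4 31 F
55 2 4 43 M
;
run;
```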

data temp;
  retain idcount 0;
  set olddata;

  ** Create an observation counter for each id **;   
  prev_id = lag(id);

  if id ^= prev_id then idcount = 0;
  idcount = idcount + 1;

run;


** Sort by ID, with the obs inside each ID in reverse order **;
proc sort data=temp; 
    by id descending idcount;
run;

data temp2;
    retain misscount 0;
    set temp;
    by id descending idcount;

    ** Keep the original age and gender for reference in old_age and old_gender **;
    old_age = age;
    old_gender = gender;

    ** Number of rows per ID that should be set to missing: B - A when A < B **;
    if a < b then nummiss = b - a;
    else nummiss = 0;

    ** Set a counter of obs that we will set to missing **;   
    if first.id then misscount = 0;

    ** Set the appropriate number of rows to missing and update the counter **;
    if misscount < nummiss then do;
       misscount = misscount + 1;
       call missing(age, gender);
    end;
run;

** Restore the original row order within each ID and drop the helper variables **;
proc sort data=temp2 out=temp3(drop=misscount nummiss idcount prev_id);
    by id idcount;
run;