我正在尝试将一些现有值设为缺失值(不删除它们)。 这是我的数据集的基本结构。
当A小于B时,我想将AGE和GENDER视为缺失。例如,当A = 1且B = 3时,我想将最后两行的AGE和GENDER值视为缺失(如图所示)在数据集上。)
在我的数据中,A和B都是从1到4,并且具有它们的每种组合。
Asterisks意味着我们之间有更多数据。提前谢谢!
BEFORE
ID A B AGE GENDER
--------------
1 1 1 35 M
* * * * *
* * * * *
5 1 2 23 F
5 1 2 21 M
6 1 2 42 F
6 1 2 43 M
* * * * *
* * * * *
20 1 3 43 F
20 1 3 39 M
20 1 3 23 M
21 1 3 32 F
21 1 3 39 M
21 1 3 23 F
* * * * *
* * * * *
55 2 4 32 M
55 2 4 12 M
55 2 4 31 F
55 2 4 43 M
* * * * *
* * * * *
AFTER
ID A B AGE GENDER
--------------
1 1 1 35 M
* * * * *
* * * * *
5 1 2 23 F
5 1 2 . .
6 1 2 42 F
6 1 2 . .
* * * * *
* * * * *
20 1 3 43 F
20 1 3 . .
20 1 3 . .
21 1 3 32 F
21 1 3 . .
21 1 3 . .
* * * * *
* * * * *
55 2 4 32 M
55 2 4 12 M
55 2 4 . .
55 2 4 . .
* * * * *
* * * * *
答案 0 :(得分:5)
现在怎么样?
data temp;
retain idcount 0;
set olddata;
** Create an observation counter for each id **;
prev_id = lag(id);
if id ^= prev_id then idcount = 0;
idcount = idcount + 1;
run;
** Sort the obs by ID in reverse order **;
proc sort data=temp;
by id descending idcount;
run;
data temp2;
retain misscount 0;
set temp;
by id descending idcount;
** Keep the previous age and gender **;
old_age = age;
old_gender = gender;
** Count the number that should be missing **;
if a < b then nummiss = b - a;
else nummiss = 0;
** Set a counter of obs that we will set to missing **;
if first.id then misscount = 0;
** Set the appropriate number of rows to missing and update the counter **;
if misscount < nummiss then do;
misscount = misscount + 1;
call missing(age, gender);
end;
run;
proc sort data=temp2 out=temp3(drop=misscount nummiss idcount prev_id);
by id idcount;
run;