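I have a log file that contains different versions of some records. What is the most efficient way in SAS to count, for each record, the number of revisions of each variable (from a user-defined list) in a reasonably large file?
For example: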
%let vars='Var1 Var2 Var4';
Record_ID Var1 Var2 VarThree Var4
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
I would like to get something like:
ID Var No
1 Var1 0
1 Var2 0
1 Var4 2
2 Var1 0
2 Var2 1
2 Var4 2
Answer 0: (score: 0)
The following solution takes two steps to get the layout you want: 1. count the changes; 2. transpose.
data have;
input (id var1-var4) ($);
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;
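data _want;
  set have(rename=(var1-var4=v1-v4));
  by id;
  array v[*] v1-v4;               /* current values (character) */
  array prev[4] $ 8 _temporary_;  /* previous row's values; temporary arrays are retained */
  array var[*] var1-var4;         /* change counters */
  retain var1-var4;
  /* LAG keeps one queue per call site, so calling it inside a loop would
     compare neighbouring array elements; keep the previous row explicitly. */
  do i = 1 to dim(v);
    if first.id then var[i] = 0;
    else var[i] = var[i] + (v[i] ne prev[i]);
    prev[i] = v[i];
  end;
  if last.id;
  drop v1-v4 i;
run;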
PROC TRANSPOSE DATA=_want
OUT=want(rename=col1=no)
NAME=var
;
BY id;
VAR var1 var2 var3 var4;
RUN; QUIT;
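Answer 1: (score: 0)
I used the FIRST. variables to count the number of runs. My feeling was that an array with LAG would run into problems if VARS were of mixed types; that is not insurmountable, but this seemed easier. I also added some code to process the user-defined variable list, which would be a start on what is needed.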
data log;
input Record_ID (Var1-Var4)(:$1.);
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;;;;
run;
proc print;
run;
%macro main(data=log,id=record_id,vars=var1-var2 var4);
  /* Expand the variable list into one row per variable name */
  proc transpose data=&data(obs=0) out=vars;
    var &vars;
  run;
  /* Build one SET/BY pair per variable so that each gets its own FIRST. flag */
  proc sql noprint;
    select catx(' ',"set &data(keep=&id",_name_,"); by notsorted &id",_name_,';')
      into :stmts separated by ' '
      from vars;
  quit;
  %put NOTE: &=sqlobs %bquote(&=stmts);
  data report(keep=&id varname count);
    do until(last.&id);
      &stmts;
      array _f[*] 'first.'n:;
      array _n[%eval(&sqlobs+1)] n0-n&sqlobs;
      drop n0;
      /* Element 1 is FIRST.&id, so the loops start at 2 */
      do j = 2 to dim(_f);
        _n[j] + _f[j];
      end;
    end;
    length varname $32;
    do j=2 to dim(_f);
      varname = scan(vname(_f[j]),-1);
      count = _n[j]-1;   /* changes = runs - 1 */
      output;
    end;
    call missing(of _n[*]);
  run;
  proc print;
  run;
%mend main;
options mprint=1;
%main();
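To make the core of the macro easier to follow, here is a minimal hand-written sketch of the same FIRST.-variable idea for a single variable (Var4), assuming the log data set created above. The data set name runs_var4 and the variable runs are illustrative and not part of the original answer.

```sas
data runs_var4(keep=record_id varname count);
  /* One pass of this loop reads all rows of one Record_ID */
  do until(last.record_id);
    set log(keep=record_id var4);
    /* NOTSORTED: FIRST.var4 is 1 whenever var4 (or record_id)
       differs from the previous row */
    by notsorted record_id var4;
    runs = sum(runs, first.var4);   /* number of runs of equal values */
  end;
  length varname $32;
  varname = 'Var4';
  count = runs - 1;                 /* changes = runs - 1 */
run;
```

With the sample data this gives count=2 for both Record_ID 1 and Record_ID 2, matching the Var4 rows in the question. The macro generates one such SET/BY pair per variable inside a single loop.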
Answer 2: (score: 0)
Assuming the data is not too large, the first thing that came to mind was a transpose-then-count approach.
data have;
input (id var1-var4) ($);
rowid=_n_;
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;
%let vars=Var1 Var2 Var4;

proc transpose data=have out=h(keep=id _name_ col1
                               rename=(_name_=Var col1=Value));
  var &vars;
  by rowid id;
run;
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.H has 21 observations and 3 variables.

proc sort data=h equals;
  by id Var;
run;
NOTE: There were 21 observations read from the data set WORK.H.
NOTE: The data set WORK.H has 21 observations and 3 variables.

data want(keep=id Var NumberOfChanges);
  set h;
  by id Var Value notsorted;
  if first.Var then NumberOfChanges=0;
  else if first.Value then NumberOfChanges+1;
  if last.Var;
  put (ID Var NumberOfChanges)(=);
run;
id=1 Var=var1 NumberOfChanges=0
id=1 Var=var2 NumberOfChanges=0
id=1 Var=var4 NumberOfChanges=2
id=2 Var=var1 NumberOfChanges=0
id=2 Var=var2 NumberOfChanges=1
id=2 Var=var4 NumberOfChanges=2
NOTE: There were 21 observations read from the data set WORK.H.
NOTE: The data set WORK.WANT has 6 observations and 3 variables.