SAS - Counting the number of revisions

Date: 2016-09-06 13:40:57

Tags: sas

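I have a log file that contains different versions of some records. What is the most efficient way in SAS to count, for each record, the number of revisions to each variable (from a user-defined list), given a reasonably large file?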

For example:

%let vars='Var1 Var2 Var4';

Record_ID Var1 Var2 VarThree Var4   
1 A A A A  
1 A A A B  
1 A A A A  
2 A A A A  
2 A B B A  
2 A B C B  
2 A B B A  

I would like to get something like:

ID Var No  
1 Var1 0  
1 Var2 0  
1 Var4 2  
2 Var1 0  
2 Var2 1  
2 Var4 2  

3 Answers:

Answer 0: (score: 0)

The following solution takes two steps to produce the layout you want: 1. count the changes; 2. transpose.

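data have;
  input (id var1-var4) ($);
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;

/* Step 1: count, per id, how often each variable differs from the previous row */
data _want;
  set have(rename=(var1-var4=v1-v4));
  by id;
  retain var1-var4 0;                 /* change counters, kept across rows */
  array v[*] v1-v4;                   /* current values */
  array prev[4] $8 _temporary_;       /* previous row's values (automatically retained) */
  array cnt[*] var1-var4;
  do i = 1 to dim(v);
    if first.id then cnt[i] = 0;                    /* reset on the first row of each id */
    else cnt[i] = cnt[i] + (v[i] ne prev[i]);       /* add 1 when the value changed */
    prev[i] = v[i];
  end;
  if last.id;                         /* keep one row per id */
  drop v1-v4 i;
run;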

/* Step 2: transpose the counters to one row per id and variable */
proc transpose data=_want
    out=want(rename=(col1=no))
    name=var;
  by id;
  var var1 var2 var3 var4;
run;
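
The transpose step above lists all four variables, while the question asks for a user-defined list. A minimal sketch of restricting the output to that list follows. It assumes the list is stored unquoted in a macro variable (the question stores it with quotes) and that the `_want` data set from the counting step already exists.

```sas
/* Assumption: the user's list is unquoted and space-separated */
%let vars = var1 var2 var4;

/* Transpose only the requested counters */
proc transpose data=_want
    out=want(rename=(col1=no))
    name=var;
  by id;
  var &vars;
run;
```

Because the list is resolved only in the VAR statement, the counting step itself does not change; it still counts every variable and the unwanted ones are simply not transposed.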

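Answer 1: (score: 0)

I used the FIRST. variables to count the number of runs. LAG seems easier, but I think that if VARS are of mixed types it would run into a series of problems, although not insurmountable ones. I also added some code to process the user-defined variable list, which is the starting point of the requirement.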

data log;
   input Record_ID (Var1-Var4)(:$1.);
   cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;;;;
   run;
proc print;
   run;
%macro main(data=log,id=record_id,vars=var1-var2 var4);
   proc transpose data=&data(obs=0) out=vars;
      var &vars;
      run;
   proc sql noprint;
      select catx(' ',"set &data(keep=&id",_name_,"); by notsorted &id",_name_,';') 
         into :stmts separated by ' '
      from vars;
      quit;
   %put NOTE: &=sqlobs %bquote(&=stmts);

   data report(keep=&id varname count);
      do until(last.&id);
         &stmts;
         array _f[*] 'first.'n:;
         array _n[%eval(&sqlobs+1)] n0-n&sqlobs;
         drop n0;
         do j = 2 to dim(_f);
            _n[j] + _f[j];
            end;
         end;
      length varname $32;      
      do j=2 to dim(_f);
         varname = scan(vname(_f[j]),-1);
         count   = _n[j]-1;
         output;
         end;
      call missing(of _n[*]);
      run;
   proc print;
      run;
   %mend main;
options mprint;
%main();
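
The call above relies on the macro's defaults (`vars=var1-var2 var4`). As a usage sketch, the asker's list could also be passed explicitly. The parameter names are the ones defined in `%macro main`, and the list is assumed to be unquoted and space-separated.

```sas
/* Pass the data set, the id variable and the variable list explicitly */
%main(data=log, id=record_id, vars=Var1 Var2 Var4);
```

The macro writes its result to the data set `report`, with one row per `record_id` and variable name.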

Answer 2: (score: 0)

Assuming the data is not too large, the first approach that comes to mind is to transpose and then count.

data have;
  input (id var1-var4) ($);
  rowid=_n_;
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A 
;

%let vars=Var1 Var2 Var4;

proc transpose data=have out=h(keep=id _name_ col1
                               rename=(_name_=Var col1=Value)
                               );
  var &vars;
  by rowid id;
run;

NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.H has 21 observations and 3 variables.

proc sort data=h equals;
  by id Var;
run;

NOTE: There were 21 observations read from the data set WORK.H.
NOTE: The data set WORK.H has 21 observations and 3 variables.

data want(keep=id Var NumberOfChanges);
  set h;
  by id Var Value notsorted;
  if first.Var then NumberOfChanges=0;
  else if first.Value then NumberOfChanges+1;
  if last.Var;

  put (ID Var NumberofChanges)(=);
run;

id=1 Var=var1 NumberOfChanges=0
id=1 Var=var2 NumberOfChanges=0
id=1 Var=var4 NumberOfChanges=2
id=2 Var=var1 NumberOfChanges=0
id=2 Var=var2 NumberOfChanges=1
id=2 Var=var4 NumberOfChanges=2
NOTE: There were 21 observations read from the data set WORK.H.
NOTE: The data set WORK.WANT has 6 observations and 3 variables.