SAS PROC COMPARE: values that are not equal

Time: 2019-04-08 22:36:03

Tags: sas compare

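I want to automate some processes. I use SAS PROC COMPARE and can get the list of variables that have unequal values. In addition, I would like to extract those variables (the ones with unequal values) and compare their mean/median/min/max, etc., using PROC MEANS or PROC UNIVARIATE. My question is how to save the PROC COMPARE output as a table and extract the variables from it. Thanks.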

2 answers:

Answer 0 (score: 1)

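The OUTSTATS= option writes basic summary statistics for the compared variables to a data set. The statistics are N, MEAN, STD, MIN, MAX, STDERR, T, PROBT, NDIF, DIFMEANS, R, and RSQ.

If you need statistics other than those, you can process the OUTSTATS table further to build a list of the variables that have some differences (according to NDIF).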

Example:

data have1 have2;
  do row = 1 to 100;
    array x(100);

    do _n_ = 1 to dim (x);
      x(_n_) = _n_ * 1000 + floor(50*ranuni(123)) - 25;
    end;
    output have1;

    * in have2, every 5th column of every 5th row may differ from have1;
    if mod(row,5) = 0 then
      do _n_ = 1 to dim (x);
        if mod(_n_,5) = 0 and ranuni(123) < _n_ / 100 then x(_n_) + _n_;
      end;
    output have2;
  end;
run;

proc compare noprint 
  base=have1 
  compare=have2 
  out=differences
  outstats=summary_stats
  outnoequal
  ;
run;
* review summary_stats;

* if you need more statistics than summary_stats provides,
  get the list of variables that have some differences;
proc sql;
  reset noprint;
  select _var_
  into :vars_that_differed separated by ' '
  from summary_stats
  where _TYPE_ = 'NDIF' and (_BASE_ ne 0 or _COMP_ ne 0)
  ;
quit;

* show the variables that would be used in the VAR statement of a subsequent MEANS or UNIVARIATE;
%put NOTE: &=vars_that_differed;
----- LOG -----
NOTE: VARS_THAT_DIFFERED=x5 x10 x20 x25 x30 x35 x40 x45 x50 x55 x60 x65 x70
x75 x80 x85 x90 x95 x100
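
The follow-up step the question asks about (summarizing only the variables that differed) could look roughly like the sketch below. It is self-contained and uses small made-up data sets; the names base_ds, comp_ds, stats, both, diff_vars and summarize_diffs are illustrative assumptions, not part of the answer above. The WHERE condition on the NDIF rows is copied from the answer.

```sas
/* Hypothetical toy data: a and b differ between the two sets, c does not */
data base_ds;
  input id a b c;
  datalines;
1 10 5 7
2 20 6 7
3 30 7 7
;
run;

data comp_ds;
  input id a b c;
  datalines;
1 10 5 7
2 25 6 7
3 30 9 7
;
run;

proc compare noprint base=base_ds compare=comp_ds outstats=stats;
  id id;
run;

/* Initialize so the macro variable exists even when nothing differs */
%let diff_vars=;

proc sql noprint;
  select _var_
    into :diff_vars separated by ' '
    from stats
   where _type_ = 'NDIF' and (_base_ ne 0 or _comp_ ne 0);
quit;

/* Stack both inputs and tag the source so one PROC MEANS covers both */
data both;
  set base_ds(in=in_base) comp_ds;
  length source $4;
  source = ifc(in_base, 'BASE', 'COMP');
run;

/* Run PROC MEANS only when at least one variable differed */
%macro summarize_diffs;
  %if %length(&diff_vars) > 0 %then %do;
    proc means data=both n mean median min max;
      class source;
      var &diff_vars;
    run;
  %end;
  %else %put NOTE: No variables differed.;
%mend summarize_diffs;

%summarize_diffs
```

Stacking the two data sets with a source flag is only one possible design; running PROC MEANS or PROC UNIVARIATE once per data set with the same VAR list would work as well.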

Answer 1 (score: 0)

If your question is how to save the PROC COMPARE output in another data set/table, you can use the OUT= option:

proc compare base=old compare=new
out=Out_ds outnoequal outbase outcomp outdif noprint;
id code;
run;

Out_ds will hold the results.
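
As a rough illustration of what such an OUT= data set contains when OUTBASE, OUTCOMP and OUTDIF are all requested, the sketch below uses two tiny made-up data sets (demo_old, demo_new, demo_out and demo_dif are hypothetical names). Each observation that is written is labeled in the _TYPE_ variable as BASE, COMPARE or DIF, so the difference rows can be picked out with a WHERE statement.

```sas
/* Hypothetical toy inputs: only code 102 differs */
data demo_old;
  input code d;
  datalines;
101 100
102 104
;
run;

data demo_new;
  input code d;
  datalines;
101 100
102 13
;
run;

proc compare base=demo_old compare=demo_new
  out=demo_out outnoequal outbase outcomp outdif noprint;
  id code;
run;

/* _TYPE_ labels each row as BASE, COMPARE, or DIF */
proc print data=demo_out noobs;
  var _type_ code d;
run;

/* Keep only the difference rows for further processing */
data demo_dif;
  set demo_out;
  where _type_ = 'DIF';
run;
```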

See the following for a way to keep only the names of the variables that differ in the output data set:

data old;
/* B holds letters, so it must be read as a character variable */
input code A B $ C D;
datalines;
101 1 a 1 100
102 2 b 1 104
103 3 c 1 54
104 4 d 1 87
105 5 e 1 201
;
run;

data new;
input code A B $ C D;
datalines;
101 1 a 1 100
102 2 b 1 13
103 3 c 1 54
104 4 d 2 87
105 5 e 1 12
;
run;

proc sort data=old; by code; run;
proc sort data=new; by code; run;

/*OUTNOEQUAL suppresses the writing of an observation to the output data set when all values in the observation are judged equal. 
  In addition, in observations containing values for some variables judged equal and others judged unequal, 
  the OUTNOEQUAL option uses the special missing value ".E" to represent differences and percent differences for variables judged equal.*/
proc compare base=old compare=new
out=Out_ds outnoequal;
id code;
run;

/*Get the names of the compared numeric variables from the output data set.
  The .E test used later applies only to numeric variables, so character variables such as B are skipped here*/
proc sql ;
  select strip(name)
    into :vnames
    separated by " "
    from dictionary.columns
   where libname="WORK" and
         upcase(memname)="OUT_DS" and
         type="num" and
         upcase(strip(name)) not in('_TYPE_','_OBS_','CODE')
  ;
quit;

/*This macro loops through every variable in the list and appends to keep_vars the names of those that have unequal values*/
options merror mprint nosymbolgen mlogic;
%macro keepv(varlist);
%local i var_c var_t;
data out_ds1;
length keep_vars $100;
     set out_ds;
     retain keep_vars;
     %let var_c=%sysfunc(countw(&varlist));
     %do i=1 %to &var_c;
     %let var_t=%sysfunc(scan(&varlist,&i));
        /* .E marks a value judged equal; FINDW matches whole names, so no name is added twice */
        if &var_t ne .E and findw(keep_vars,"&var_t",", ") = 0 then 
        do;
            keep_vars=catx(",",keep_vars,"&var_t");
        end;
     %end;
    run;
%mend keepv;

%keepv(&vnames);

/*keep only the last observation, where keep_vars holds the complete list of variable names that differ*/
data out_ds_final;
if 0 then set out_ds1 nobs=nobs end=eof;
set out_ds1 point=nobs;
output;
stop;
keep keep_vars;
run;

proc print data=out_ds_final; run;