PROC MEANS: output MIN, MAX, and MEDIAN for all numeric variables

Time: 2014-05-13 17:32:02

Tags: sas

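My situation is similar to the question here. However, I don't want to list my 300 variable names in the VAR statement, since they are all unique. Is there a way to use proc means or proc summary to output summary statistics for all numeric variables in a dataset?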

I tried:

proc means data=my_data min median max;
    output out=summary_data min=min median=median max=max;
run;

But this only outputs summary statistics for the first variable. With the help of ods trace, I also tried:

proc means data=my_data min median max;
    ods output Summary=summary_data;
run;

This gives me summary statistics for all variables, but still in a single row:

VName_VAR1 VAR1_Minimum VAR1_Median VAR1_Maximum VName_VAR2 VAR2_Minimum etc...
VAR1       3            3           3            VAR2       3         

My VAR names are all unique. Is there another way to use proc means or proc summary to output summary statistics for all numeric variables in a dataset?

Update:

When I remove min=min median=median max=max:

proc means data=my_data min median max;
    output out=summary_data;
run;

the code produces this output:

 Obs  _TYPE_ _FREQ_ _STAT_   VAR_1    VAR_2 ... etc

 1    0      91     N          91.00  91    ... etc
 2    0      91     MIN      2005.00  13         .
 3    0      91     MAX      2014.00  13         .
 4    0      91     MEAN     2009.34  13         .
 5    0      91     STD         3.02   0

However, it still doesn't give me the MEDIAN.
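For context on the first attempt: in `min=min`, the single name on the right-hand side is matched to the first analysis variable only, which is why just one variable got statistics. A minimal sketch of the one-row alternative, assuming SASHELP.CARS as a stand-in for my_data and using the OUTPUT statement's AUTONAME option so PROC MEANS builds one name per variable and statistic (the dataset name `summary_wide` is hypothetical). The result is still one wide row, like the ODS output above:

```sas
/* Assumption: SASHELP.CARS stands in for my_data */
proc means data=sashelp.cars noprint;
  var _numeric_;   /* every numeric variable, no names typed out */
  /* Empty name lists plus AUTONAME: output columns are named like
     MSRP_Min, MSRP_Median, MSRP_Max */
  output out=summary_wide(drop=_type_ _freq_) min= median= max= / autoname;
run;
```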

4 answers:

Answer 0 (score: 6)

I got the desired output by transposing the data before running proc means.

proc sort data=sashelp.cars out=cars; by _character_;run;

proc transpose data=cars out=cars_t;
  var _numeric_;
  by _character_;
run;

proc sort data=cars_t;by _name_;run;

proc means data=cars_t noprint;
  output out=cars_summary(drop = _type_ _freq_) min=min median=median max=max;
  by _name_;
run;

The code then produces this output:

Obs    _NAME_             min     median         max

 1    Cylinders          3.0        6.0        12.0
 2    EngineSize         1.3        3.0         8.3
 3    Horsepower        73.0      210.0       500.0
 4    Invoice         9875.0    25294.5    173560.0
 5    Length           143.0      187.0       238.0
 6    MPG_City          10.0       19.0        60.0
 7    MPG_Highway       12.0       26.0        66.0
 8    MSRP           10280.0    27635.0    192465.0
 9    Weight          1850.0     3474.5      7190.0
10    Wheelbase         89.0      107.0       144.0

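This works if each row in the original data has a unique ID (here, the combination of all character variables in the BY statement must identify exactly one row; otherwise PROC TRANSPOSE creates COL2, COL3, ... and `min=min` only covers COL1).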
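If the character variables do not uniquely identify rows, one way to keep the same approach is to add an explicit row ID first. This is a sketch, not the answerer's code: the ID variable `_row_id_` and the datasets `cars_id` and `value` column are hypothetical names. The ID is made character (zero-padded, so it sorts in row order) so that `_numeric_` does not pick it up as a variable to transpose:

```sas
/* Hypothetical character row ID: one observation per BY group */
data cars_id;
  set sashelp.cars;
  length _row_id_ $8;
  _row_id_ = put(_n_, z8.);   /* zero-padded, already in sorted order */
run;

/* Each BY group has one row, so only COL1 is created; rename it */
proc transpose data=cars_id out=cars_t(rename=(col1=value));
  by _row_id_;
  var _numeric_;
run;

proc sort data=cars_t;
  by _name_;
run;

/* One output row per original numeric variable */
proc means data=cars_t noprint;
  by _name_;
  var value;
  output out=cars_summary(drop=_type_ _freq_) min=min median=median max=max;
run;
```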

Answer 1 (score: 2)

If you're only after min/median/max, the following will work (and you don't have to name the variables):

ods output quantiles = quantiles;
proc univariate data = sashelp.cars;
  var _numeric_;
run;

proc sort data = quantiles;
  by varname;
run;

proc transpose data = quantiles out = quan_tran (drop=_name_ rename=(_100__max = max _50__median = median _0__min = min));
  by varname;
  var estimate;
  id quantile;
  where quantile in: ('100', '50', '0');
run;

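If you want other kinds of measures (mean, std, etc.), proc univariate outputs them in a different dataset, which means merging tables and so on, and it becomes a pain again.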

I find SAS output datasets truly baffling, and this is the most egregious example I know of.

Answer 2 (score: 2)

Why not use the stackods option on the proc means statement?

ods listing close;
ods output summary=s;
proc means data=mydata stackods min median max;
run;
ods output close;
ods listing;
proc print;
run;

Answer 3 (score: 1)

UPDATED

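Here is a macro-based solution, now with step-by-step comments. It uses the metadata in SAS dictionary.columns to discover all numeric variables in the dataset. Basically, I take the MIN, MEDIAN, and MAX of all numeric variables and output the results into three separate datasets. Then I concatenate the datasets, using IN= variables to determine where each row came from, and label each row with the appropriate statistic name. The output is three rows, with one column per numeric variable plus the STAT column.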

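As the OP showed in his answer, the whole macro/metadata step for finding the numeric variables can be replaced by simply using the special _NUMERIC_ variable list. I'm leaving the existing approach in place in case anyone wants to reuse it for other purposes.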

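Also, the OP's answer is a macro-free solution that uses PROC TRANSPOSE to arrive at the same place as this one, without concatenating separate result sets. I urge all readers to look at it, since it is more "SAS-like".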

%GLOBAL 
    var_names 
    dsn_temp_min
    dsn_temp_median
    dsn_temp_max
; 
%LET dsn_temp_min = min_summary ;
%LET dsn_temp_median= med_summary;
%LET dsn_temp_max= max_summary;

/* Identify dataset */
%LET lib_name = WORK ;  /* change to your library */
%LET dsn = my_data ;

/* Retrieve numeric variable names from SAS metadata and store in `var_names` */
/* macro variable. Library and dataset name must be upper-case since that is */
/* how they are stored in `dictionary.columns`. */
/* UPDATE: this all can be avoided by just using the _NUMERIC_ special variable */
/* list, but I am leaving this in here in case anyone is interested in querying */
/* metadata for other purposes. */

%LET lib_name = %UPCASE (&lib_name);
%LET dsn = %UPCASE (&dsn);

PROC SQL NOPRINT;
    SELECT name
    INTO :var_names SEPARATED BY ' '
    FROM dictionary.columns
    WHERE libname = "&lib_name"
    AND memname = "&dsn"
    AND type ^= "char"
;
QUIT;

/* Take the MIN of all numeric variables and store in a separate dataset */
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
    OUTPUT OUT=&dsn_temp_min (DROP = _TYPE_ _FREQ_)
        MIN (&var_names) = 
    ;
RUN;

/* Take the MEDIAN of all numeric variables and store in a separate dataset */    
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
    OUTPUT OUT=&dsn_temp_median (DROP = _TYPE_ _FREQ_)
        MEDIAN (&var_names) = 
    ;
RUN;

/* Take the MAX of all numeric variables and store in a separate dataset */        
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
    OUTPUT OUT=&dsn_temp_max (DROP = _TYPE_ _FREQ_)
        MAX (&var_names) = 
    ;
RUN;


/* Concatenate the three separate datasets into one.  Use IN to figure out */
/* where each row is coming from, and label appropriately */
DATA summary_data;
    LENGTH stat $6 ;

    RETAIN
        stat &var_names
    ;

    SET 
        &dsn_temp_min (IN=s1)
        &dsn_temp_median (IN=s2)
        &dsn_temp_max (IN=s3)
    ;

    IF (s1) THEN DO;
        stat = "MIN" ;
    END;
    ELSE IF (s2) THEN DO;
        stat = "MEDIAN" ;
    END;
    ELSE IF (s3) THEN DO;
        stat = "MAX" ;
    END;

    LABEL stat = "Statistic";
RUN;
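Following the note above that _NUMERIC_ can replace the metadata query, here is a minimal sketch of the same three-row layout without macros. It assumes SASHELP.CARS as a stand-in for my_data and relies on PROC MEANS accepting several OUTPUT statements in one step; with a single statistic per OUTPUT statement and no names after `=`, the output columns keep the original variable names:

```sas
/* Assumption: SASHELP.CARS stands in for my_data */
proc means data=sashelp.cars noprint;
  var _numeric_;
  /* One pass over the data, three output datasets */
  output out=min_summary(drop=_type_ _freq_) min=;
  output out=med_summary(drop=_type_ _freq_) median=;
  output out=max_summary(drop=_type_ _freq_) max=;
run;

data summary_data;
  length stat $6;   /* declared before SET so STAT is the first column */
  set min_summary(in=s1)
      med_summary(in=s2)
      max_summary;
  /* IN= flags tell us which dataset each row came from */
  if s1 then stat = "MIN";
  else if s2 then stat = "MEDIAN";
  else stat = "MAX";
  label stat = "Statistic";
run;
```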