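My situation is similar to the question here. However, I do not want to list my 300 variable names in the var statement, because they are all unique. Is there a way to use proc means or proc summary to output summary statistics for all numeric variables in a dataset?
I tried: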
proc means data=my_data min median max;
output out=summary_data min=min median=median max=max;
run;
But this only outputs summary statistics for the first variable. I also tried going through ODS, using ods trace:
proc means data=my_data min median max;
ods output Summary=summary_data;
run;
That gives me summary statistics for all variables, but still in a single row:
VName_VAR1 VAR1_Minimum VAR1_Median VAR1_Maximum VName_VAR2 VAR2_Minimum etc...
VAR1 3 3 3 VAR2 3
My variable names are all unique. Is there another way to use proc means or proc summary to output summary statistics for all numeric variables in a dataset?
Update:
When I remove min=min median=median max=max:
proc means data=my_data min median max;
output out=summary_data;
run;
the code then produces this output:
Obs _TYPE_ _FREQ_ _STAT_ VAR_1 VAR_2 ... etc
1 0 91 N 91.00 91 ... etc
2 0 91 MIN 2005.00 13 .
3 0 91 MAX 2014.00 13 .
4 0 91 MEAN 2009.34 13 .
5 0 91 STD 3.02 0
However, it still does not give me the MEDIAN.
Answer 0 (score: 6)
I got the desired output by transposing the data before using proc means.
proc sort data=sashelp.cars out=cars; by _character_;run;
proc transpose data=cars out=cars_t;
var _numeric_;
by _character_;
run;
proc sort data=cars_t;by _name_;run;
proc means data=cars_t noprint;
output out=cars_summary(drop = _type_ _freq_) min=min median=median max=max;
by _name_;
run;
The code then produces this output:
Obs _NAME_ min median max
1 Cylinders 3.0 6.0 12.0
2 EngineSize 1.3 3.0 8.3
3 Horsepower 73.0 210.0 500.0
4 Invoice 9875.0 25294.5 173560.0
5 Length 143.0 187.0 238.0
6 MPG_City 10.0 19.0 60.0
7 MPG_Highway 12.0 26.0 66.0
8 MSRP 10280.0 27635.0 192465.0
9 Weight 1850.0 3474.5 7190.0
10 Wheelbase 89.0 107.0 144.0
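This approach works as long as every row of the original data is uniquely identified, here by the combination of character variables in the by statement.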
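When the data has no identifying variables, one way to meet that requirement is to create an ID first. The following is only a sketch under assumptions: my_data is the dataset name from the question, while row_id, with_id and long are illustrative names, and my_data is assumed not to contain a variable called row_id already. The ID is stored as a zero-padded character value so that var _numeric_ does not pick it up.

```sas
/* Sketch: row_id, with_id and long are illustrative names. */
data with_id;
  set my_data;
  length row_id $8;
  row_id = put(_n_, z8.);   /* character ID, so _numeric_ skips it */
run;

/* One output row per (row_id, variable); the values land in COL1. */
proc transpose data=with_id out=long;
  by row_id;
  var _numeric_;
run;

proc sort data=long; by _name_; run;

proc means data=long noprint;
  by _name_;
  var col1;
  output out=summary_data(drop=_type_ _freq_) min=min median=median max=max;
run;
```

The z8. format keeps the IDs in sorted order only for datasets with fewer than 100 million rows; a wider format would be needed beyond that.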
Answer 1 (score: 2)
If you are only after min / median / max, then the following will work (and you do not have to name the variables):
ods output quantiles = quantiles;
proc univariate data = sashelp.cars;
var _numeric_;
run;
/* Sort the ODS output dataset explicitly rather than relying on _LAST_ */
proc sort data = quantiles;
by varname;
run;
proc transpose data = quantiles out = quan_tran (drop=_name_ rename=(_100__max = max _50__median = median _0__min = min));
by varname;
var estimate;
id quantile;
where quantile in: ('100', '50', '0');
run;
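If you want other kinds of measures (mean, std, etc.), proc univariate outputs them in different datasets, which means you have to merge tables and so on, and it becomes a pain again.
Output datasets from SAS are truly baffling to me, and this is the most egregious example I know of.
Answer 2 (score: 2)
Why not use the stackods option on the proc means statement?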
ods listing close;
ods output summary=s;
proc means data=mydata stackods min median max;
run;
ods output close;
ods listing;
proc print;
run;
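For comparison with the first answer, here is a sketch of the same idea run on sashelp.cars. It assumes that ods exclude all can stand in for closing the listing destination while ods output still captures the table. Note that noprint is not an option here, because it would suppress the ODS table as well. The dataset name cars_stats is illustrative.

```sas
ods exclude all;                 /* suppress displayed output only */
ods output summary=cars_stats;   /* cars_stats is an illustrative name */
proc means data=sashelp.cars stackods min median max;
run;
ods exclude none;                /* restore normal output */

proc print data=cars_stats;
run;
```

With stackods, the resulting dataset should contain one row per analysis variable, with the requested statistics as columns.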
Answer 3 (score: 1)
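UPDATED
Here is a macro-based solution, with new step-by-step comments added. It uses the metadata in SAS dictionary.columns to discover all numeric variables in the dataset. Basically, I take the MIN, MEDIAN, and MAX of all numeric variables, writing the results to three separate datasets. I then concatenate the datasets, using IN variables to determine where each row came from, and label each row with the appropriate statistic name. The output is then three rows and n columns.
As the OP showed in his answer, simply using the special _NUMERIC_ variable can replace the entire macro/metadata approach to getting the numeric variables. I will keep the existing approach in case anyone is interested in using it for something else.
Also, the OP's answer is a macro-free solution that uses PROC TRANSPOSE to arrive at the same place as this one, without concatenating any separate result sets. I urge all readers to review it, since it is more "SAS-like".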
%GLOBAL
var_names
dsn_temp_min
dsn_temp_median
dsn_temp_max
;
%LET dsn_temp_min = min_summary ;
%LET dsn_temp_median= med_summary;
%LET dsn_temp_max= max_summary;
/* Identify dataset */
%LET lib_name = WORK ; /* change to your library */
%LET dsn = my_data ;
/* Retrieve numeric variable names from SAS metadata and store in `var_names` */
/* macro variable. Library and dataset name must be upper-case since that is */
/* how they are stored in `dictionary.columns`. */
/* UPDATE: this all can be avoided by just using the _NUMERIC_ special variable */
/* but I am leaving this in here in case anyone is interested in querying */
/* meta-data for other purposes. */
%LET lib_name = %UPCASE (&lib_name);
%LET dsn = %UPCASE (&dsn);
PROC SQL NOPRINT;
SELECT name
INTO :var_names SEPARATED BY ' '
FROM dictionary.columns
WHERE libname = "&lib_name"
AND memname = "&dsn"
AND type ^= "char"
;
QUIT;
RUN;
/* Take the MIN of all numeric variables and store in a separate dataset */
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
OUTPUT OUT=&dsn_temp_min (DROP = _TYPE_ _FREQ_)
MIN (&var_names) =
;
RUN;
/* Take the MEDIAN of all numeric variables and store in a separate dataset */
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
OUTPUT OUT=&dsn_temp_median (DROP = _TYPE_ _FREQ_)
MEDIAN (&var_names) =
;
RUN;
/* Take the MAX of all numeric variables and store in a separate dataset */
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
OUTPUT OUT=&dsn_temp_max (DROP = _TYPE_ _FREQ_)
MAX (&var_names) =
;
RUN;
/* Concatenate the three separate datasets into one. Use IN to figure out */
/* where each row is coming from, and label appropriately */
DATA summary_data;
LENGTH stat $6 ;
RETAIN
stat &var_names
;
SET
&dsn_temp_min (IN=s1)
&dsn_temp_median (IN=s2)
&dsn_temp_max (IN=s3)
;
IF (s1) THEN DO;
stat = "MIN" ;
END;
ELSE IF (s2) THEN DO;
stat = "MEDIAN" ;
END;
ELSE IF (s3) THEN DO;
stat = "MAX" ;
END;
LABEL stat = "Statistic";
RUN;
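As noted above, the metadata lookup is optional. Here is a shorter sketch of the same three-dataset idea, assuming the my_data dataset from the question. Without a var statement, PROC MEANS analyzes every numeric variable, and a statistic keyword followed by a bare = keeps the original variable names, so a single PROC MEANS step with three output statements can write all three datasets.

```sas
/* Sketch: one pass, three OUTPUT statements, no macro variables. */
proc means data=my_data noprint;
  output out=min_summary(drop=_type_ _freq_) min=;
  output out=med_summary(drop=_type_ _freq_) median=;
  output out=max_summary(drop=_type_ _freq_) max=;
run;

/* Concatenate and label each row by its source dataset. */
data summary_data;
  length stat $6;   /* declared first so stat is the leading column */
  set min_summary(in=s1) med_summary(in=s2) max_summary(in=s3);
  if s1 then stat = "MIN";
  else if s2 then stat = "MEDIAN";
  else stat = "MAX";
  label stat = "Statistic";
run;
```

The result has the same shape as the macro version: three rows, one per statistic, and one column per numeric variable.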