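I have an output table containing more than 300 variables drawn from 30 different tables, which are combined with a UNION and used for modelling. I wrote a macro that uses this output table to build a report with a range of statistics such as the mean, min/max and so on. I am trying to add a column to the report that says which table or tables each variable comes from. I say "tables" because some variables are shared across different tables. I want to avoid listing the same variable more than once in the report, since the statistics are identical regardless of which table the variable came from. Is there an efficient way to do this?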
Answer 0: (score: 0)
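If it were me, I would loop over each dataset in the union and simply put the table name and variable names into one compiled dataset. You probably already have all the table names in a macro list or typed out, so it only takes a few extra lines of code to run proc contents on each table and compile a complete list of table and variable names. Note that, as in your example, once the list is compiled you can then deal with the variable names that appear more than once: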
** create different tables **;
data height; set sashelp.class(keep=name height); run;
data weight; set sashelp.class(keep=name weight); run;
data sex; set sashelp.class(keep=name sex); run;
** put your datasets into a list either manually or dynamically **;
/* manually */
%let ds_list=height weight sex;
/* dynamically -- be careful to include only tables in your union */
proc sql noprint;
select MEMNAME
into: ds_list separated by " "
from sashelp.vmember
where libname = "WORK" and memtype = "DATA"; /* data sets only; catalogs such as SASMACR and FORMATS are skipped */
quit;
%put &ds_list.;
** loop over each table to put the table name and variables in a dataset **;
%MACRO get_names(ds_list);
  %local i ds; /* keep the loop variables local to the macro */
  %do i=1 %to %sysfunc(countw(&ds_list.));
    %let ds = %scan(&ds_list.,&i.);
    /* one row per variable, tagged with the table it came from */
    proc contents data = &ds. noprint
      out=names_&ds.(keep=MEMNAME NAME rename=(MEMNAME=SOURCE_DATASET));
    run;
    proc append data = names_&ds. base=full force; run;
  %end;
%MEND;
%get_names(&ds_list.);
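Once full has been built, the variables shared between tables can be listed with a short query. This is only a minimal sketch: it assumes the full dataset and the SOURCE_DATASET and NAME columns produced by the macro above, and shared_vars and n_tables are hypothetical names chosen for illustration.

```sas
/* List the variables that occur in more than one source table */
proc sql;
  create table shared_vars as
  select NAME,
         count(distinct SOURCE_DATASET) as n_tables
  from full
  group by NAME
  having count(distinct SOURCE_DATASET) > 1
  order by NAME;
quit;
```

If only height, weight and sex are in the list, Name should be the only variable returned, since each of the three tables keeps it. The comparison is case-sensitive, so if the same variable is spelled with different case in different tables, grouping on upcase(NAME) may be a safer choice.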
Answer 1: (score: 0)
Instead of a UNION, consider using a DATA step together with the INDSNAME option.
data want;
set sashelp.class sashelp.cars indsname=source;
source_dataset = source;
run;
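The variable named in INDSNAME= holds the two-level name (libref.member) of the dataset the current row was read from, and it is not written to the output dataset, which is why it is copied into source_dataset. If only the member name is wanted, something along these lines might work; the explicit length of 32 is an assumption based on the maximum length of a SAS member name.

```sas
data want;
  length source_dataset $32;   /* assumed: member names are at most 32 characters */
  set sashelp.class sashelp.cars indsname=source;
  /* source holds a value such as SASHELP.CLASS; keep the part after the dot */
  source_dataset = scan(source, 2, '.');
run;
```

Keep in mind that this tags each row with its origin, so a variable-to-table mapping would still have to be derived from it in a separate step.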
Answer 2: (score: 0)
I managed to do this with the following approach:
Create a table listing the source table of each variable.
PROC SQL;
CREATE TABLE SOURCES AS
SELECT NAME
,MEMNAME
FROM DICTIONARY.COLUMNS
WHERE LIBNAME='LIBNAME'
ORDER BY 1,2;
QUIT;
Join it to my statistics table.
PROC SQL;
CREATE TABLE STATS_NEW AS
SELECT memname AS TABLE_NAME,a.*
FROM STATS a
LEFT JOIN SOURCES b
ON a.name = b.name
ORDER BY a.name;
QUIT;
Transpose the data and add a comma separator.
DATA STATS_TRANSPOSE (drop=TABLE_NAME);
LENGTH INPUT_TABLES $1000;
SET STATS_NEW;
BY name;
RETAIN INPUT_TABLES;
IF FIRST.name THEN DO; INPUT_TABLES=TABLE_NAME; END;
IF NOT FIRST.name
THEN DO;
INPUT_TABLES=CATX(', ',INPUT_TABLES,TABLE_NAME); /* CATX keeps the ', ' delimiter; CATS would strip its blank */
END;
IF LAST.name THEN DO;
IF name IN ('FIELD1','FIELD2')
THEN DO; INPUT_TABLES='ALL'; END;
OUTPUT;
END;
RUN;
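To see what the collapsing step does, here is a small self-contained sketch on made-up data. The variable and table names (AGE, HEIGHT, TABLE_A, TABLE_B) are hypothetical and only for illustration. It uses CATX, which inserts the delimiter between items exactly as given, whereas CATS strips leading and trailing blanks from every argument, including a ', ' separator.

```sas
/* Made-up input: one row per variable/table pair, already sorted by name */
data stats_new;
  length name $32 table_name $32;
  input name $ table_name $;
  datalines;
AGE TABLE_A
AGE TABLE_B
HEIGHT TABLE_A
;
run;

/* Collapse to one row per variable with a comma-separated table list */
data stats_transpose (drop=table_name);
  length input_tables $1000;
  set stats_new;
  by name;
  retain input_tables;
  if first.name then input_tables = table_name;
  else input_tables = catx(', ', input_tables, table_name);
  if last.name then output;
run;

proc print data=stats_transpose noobs; run;
```

With this input, the output has two rows: AGE with the list "TABLE_A, TABLE_B" and HEIGHT with "TABLE_A".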