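I would like to loop PROC SQL over a list of variables in a data set, and within the SQL code use each variable from the list in a WHERE clause to subset the observations by character value. Specifically, I want to count the observations in the data set in which each variable in the list is coded "Unknown".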
Setting WHERE MISSING(&VAL)=1 works fine, but I run into trouble when I try to reference a character value.
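Here is my code. Since I apparently can't bold the area that is giving me trouble, I've marked it with <-- PROBLEM AREA (near the bottom). Besides a solution, I'd appreciate any other tips for making my code more efficient.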
%MACRO PERCENTMISSING(LIST);
  PROC SQL NOPRINT;
  %LET N=%SYSFUNC(COUNTW(&LIST));
  %DO I=1 %TO &N;
    %LET VAL = %SCAN(&LIST,&I);
    CREATE TABLE WORK.SALM_&VAL AS
      SELECT DISTINCT "Salmonella" as PATHOGEN,
             A.YEAR,
             X.Missing&VAL,
             Y.Total&VAL,
             (X.Missing&VAL/Y.Total&VAL) as PropMiss&VAL,
             C.Unknown&VAL,
             (C.Unknown&Val/Y.Total&VAL) as PropUnk&VAL
      FROM allsalm as A
      INNER JOIN (
        SELECT YEAR,
               COUNT(*) AS Missing&VAL
        FROM allsalm
        WHERE MISSING(&VAL)=1
        GROUP BY Year) X
      ON A.Year=X.Year
      INNER JOIN (
        SELECT YEAR,
               COUNT(*) AS Total&VAL
        FROM allsalm
        GROUP BY Year) Y
      ON A.Year=Y.Year
      INNER JOIN (
        SELECT YEAR,
               COUNT(*) AS Unknown&VAL
        FROM allsalm
        WHERE &VAL IN ("Unknown") /* <-- PROBLEM AREA */
        GROUP BY Year) C
      ON A.Year=C.Year
      ;
  %END;
  QUIT;
%MEND;
The error message I get is:
ERROR: Column UnknownCity could not be found in the table/view identified with the correlation name C.
Answer:
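I figured it out myself and added another %DO loop so that the PROC SQL runs for a list of variables in a list of data sets. This may be a good template for anyone trying to compute the proportion of missing values (and/or "Unknown" values, if your data sets also happen to code missingness that way) for any number of variables in any number of data sets.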
%MACRO PERCENTMISSING(LIST1,LIST2);
  /* LIST1 = space-separated data set names, LIST2 = space-separated variable names */
  %LOCAL N1 N2 I J VAL1 VAL2;
  %LET N1=%SYSFUNC(COUNTW(&LIST1));
  %LET N2=%SYSFUNC(COUNTW(&LIST2));
  %DO I=1 %TO &N1;
    %LET VAL1 = %SCAN(&LIST1,&I);
    %DO J=1 %TO &N2;
      %LET VAL2 = %SCAN(&LIST2,&J);
      PROC SQL NOPRINT;
      /* One output table per data set/variable pair, e.g. Table1Var1 */
      CREATE TABLE &VAL1&VAL2 AS
        SELECT DISTINCT "&VAL1" as PATHOGEN,
               A.YEAR,
               X.Missing&VAL2,
               Y.Total&VAL2,
               (X.Missing&VAL2/Y.Total&VAL2) as PropMiss&VAL2,
               C.Unknown&VAL2,
               (C.Unknown&VAL2/Y.Total&VAL2) as PropUnk&VAL2
        FROM &VAL1 as A
        /* LEFT JOINs keep every year of A; a year with no matching rows gets missing counts */
        LEFT JOIN (
          /* Missing (blank) values per year */
          SELECT YEAR,
                 COUNT(*) AS Missing&VAL2
          FROM &VAL1
          WHERE (MISSING(&VAL2)=1) OR (&VAL2=" ")
          GROUP BY Year) X
        ON A.Year=X.Year
        LEFT JOIN (
          /* All observations per year */
          SELECT YEAR,
                 COUNT(*) AS Total&VAL2
          FROM &VAL1
          GROUP BY Year) Y
        ON A.Year=Y.Year
        LEFT JOIN (
          /* Values coded "U" or "Unknown" per year */
          SELECT YEAR,
                 COUNT(*) AS Unknown&VAL2
          FROM &VAL1
          WHERE &VAL2 IN ("U","Unknown")
          GROUP BY Year) C
        ON A.Year=C.Year;
      QUIT;
    %END;
  %END;
%MEND;
Then just call the macro, listing the table names in LIST1 and the (character) variable names in LIST2. For example:
%PERCENTMISSING(Table1 Table2 Table3 Table4,Var1 Var2 Var3 Var4 Var5);
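Since the question also asked for efficiency tips, here is a possible alternative, offered as a sketch under assumptions rather than a tested replacement: instead of three subqueries joined back to the base table, each data set/variable pair is summarized in a single grouped query using conditional aggregation (SUM over MISSING(), and SUM over a CASE expression for the "U"/"Unknown" codes). The macro name PERCENTMISSING2 and the output naming scheme &DSN._&VAR are hypothetical. Like the answer, it assumes every data set has a YEAR column and every variable in LIST2 is character. Each table is referenced once per query instead of four times, and a year with no missing or unknown values gets a count of 0 rather than a missing value.

```sas
/* Hypothetical single-pass variant of PERCENTMISSING (a sketch).       */
/* Assumes: one-level data set names in LIST1, each with a YEAR column, */
/* and only character variables in LIST2.                               */
%MACRO PERCENTMISSING2(LIST1, LIST2);
  %LOCAL I J N1 N2 DSN VAR;
  %LET N1 = %SYSFUNC(COUNTW(&LIST1, %STR( )));
  %LET N2 = %SYSFUNC(COUNTW(&LIST2, %STR( )));
  PROC SQL NOPRINT;
  %DO I = 1 %TO &N1;
    %LET DSN = %SCAN(&LIST1, &I, %STR( ));
    %DO J = 1 %TO &N2;
      %LET VAR = %SCAN(&LIST2, &J, %STR( ));
      /* One row per YEAR; output table named like Table1_Var1 */
      CREATE TABLE &DSN._&VAR AS
        SELECT "&DSN" AS PATHOGEN,
               YEAR,
               COUNT(*) AS Total&VAR,
               /* MISSING() is 1 for an all-blank character value, else 0 */
               SUM(MISSING(&VAR)) AS Missing&VAR,
               CALCULATED Missing&VAR / CALCULATED Total&VAR AS PropMiss&VAR,
               /* CASE yields 1 for rows coded U or Unknown, else 0 */
               SUM(CASE WHEN &VAR IN ("U", "Unknown") THEN 1 ELSE 0 END)
                 AS Unknown&VAR,
               CALCULATED Unknown&VAR / CALCULATED Total&VAR AS PropUnk&VAR
        FROM &DSN
        GROUP BY YEAR;
    %END;
  %END;
  QUIT;
%MEND PERCENTMISSING2;
```

It is called the same way as the answer's macro, e.g. %PERCENTMISSING2(Table1 Table2, Var1 Var2);. If typing the variable names is tedious, LIST2 could be built from the DICTIONARY.COLUMNS metadata table inside PROC SQL, for instance with SELECT NAME INTO :CHARVARS SEPARATED BY ' ' FROM DICTIONARY.COLUMNS WHERE LIBNAME='WORK' AND MEMNAME='ALLSALM' AND TYPE='char'; (CHARVARS being a hypothetical macro variable name) and then passing &CHARVARS as the second argument.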