How can I use a macro variable in a WHERE statement to subset data by a character string? (SAS 9.3)

Time: 2014-08-13 14:31:25

Tags: sas

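I want to loop a PROC SQL step over a list of variables in a dataset, and inside the SQL code I want to use each variable from the list in a WHERE statement to subset the observations by a character value. Specifically, I want to count the observations in the dataset where each variable in the list is coded as "Unknown".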

Setting WHERE MISSING(&VAL)=1 works fine, but I run into problems when I try to reference a character value.
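As general background on macro variables and quoted strings in a WHERE clause, here is a minimal sketch. It assumes a table named allsalm with a character column City; the macro variable CODE is hypothetical and used only for illustration. SAS resolves macro references inside double quotes but not inside single quotes.

```sas
/* Hypothetical values for illustration only */
%LET VAL  = City;
%LET CODE = Unknown;

PROC SQL;
    /* Double quotes: &CODE resolves, so City is compared to "Unknown" */
    SELECT COUNT(*) AS n_unknown
    FROM allsalm
    WHERE &VAL = "&CODE";

    /* Single quotes: no resolution, so City is compared to the literal text &CODE */
    SELECT COUNT(*) AS n_literal
    FROM allsalm
    WHERE &VAL = '&CODE';
QUIT;
```

A hard-coded literal such as "Unknown" contains no macro reference, so either quote style gives the same result in that case.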

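Here is my code. Since I apparently can't bold the area that is giving me trouble, I have marked it with <-- PROBLEM AREA (near the bottom). Besides a solution, any other tips for making my code more efficient would be appreciated.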

    %MACRO PERCENTMISSING(LIST);
    PROC SQL NOPRINT;
       %LET N=%SYSFUNC(COUNTW(&LIST));
       %DO I=1 %TO &N;
       %LET VAL = %SCAN(&LIST,&I);
    CREATE TABLE WORK.SALM_&VAL AS
        SELECT DISTINCT "Salmonella" as PATHOGEN,
                            A.YEAR,
                            X.Missing&VAL,
                            Y.Total&VAL,
                            (X.Missing&VAL/Y.Total&VAL) as PropMiss&VAL,
                            C.Unknown&VAL,
                            (C.Unknown&Val/Y.Total&VAL) as PropUnk&VAL
        FROM allsalm as A
        INNER JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Missing&VAL
                    FROM allsalm
                    WHERE MISSING(&VAL)=1
                    GROUP BY Year) X
        ON A.Year=X.Year
        INNER JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Total&VAL
                    FROM allsalm
                    GROUP BY Year) Y
        ON A.Year=Y.Year
        INNER JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Unknown&VAL
                    FROM allsalm
                    WHERE &VAL IN ("Unknown") /* <-- PROBLEM AREA */
                    GROUP BY Year) C
        ON A.Year=C.Year
        ;
    %END;
    QUIT;
    %MEND;

The error message I get is:

ERROR: Column UnknownCity could not be found in the table/view identified with the correlation name C.
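One way to investigate an error like this is to have SAS print the code the macro generates, so the resolved column names and WHERE clauses show up in the log. The sketch below uses the standard MPRINT and SYMBOLGEN system options. It assumes the macro has already been compiled and that City is one of the variables being passed, which is inferred from the error message.

```sas
/* Write the generated SAS code and each macro variable resolution to the log */
OPTIONS MPRINT SYMBOLGEN;

/* City is assumed here because the error mentions the column UnknownCity */
%PERCENTMISSING(City);

/* Switch the extra logging off again */
OPTIONS NOMPRINT NOSYMBOLGEN;
```

The MPRINT lines in the log show the exact SQL text that PROC SQL received, which makes it easier to see which subquery alias and column name did not line up.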

1 Answer:

Answer 0: (score: 0)

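I figured it out myself, and added another DO loop so that the PROC SQL runs over a list of variables within a list of datasets. This may be a useful template for anyone trying to calculate the proportion of missing values (and/or "Unknown" values, if your datasets happen to code missingness that way as well) for any number of variables in any number of datasets.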

    %MACRO PERCENTMISSING(LIST1,LIST2);
    /* LIST1 = space-separated dataset names, LIST2 = space-separated variable names */
    %LET N1=%SYSFUNC(COUNTW(&LIST1));
    %LET N2=%SYSFUNC(COUNTW(&LIST2));
    /* Outer loop: one pass per dataset */
    %DO I=1 %TO &N1;
       %LET VAL1 = %SCAN(&LIST1,&I);
          /* Inner loop: one pass per variable */
          %DO J=1 %TO &N2;
             %LET VAL2 = %SCAN(&LIST2,&J);

    PROC SQL NOPRINT;
    CREATE TABLE &VAL1&VAL2 AS
        SELECT DISTINCT "&VAL1" as PATHOGEN,
                            A.YEAR,
                            X.Missing&VAL2,
                            Y.Total&VAL2,
                            (X.Missing&VAL2/Y.Total&VAL2) as PropMiss&VAL2,
                            C.Unknown&VAL2,
                            (C.Unknown&VAL2/Y.Total&VAL2) as PropUnk&VAL2
        FROM &VAL1 as A
        /* X: rows per year where the variable is missing or blank */
        LEFT JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Missing&VAL2
                    FROM &VAL1
                    WHERE (MISSING(&VAL2)=1) OR (&VAL2=" ")
                    GROUP BY Year) X
        ON A.Year=X.Year
        /* Y: total rows per year */
        LEFT JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Total&VAL2
                    FROM &VAL1
                    GROUP BY Year) Y
        ON A.Year=Y.Year
        /* C: rows per year where the variable is coded as unknown */
        LEFT JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Unknown&VAL2
                    FROM &VAL1
                    WHERE &VAL2 IN ("U","Unknown")
                    GROUP BY Year) C
        ON A.Year=C.Year;
    QUIT;
          %END;
    %END;
    %MEND;

Then just call the macro, supplying the table names for LIST1 and the variable names for LIST2. For example:

    %PERCENTMISSING(Table1 Table2 Table3 Table4,Var1 Var2 Var3 Var4 Var5);
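On the efficiency point raised in the question, the three joined subqueries could possibly be collapsed into a single grouped query. This is a sketch rather than a drop-in replacement: PCTMISS_ONEPASS is a hypothetical macro name, it handles one dataset and one variable per call, and it assumes the variable is character and the dataset has a YEAR column, as in the code above. It relies on MISSING() and comparisons evaluating to 1 or 0 in SAS, so summing them counts the matching rows.

```sas
/* Sketch only: PCTMISS_ONEPASS is a hypothetical name, not part of the answer above */
%MACRO PCTMISS_ONEPASS(DSN, VAR);
    PROC SQL NOPRINT;
        CREATE TABLE &DSN&VAR AS
        SELECT "&DSN" AS PATHOGEN,
               YEAR,
               /* MISSING() returns 1 or 0, so the sum is the count of missing rows */
               SUM(MISSING(&VAR)) AS Missing&VAR,
               COUNT(*) AS Total&VAR,
               SUM(MISSING(&VAR)) / COUNT(*) AS PropMiss&VAR,
               /* A true comparison evaluates to 1 and a false one to 0 */
               SUM(&VAR IN ("U","Unknown")) AS Unknown&VAR,
               SUM(&VAR IN ("U","Unknown")) / COUNT(*) AS PropUnk&VAR
        FROM &DSN
        GROUP BY YEAR;
    QUIT;
%MEND PCTMISS_ONEPASS;

/* Example call with the placeholder names used above */
%PCTMISS_ONEPASS(Table1, Var1);
```

One difference from the join-based version: a year with no missing or unknown rows gets a count of 0 here, whereas the LEFT JOIN approach leaves the count (and the proportion) null for that year.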