Question

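I have a dataset with n levels of id and (n + 4) variables. For each of the n levels of the categorical variable, I want to run a regression that uses n - 1 of the y variables as explanatory variables. Here is my dataset: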

data have;
    input s id $ x z y00 y01 y02;
cards;
1 00  95 5.00 .02 .43 .33  
2 00 100 5.50 .01 .44 .75
3 00 110 5.25 .10 .37 .34
4 00  97 5.00 .02 .43 .33  
5 00 100 5.50 .01 .43 .75
6 00 120 5.25 .10 .38 .47
7 00  95 5.00 .02 .43 .35  
8 00 130 5.50 .01 .44 .75
9 00 110 5.25 .10 .39 .44
10 00  85 5.00 .02 .43 .33  
11 00 110 5.50 .01 .47 .78
12 00 110 5.25 .10 .37 .44
1 01 20 6.00 .22 .01 .66
2 01 25 5.95 .43 .10 .20
3 01 70 4.50 .88 .05 .17
1 02 80 2.50 .65 .33 .03
2 02 85 3.25 .55 .47 .04
3 02 90 2.75 .77 .55 .01
;
run;

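So I want to use z, y01 and y02 to explain x for ID 00. Similarly, z, y00 and y02 would explain x for ID 01. Finally, z, y00 and y01 would explain x for ID 02.

I could use a BY statement, but I can't figure out how to tell the model to ignore the variable whose suffix matches the ID currently being processed.

I could create separate datasets, but for some of these analyses n > 100.

Ideally, I would run proc mixed and proc reg for each ID as shown below, and end up with one dataset per ID.

Any ideas?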

proc mixed data=have(where=(id='00')) plots(only)=all method=REML nobound ;
    class s;
    model x=z y01 y02
    / solution;
    random z y01 y02;
run;


proc reg data=have(where=(id='00'));
    model x=z y01 y02;
run;

Thanks.

Answer 1

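Unfortunately, I don't know of a way to do this without some data manipulation either, but here are two possible approaches.

Option 1. Copy the required independent variables into new variables.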

/* Count the y variables; this assumes the ids are consecutive integers starting at 00 */
proc sql noprint;
    select max(input(id, best.)) + 1
    into :dimY
    from have;
quit;

data alsoHave;
    set have;
    /* Create an array for indexing the y variables */
    array y[&dimY.] y:;
    /* Create new variables to contain y values */
    array newy[%eval(&dimY.-1)];
    _j = 1;
    do _i = 1 to &dimY.;
        /* Copy every y value except the one whose suffix matches this row's id */
        if input(id, best.) + 1 ~= _i then do;
            newy[_j] = y[_i];
            _j + 1;
        end;
    end;
    drop _:;
run;

proc reg data = alsoHave;
    by id;
    model x = z newy:;
run;

Option 2. Reduce the variance of the unwanted variable to 0 so that it cannot affect the regression.

data alsoHave;
    set have;
    /* Create an array for indexing the y variables */
    array y[*] y:;
    _i = input(id, best.) + 1;
    /* Keep a copy of the original value before it is overwritten */
    backUp = y[_i];
    /* Overwrite the unwanted variable with 0 */
    y[_i] = 0;
    drop _:;
run;

proc reg data = alsoHave;
    by id;
    model x = z y:;
run;

I prefer Option 2 for its simplicity, but array programming is fun, so I included both options anyway.
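Since the question also asks for the results to end up in datasets, here is a minimal sketch of how the per-ID estimates from Option 2 might be captured with ODS OUTPUT. The dataset name regEstimates is a hypothetical choice, and the sketch assumes alsoHave is already sorted by id, as the sample data is.

```sas
/* Route the parameter-estimate table to a dataset.
   regEstimates is a hypothetical name chosen for illustration. */
ods output ParameterEstimates = regEstimates;

proc reg data = alsoHave;
    by id;
    model x = z y:;
run;
quit;

/* The BY variable id is carried into the output, one block of rows per id */
proc print data = regEstimates;
run;
```

All BY groups land in a single dataset, so one ID can be pulled out with a dataset option such as where=(id='00') instead of creating n separate datasets. The zeroed variable still appears in each group's output and should be ignored when reading the estimates.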

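Edit: Below is an id-agnostic version of Option 2 that does not require the ids to be consecutive integers. The interesting functions are dim(), which returns the number of elements in an array, and vname(), which returns the name of the variable that an array element refers to. Finally, compress() is used with the k (keep) and d (digits) modifiers so that only the digits of the variable name are kept. Similar changes could be made to Option 1.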

data alsoHave;
    set have;
    /* Create an array for indexing the y variables */
    array y[*] y:;
    /* Loop through the y variables */
    do _i = 1 to dim(y);
        /* Replace with 0  when the variable name matches the id */
        if input(compress(vname(y[_i]), , "dk"), best.) = input(id, best.) then y[_i] = 0;
    end;
    drop _:;
run;
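The same idea can be carried over to Option 1. The following is only a sketch under a few assumptions: have lives in the WORK library, every variable whose name starts with y is a candidate regressor, and there are at least two of them. The macro variable nY is a name introduced here for illustration.

```sas
/* Count the y variables from the dataset metadata instead of from the ids.
   Assumes have is in WORK; nY is a hypothetical macro variable name. */
proc sql noprint;
    select count(*)
    into :nY trimmed
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'HAVE'
      and upcase(name) like 'Y%';
quit;

data alsoHave;
    set have;
    /* Create an array for indexing the y variables */
    array y[*] y:;
    /* One slot fewer than the number of y variables */
    array newy[%eval(&nY. - 1)];
    _j = 1;
    do _i = 1 to dim(y);
        /* Copy y[_i] unless the digits in its name match this row's id.
           The bound check guards against an id that matches no y variable. */
        if input(compress(vname(y[_i]), , "dk"), best.) ~= input(id, best.)
           and _j <= dim(newy) then do;
            newy[_j] = y[_i];
            _j + 1;
        end;
    end;
    drop _:;
run;
```

The regression step is unchanged from Option 1 (by id; model x = z newy:;). Keep in mind that newy1 holds a different original variable for each id, so the estimates have to be mapped back to the y names when they are interpreted.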
