Selecting a random column from a set of columns of interest

Time: 2016-06-01 19:31:54

Tags: sas

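I'm trying to write code that selects a random column from a set of columns of interest. Which columns are in that set depends on the values in each observation, so it changes from row to row. Each observation is one subject.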

Let me explain more clearly:

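I have 8 columns named V1-V8. Each column has 3 possible responses ('Small', 'Medium', 'High'). Because of certain circumstances in our project, I need to "combine" all of this information into 1 column.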

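Key factor 1: For each subject, we only want the columns in which he/she chose 'High' (there are many possible combinations). This is what I mean when I say the columns of interest change for each subject.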

Key factor 2: Once I have identified which columns the subject rated 'High', randomly select one of those columns.

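In the end, I need a new variable (New_V) whose value is one of V1-V8 (not 'Small', 'Medium' or 'High'), indicating which column was selected for each subject.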

Any suggestions would be great. I have tried ARRAYs and macro variables, but I can't seem to approach this the right way.

2 Answers:

Answer 0 (score: 1)

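You're on the right track with arrays. The VNAME function is useful here: it returns the name of the variable an array element refers to. The WANT data step shows how to do this (the rest just sets up sample data):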

proc format;
  value smh
    1='Small'
    2='Medium'
    3='High'
    other=' '
  ;
quit;
data have;
  call streaminit(5);
  array v[8] $;
  do _i = 1 to 1000;
    do _j = 1 to 8;
      __rand = ceil(1+rand('Binomial',.7,2));
      v[_j] = put(__rand,smh6.);
    end;
    if whichc('High',of v[*]) = 0 then v8 = 'High';  *guarantee at least one High per row;
    output;
  end;
  drop _:;
run;

data want;
  call streaminit(7);  *arbitrary seed here, pick any positive number;
  set have;
  array v[8] ;
  do until (v[_rand] = 'High');  *repeat this loop until one is picked that is High;
    _rand = ceil(8*rand('Uniform'));  
  end;
  chosen_v = vname(v[_rand]);  *assign the chosen name to chosen_v variable;
  drop _:; 
run;

proc freq data=want;
  tables chosen_v;
run;
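The DO UNTIL loop in WANT never terminates for a row with no 'High' value; the HAVE step avoids that by forcing V8 to 'High'. Real data may not offer that guarantee, so a guarded version might look like the sketch below. The two-row HAVE and the 32-character length of CHOSEN_V are illustrative assumptions, not part of the original answer.

```sas
/* Hypothetical two-row HAVE: the second row has no High at all */
data have;
  input v1 $ v2 $ v3 $ v4 $ v5 $ v6 $ v7 $ v8 $;
  datalines;
High Small Medium High Small Small Medium High
Small Small Small Small Small Small Small Small
;
run;

data want;
  call streaminit(7);                   *arbitrary positive seed;
  set have;
  array v[8];                           *v1-v8 are already character from HAVE;
  length chosen_v $32;                  *assumed length, enough for any SAS name;
  if whichc('High', of v[*]) > 0 then do;
    do until (v[_rand] = 'High');       *redraw until the drawn column is High;
      _rand = ceil(8*rand('Uniform'));
    end;
    chosen_v = vname(v[_rand]);
  end;                                  *rows with no High keep a blank chosen_v;
  drop _:;
run;
```

Each draw hits a High column with probability h/8, where h is the number of High columns in the row, so the expected number of draws per row is 8/h.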

Answer 1 (score: 1)

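This approach uses a macro variable and a macro loop. There are three main steps. First, find all variables whose value is "high". Second, pick a random integer from 1 to the number of "high" variables. Third, select the corresponding variable and store its name in selected_var.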

data temp;
   input subject $ v1 $ v2 $ v3 $ v4 $ v5 $ v6 $ v7 $ v8 $;
   datalines;
    1 high medium small high medium small high medium
    2 medium small high medium small high medium high
    3 small high high medium small high medium high
    4 medium medium high medium small small medium medium
    5 medium medium high small small high medium small
    6 small small high medium small high high high
    7 small small small small small small small small
    8 high high high high high high high high
    ;
run;

%let vars = v1 v2 v3 v4 v5 v6 v7 v8;

%macro find_vars;

    data temp2;
        set temp;

            /*find possible variables (SUBSTR below assumes each name is exactly 2 characters, e.g. v1)*/
            length possible_vars $16;
            %do i = 1 %to %sysfunc(countw(&vars.));
            %let this_var = %scan(&vars., &i.);
                if lowcase(&this_var.) = "high" then possible_vars = cats(possible_vars, "&this_var.");
            %end;

            /*create a random integer between 1 and the number of "high" variables*/
            /*(LENGTH of a blank string is 1, so a subject with no "high" gets a blank selected_var)*/
            rand_pick = 1 + floor((length(possible_vars) / 2) * rand("Uniform"));

            /*pick that one!*/
            selected_var = substr(possible_vars, (rand_pick * 2 - 1), 2);
    run;

%mend find_vars;

%find_vars;
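The same three steps can also be written without macro code by recording the positions of the "high" columns in a temporary array. This is a sketch under stated assumptions: TEMP_DEMO is a hypothetical three-row sample in the same layout as TEMP, the seed is arbitrary, and LOWCASE is used so that 'High' and 'high' both match. Unlike the SUBSTR approach, it does not depend on every variable name being exactly two characters long.

```sas
/* Hypothetical three-row sample in the same layout as TEMP above */
data temp_demo;
  input subject $ v1 $ v2 $ v3 $ v4 $ v5 $ v6 $ v7 $ v8 $;
  datalines;
1 high medium small high medium small high medium
7 small small small small small small small small
8 High high high high high high high high
;
run;

data temp3;
  call streaminit(123);                 /* arbitrary seed */
  set temp_demo;
  array v[8] $ v1-v8;
  array hi_idx[8] _temporary_;          /* positions of the "high" columns */
  length selected_var $32;
  n_high = 0;
  /* step 1: record the position of every "high" column */
  do j = 1 to dim(v);
    if lowcase(v[j]) = 'high' then do;
      n_high = n_high + 1;
      hi_idx[n_high] = j;
    end;
  end;
  /* steps 2 and 3: draw one of those positions uniformly and keep its name */
  if n_high > 0 then do;
    pick = hi_idx[ceil(n_high * rand('Uniform'))];
    selected_var = vname(v[pick]);
  end;
  drop j pick;
run;
```

Subject 7, which has no "high" values, ends up with a blank selected_var, the same result the macro version gives.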
