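SAS: Rename variables in a merge according to their original dataset

Time: 2018-01-26 21:45:54

Tags: sas

I have two datasets, one for males and one for females, containing the same variables. I need to find the percent difference between the sexes for each variable, by group.

The datasets look like this, but with many more variables and groups: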

| Group | Sex | VarA | VarB |
|-------+-----+------+------|
|     1 | F   |    8 |    5 |
|     2 | F   |    6 |    3 |
|     3 | F   |    7 |    0 |
|-------+-----+------+------|

| Group | Sex | VarA | VarB |
|-------+-----+------+------|
|     1 | M   |    9 |    7 |
|     2 | M   |    8 |    5 |
|     3 | M   |    6 |    3 |
|-------+-----+------+------|

The result I need is:

| Group | percent_diffA | percent_diffB |
|-------+---------------+---------------|
|     1 |  -0.117647059 |  -0.333333333 |
|     2 |  -0.285714286 |          -0.5 |
|     3 |   0.153846154 |            -2 |
|-------+---------------+---------------|

I can solve this by renaming each variable:

data difference;
  merge
    females (rename = (VarA = VarA_F VarB = VarB_F))
    males   (rename = (VarA = VarA_M VarB = VarB_M))
    ;
  by group;

  percent_diffA = (VarA_F - VarA_M) / ( (VarA_F + VarA_M) / 2 );
  percent_diffB = (VarB_F - VarB_M) / ( (VarB_F + VarB_M) / 2 );

  drop sex;
run;

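However, this approach requires me to rename everything by hand. With many variables, the rename statement becomes cumbersome. Unfortunately, this calculation is being inserted into some older code, so renaming the variables in the original datasets is impractical.

I would like to know whether there is another, less cumbersome way to solve this.

Edit: I updated the variable names because they seemed to be causing confusion. They were originally called Var1 and Var2. They are now VarA and VarB. The actual variable names are descriptive, for example body_weight_g and gonadal_somatic_index. The variables are not simply listed with sequence numbers.

3 Answers:

Answer 0: (score: 1)

For datasets containing sequentially numbered variables, there is a variable list syntax for renaming an entire range of variables.

This example creates sample data with 100 variables.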

data have1 have2;
  do group = 1 to 100;
    sex = 'M';
    array var(100);
    do _n_ = 1 to dim(var);
      var(_n_) = ceil (25 * ranuni(123));
    end;
    if group ne 42 then output have1;
    sex = 'F';
    do _n_ = 1 to dim(var);
      var(_n_) = ceil (25 * ranuni(123));
    end;
    if group ne 100-42 then output have2;
  end;
run;

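The rename option applies to all 100 variables. Note that in the calculation below, array one holds the male values and array two the female values, and the result is scaled to a percentage.

data want;
  merge
    have1(rename=(var1-var100=mvar1-mvar100) in=_M)
    have2(rename=(var1-var100=fvar1-fvar100) in=_F)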
  ;

  by group;

  if _M & _F & first.group & last.group then do;

    array one mvar1-mvar100;
    array two fvar1-fvar100;
    array results result1-result100; 

    do i = 1 to dim(results);
      diff = one(i) - two(i);
      mean = mean (one(i), two(i));
      results(i) = diff / mean * 100;
    end;

  end;

  keep group result:;
run;

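Answer 1: (score: 1)

Shenglin's answer is a neat use of SQL. Another approach is to construct a macro variable that holds the renames to be used in the rename data set option (DSO). This can be done with an SQL query against the dictionary table that contains the column names.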

/* This macro creates the global macro variable rename_<inset>, to be used in a rename statement or data set option. */
/* Its value has the form: var1 = var1_suffix var2 = var2_suffix ... */
/* &inset is the input data set (looked up in WORK). &suffix is the suffix to be added to all variables except those specified in &keys. */
/* &keys variables should each be given in quotation marks, separated by spaces. */
%macro rename_list(inset, suffix, keys) ;
    %global rename_&inset ; /* So that this macro variable is accessible outside the macro */
    proc sql noprint ;
        select strip(name) || ' = ' || strip(name) || "_&suffix"
            into :rename_&inset separated by ' '
            from sashelp.vcolumn /* dictionary.columns can be used in place of sashelp.vcolumn */
                where libname = 'WORK' & memname = "%upcase(&inset)"
                      /* The ' ' is included so that there is no error if no keys are given */
                      & upcase(strip(name)) not in (' ' %upcase(&keys)) ;
    quit ;
%mend rename_list ;

%rename_list(females, F, 'GROUP' 'SEX')
%rename_list(males  , M, 'GROUP' 'SEX')
%put &rename_females ; * Check that the macro variables are correct ;
%put &rename_males ;

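/* &id is the part of the variable name after "Var", e.g. A for VarA */
%macro pct_diff(id) ;
    percent_diff&id = (Var&id._F - Var&id._M) / ( (Var&id._F + Var&id._M) / 2 ) ;
%mend pct_diff ;

data difference ;
    /* Data set options are separated by spaces, not commas */
    merge females(drop = sex rename = (&rename_females))
          males  (drop = sex rename = (&rename_males  )) ;
    by group ;

    %pct_diff(A)
    %pct_diff(B)
run ;
dm 'vt difference'; /* opens the result in a table viewer; only meaningful in an interactive SAS session */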

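The creation of the percent_diff variables can also be shortened with a macro (as shown). If the variables are numbered sequentially (Var1, Var2, ...) and there are many of them, or their number varies, the code can be shortened further by detecting the number of comparisons automatically: run the same SQL query with the select ... into clause changed to

select count(name) into :varct trimmed

This counts the variables. Because macro calls are resolved before the data step executes, a data step do loop cannot pass its index to %pct_diff. Use a macro %do loop instead, which must itself be placed inside a macro definition:

    %do i = 1 %to &varct ;
        %pct_diff(&i)
    %end ;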
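
Since the question's real variable names are descriptive rather than numbered, the same dictionary-table idea can also generate the calculation statements themselves. The following is a minimal sketch under stated assumptions, not a tested drop-in: the macro variable names ren_f, ren_m and calc and the pd_ prefix are made up for illustration (a short prefix helps keep generated names within the 32-character limit for SAS names), both data sets are assumed to be in WORK with identical variables, and every non-key variable is assumed to be numeric with a name short enough to take a two-character suffix.

```sas
/* Sample data from the question */
data females;
    input group sex $ VarA VarB;
    datalines;
1 F 8 5
2 F 6 3
3 F 7 0
;
run;

data males;
    input group sex $ VarA VarB;
    datalines;
1 M 9 7
2 M 8 5
3 M 6 3
;
run;

/* Build both rename lists and the calculation statements in one pass. */
/* ren_f, ren_m, calc and the pd_ prefix are hypothetical names.       */
proc sql noprint;
    select catx(' ', name, '=', cats(name, '_F')),
           catx(' ', name, '=', cats(name, '_M')),
           cats('pd_', name, ' = (', name, '_F - ', name, '_M) / ((',
                name, '_F + ', name, '_M) / 2);')
        into :ren_f separated by ' ',
             :ren_m separated by ' ',
             :calc  separated by ' '
        from dictionary.columns
        where libname = 'WORK' and memname = 'FEMALES'
              and upcase(name) not in ('GROUP', 'SEX');
quit;

%put &calc; /* inspect the generated statements before using them */

data difference;
    merge females(drop = sex rename = (&ren_f))
          males  (drop = sex rename = (&ren_m));
    by group;
    &calc              /* expands to one assignment per variable */
    keep group pd_: ;
run;
```

With the sample data this should produce pd_VarA and pd_VarB columns corresponding to percent_diffA and percent_diffB in the question's expected result. A variable whose female and male values are both zero would give a division by zero and a missing result, so that case may need separate handling.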

Answer 2: (score: 0)

Use table aliases in proc sql to avoid renaming altogether:

proc sql;
   select a.group,
          (a.VarA-b.VarA)/((a.VarA+b.VarA)/2) as percent_diffA,
          (a.VarB-b.VarB)/((a.VarB+b.VarB)/2) as percent_diffB
   from females as a, males as b
   where a.group=b.group;
quit;
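
Because the question says the calculation feeds into older code, it may help to store the result in a data set rather than only printing it. The variation below is a sketch that assumes the data set names females and males and the variables VarA and VarB from the question; the output name difference is borrowed from the question's own data step.

```sas
proc sql;
    /* Same calculation, written with an explicit join and saved as a table */
    create table difference as
    select f.group,
           (f.VarA - m.VarA) / ((f.VarA + m.VarA) / 2) as percent_diffA,
           (f.VarB - m.VarB) / ((f.VarB + m.VarB) / 2) as percent_diffB
    from females as f
         inner join males as m
         on f.group = m.group
    order by f.group;
quit;
```

An inner join keeps only groups present in both data sets, so a group missing from either one is dropped instead of appearing with missing values as it would in the merge.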