SAS: Creating dummy variables from a categorical variable

Time: 2016-07-12 23:36:27

Tags: sas

I would like to convert the following long dataset:

data test;
input Id Injury $;
datalines;
1         Ankle
1         Shoulder 
2         Ankle
2         Head
3         Head
3         Shoulder
;
run;

into a wide dataset that looks like this:

ID  Ankle Shoulder Head
1   1     1        0
2   1     0        1
3   0     1        1

This answer seems the most relevant, but it falls over at the proc freq stage (my real dataset has about 1 million records and about 30 injury types): Creating dummy variables from multiple strings in the same row

Other help: https://communities.sas.com/t5/SAS-Statistical-Procedures/Possible-to-create-dummy-variables-with-proc-transpose/td-p/235140

Thanks for your help!

2 answers:

Answer 0 (score: 2)

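Here is a basic method that should work without trouble, even with millions of records. First sort the data, then add a count variable that is always 1. Next, use PROC TRANSPOSE to flip the data from long to wide. Finally, fill in the missing values with 0. The method is fully dynamic: it does not depend on how many different injury types you have or how many records each person has. Other approaches may need less code, but I think this one is simple, easy to understand, and easy to modify if needed.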

data test;
input Id Injury $;
datalines;
1         Ankle
1         Shoulder 
2         Ankle
2         Head
3         Head
3         Shoulder
;
run;

proc sort data=test;
by id injury;
run;

/* Add a constant 1 that becomes the dummy value after transposing */
data test2;
set test;
count=1;
run;

proc transpose data=test2 out=want prefix=Injury_;
by id;
var count;
id injury;
idlabel injury;
run;

/* Replace the missing values left by PROC TRANSPOSE with 0 */
data want;
set want;
array inj(*) injury_:;

do i=1 to dim(inj);
    if inj(i)=. then inj(i) = 0;
end;

drop _name_ i;
run;
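
One caveat: PROC TRANSPOSE stops with an error when the same ID value occurs twice within a BY group, which would happen here if a person has the same injury recorded more than once. A minimal sketch of one way to guard against that, assuming duplicate rows carry no extra information and can be dropped (the dataset name test_nodup is hypothetical):

```sas
/* Keep one row per id/injury pair; NODUPKEY drops rows that repeat the BY key. */
/* test_nodup is a hypothetical name; the original test dataset is left as-is.  */
proc sort data=test out=test_nodup nodupkey;
  by id injury;
run;

/* Add the constant indicator, reading from the de-duplicated data. */
data test2;
  set test_nodup;
  count = 1;
run;
```

The PROC TRANSPOSE and zero-fill steps can then run unchanged on test2.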

Answer 1 (score: 1)

Here is a solution that involves only two steps. Just make sure your data is sorted by id first (the injury column does not need to be sorted).

First, create a macro variable containing the list of injuries:

proc sql noprint;  
  select distinct injury  
    into :injuries separated by " "  
    from have  
    order by injury;  
quit;  

Then let RETAIN work its magic. No transposition needed!

data want(drop=i injury);
  set have;
  by id;

  format &injuries 1.;
  retain &injuries;
  array injuries(*) &injuries;

  if first.id then do i = 1 to dim(injuries);
    injuries(i) = 0;
  end;

  do i = 1 to dim(injuries); 
    if injury = scan("&injuries",i) then injuries(i) = 1;
  end;

  if last.id then output;
run;
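
The two steps above read from a dataset called have and rely on it being sorted by id. A minimal sketch of preparing it, assuming the input is the test dataset from the question:

```sas
/* Create have from the question's test data, sorted by id, */
/* as the BY statement in the data step requires.           */
proc sort data=test out=have;
  by id;
run;
```

Because each injury value is used as a variable name through &injuries, this approach also assumes every value is a valid SAS name (for example, no embedded spaces).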

Edit

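Following up on the OP's question in the comments, here is how we could work with injury codes and labels. The labels could be assigned directly in the last data step with a label statement, but to minimize hard-coding I will assume they have been entered into a SAS dataset.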

1 - Define the labels:

data myLabels;
  infile datalines dlm="|" truncover;
  informat injury $12. labl $24.;
  input injury labl;
  datalines;
S460|Acute meniscal tear, medial
S520|Head trauma
;

2 - Add a new query to the existing proc sql step to prepare the label assignment.

proc sql noprint;  

  /* Existing query */
  select distinct injury  
    into :injuries separated by " "  
    from have  
    order by injury;

  /* New query */
  select catx("=",injury,quote(trim(labl)))
    into :labls separated by " "
    from myLabels;
quit;

3 - Then, at the end of the data want step, just add a label statement.

data want(drop=i injury);
  set have;
  by id;

  /* ...same as before... */

  * Add labels;
  label &labls;
run;
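
A quick way to confirm that the labels were attached, assuming the data want step above has run and that the codes in myLabels also occur in have:

```sas
/* List the variables in want together with their labels. */
proc contents data=want;
run;
```

The Label column of the variable listing should show the text from myLabels next to each injury code.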

That should do it!