I would like to transform the following long dataset:
data test;
input Id Injury $;
datalines;
1 Ankle
1 Shoulder
2 Ankle
2 Head
3 Head
3 Shoulder
;
run;
into a wide dataset that looks like this:
ID Ankle Shoulder Head
1 1 1 0
2 1 0 1
3 0 1 1
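This answer seems the most relevant, but it falls over at the proc freq stage (my real dataset has about 1 million records and around 30 injury types): Creating dummy variables from multiple strings in the same row
Thanks for your help!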
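Answer 0: (score: 2)
Here is a basic approach that should work without trouble, even with millions of records.
First sort the data, then add a count variable that supplies the 1 values. Next, use PROC TRANSPOSE to flip the data from long to wide. Finally, fill the missing values with 0. This is a fully dynamic approach: it does not depend on how many different injury types you have or how many records there are per person. Other methods may need less code, but I think this one is simple and easy to understand and modify if needed.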
data test;
input Id Injury $;
datalines;
1 Ankle
1 Shoulder
2 Ankle
2 Head
3 Head
3 Shoulder
;
run;
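/* Sort so that PROC TRANSPOSE can process the data BY id */
proc sort data=test;
  by id injury;
run;
/* Add a constant that becomes the 1 in each injury column */
data test2;
  set test;
  count=1;
run;
/* Flip long to wide: one Injury_<name> column per injury type */
proc transpose data=test2 out=want prefix=Injury_;
  by id;
  var count;
  id injury;
  idlabel injury;
run;
/* Replace the missing values left by the transpose with 0 */
data want;
  set want;
  array inj(*) injury_:;
  do i=1 to dim(inj);
    if inj(i)=. then inj(i) = 0;
  end;
  drop _name_ i;
run;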
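One thing to watch for with real data: PROC TRANSPOSE with an ID statement stops with an error if the same ID value occurs more than once within a BY group, which would happen if a person has the same injury recorded twice. A minimal sketch of one way around that, assuming repeated (id, injury) pairs can simply be collapsed, is to deduplicate during the sort. The dataset name test_unique is made up for illustration.

```sas
/* Assumption: test may contain repeated (id, injury) pairs.      */
/* NODUPKEY keeps one row per distinct combination of BY values.  */
/* test_unique is a hypothetical name, not from the question.     */
proc sort data=test out=test_unique nodupkey;
  by id injury;
run;

/* Same count step as above, reading the deduplicated data */
data test2;
  set test_unique;
  count = 1;
run;
```

The PROC TRANSPOSE and zero-fill steps can then run unchanged on test2.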
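Answer 1: (score: 1)
Here is a solution that involves only two steps. Just make sure your data is sorted by id first (the injury column does not need to be sorted).
First, create a macro variable containing the list of injuries: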
proc sql noprint;
select distinct injury
into :injuries separated by " "
from have
order by injury;
quit;
Then let RETAIN do the magic; no transposing needed!
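data want(drop=i injury);
  set have;
  by id;
  /* One numeric 0/1 flag per injury, kept across the rows of an id */
  format &injuries 1.;
  retain &injuries;
  array injuries(*) &injuries;
  /* Reset all flags at the start of each id */
  if first.id then do i = 1 to dim(injuries);
    injuries(i) = 0;
  end;
  /* Set the flag whose name matches this row's injury */
  do i = 1 to dim(injuries);
    if injury = scan("&injuries",i) then injuries(i) = 1;
  end;
  /* Write one row per id */
  if last.id then output;
run;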
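Following up on the OP's question in the comments, here is how we can handle injuries that are stored as codes and need labels. This could be done directly with a label statement in the last data step, but to minimize hard-coding I assume the labels have been entered into a SAS dataset.
1 - Define the labels: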
data myLabels;
  infile datalines dlm="|" truncover;
  /* $40. so that the 27-character first label is not truncated */
  informat injury $12. labl $40.;
  input injury labl;
datalines;
S460|Acute meniscal tear, medial
S520|Head trauma
;
run;
2 - Add a new query to the existing proc sql step to prepare the label assignment.
proc sql noprint;
/* Existing query */
select distinct injury
into :injuries separated by " "
from have
order by injury;
/* New query */
select catx("=",injury,quote(trim(labl)))
into :labls separated by " "
from myLabels;
quit;
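Before moving on, it can help to check what the two macro variables actually contain. As a quick sketch, the following writes both to the log; with the example labels above, labls should hold a space-separated list of name="label" pairs, which is exactly the form a label statement expects.

```sas
/* Write both macro variables to the log to inspect them */
%put &=injuries;
%put &=labls;
```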
3 - Then, at the end of the data want step, just add a label statement.
data want(drop=i injury);
set have;
by id;
/* ...same as before... */
* Add labels;
label &labls;
run;
That should do it!