I have a SAS dataset named coaches_assistants, structured as follows. Each TeamID has exactly two records.
TeamID Team_City CoachCode
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
... ... ....
What I want to do is create a dataset with an additional field, AssistantCode, so that it looks like this:
TeamID Team_City HeadCode AssistantCode
123 Durham 242 876
124 London 876 922
125 Bath 667 786
126 Dover 544 978
... ... ... ...
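If possible, I would like to do this in a single DATA step (although I recognize that I may need a PROC SORT step first). I know how to do this in Python or Ruby or any traditional scripting language, but I don't know how to do it in SAS.
What is the best way to do this?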
Answer 0 (score: 2)
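While it is possible to do this in a single DATA step, I generally find that PROC TRANSPOSE handles this kind of problem better. It needs less hand-written code and copes more flexibly with new values (for example, if a new value "HeadAssistant" shows up, it works immediately). Note that the transposed columns take their names from the text before the underscore, so they come out as Head and Assistant rather than HeadCode and AssistantCode.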
data have;
length coachcode $25;
input TeamID Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
;;;;
run;
data have_t;
set have;
id=scan(coachcode,1,'_');
val = scan(coachcode,2,'_');
keep teamId team_city id val;
run;
proc transpose data=have_t out=want(drop=_name_);
by teamID team_city;
id id;
var val;
run;
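As a rough sketch of the flexibility point above, the same approach can be run on data that contains a third role. The dataset names `demo`, `demo_t` and `demo_wide`, and the extra `HeadAssistant_555` row, are made up for illustration and are not part of the question's data. The `suffix=Code` option on PROC TRANSPOSE is used here so that the new columns are named HeadCode, AssistantCode and HeadAssistantCode instead of Head, Assistant and HeadAssistant.

```sas
/* Hypothetical demo data: team 124 also has a third role */
data demo;
  length Team_City $20 CoachCode $25;
  input TeamID Team_City $ CoachCode $;
  datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
124 London HeadAssistant_555
;
run;

/* Split each CoachCode into a role (id) and a code (val) */
data demo_t;
  set demo;
  length id $32 val $8;
  id  = scan(CoachCode, 1, '_');
  val = scan(CoachCode, 2, '_');
  keep TeamID Team_City id val;
run;

/* One output column per distinct role; SUFFIX= appends "Code" to each name */
/* BY processing assumes the input is already sorted by TeamID Team_City    */
proc transpose data=demo_t out=demo_wide(drop=_name_) suffix=Code;
  by TeamID Team_City;
  id id;
  var val;
run;
```

Teams that lack a given role simply get a blank value in that column, so no code change is needed when a new role appears.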
Answer 1 (score: 1)
Here are two possible solutions (one using a DATA step, the other using PROC SQL):
data have;
length TeamID $3 Team_City CoachCode $20;
input TeamID $ Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
;
run;
/* A data step solution */
proc sort data=have;
by TeamID;
run;
data want1(keep=TeamID Team_City HeadCode AssistantCode);
/* Define all variables, retain the new ones */
length TeamID $3 Team_City $20 HeadCode $3 AssistantCode $3;
retain HeadCode AssistantCode;
set have;
by TeamID;
if CoachCode =: 'Head'
then HeadCode = substr(CoachCode,6,3);
else AssistantCode = substr(CoachCode,11,3);
if last.TeamID;
run;
/* An SQL solution */
proc sql noprint;
create table want2 as
select TeamID
, max(Team_City) as Team_City
, max(CASE WHEN CoachCode LIKE 'Head%'
THEN substr(CoachCode,6,3) ELSE ' '
END) LENGTH=3 as HeadCode
, max(CASE WHEN CoachCode LIKE 'Assistant%'
THEN substr(CoachCode,11,3) ELSE ' '
END) LENGTH=3 as AssistantCode
from have
group by TeamID;
quit;
The advantage of PROC SQL is that it does not require you to sort the data beforehand.
Answer 2 (score: 0)
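This assumes you have already sorted the data by teamID and that the head coach always comes before the assistant. Warning: untested (I really need to get access to SAS again.)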
data want (drop=nc coachcode);
set have;
length headcode assistantcode $3;
retain headcode;
by teamid;
nc = length(coachcode);
/* SUBSTR's third argument is a length, not an end position: take the last 3 characters */
if substr(coachcode, 1, 4) = 'Head' then
headcode = substr(coachcode, nc-2, 3);
else
assistantcode = substr(coachcode, nc-2, 3);
if last.teamid;
run;
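If the two assumptions above (head coach listed first, codes always three characters long) cannot be relied on, one possible variation is to split on the underscore with SCAN and to reset the retained values at the start of each team. This is only a sketch: the dataset names `demo2` and `want_scan`, the row order, and the 4-digit and 2-digit codes are invented for illustration.

```sas
/* Hypothetical demo data: assistant listed first for team 123, codes of varying length */
data demo2;
  length Team_City $20 CoachCode $25;
  input TeamID Team_City $ CoachCode $;
  datalines;
123 Durham Assistant_876
123 Durham Head_242
124 London Head_1876
124 London Assistant_92
;
run;

/* BY-group processing needs the data ordered by TeamID */
proc sort data=demo2;
  by TeamID;
run;

data want_scan(keep=TeamID Team_City HeadCode AssistantCode);
  set demo2;
  by TeamID;
  length HeadCode AssistantCode $8;
  retain HeadCode AssistantCode;
  /* Clear values carried over from the previous team */
  if first.TeamID then call missing(HeadCode, AssistantCode);
  /* Text before the underscore is the role, text after it is the code */
  if scan(CoachCode, 1, '_') = 'Head' then HeadCode = scan(CoachCode, 2, '_');
  else AssistantCode = scan(CoachCode, 2, '_');
  /* Output one row per team, once both records have been read */
  if last.TeamID;
run;
```

Because both new variables are retained and cleared at `first.TeamID`, the order of the two records within a team no longer matters, and a team with a missing role gets a blank instead of a value left over from the previous team.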