My data looks like this:
    ID   FileSource    Age  MamUlt  ProcDate  Name
    223  Facility      35   M       19591     SWEDISH
    223  Facility      35   M       19592     SWEDISH
    223  Facility      35   U       19592     SWEDISH
    223  Facility      35   U       19593     SWEDISH
    223  Non-Facility  35   M       19594     RADIA
    223  Non-Facility  35   U       19594     RADIA
What I want to do is collapse this data (for every ID in the dataset) into something like this:
    ID   Age  MAMs  ULTs  SameDate
    223  35   3     3     2
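So for each ID I need the total number of times "M" and "U" occur, plus the number of times they occur on the same date; in this sample that is two.
This is what I have so far: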
    data ImageTotals;
        set ImageClaims;
        by ID;
        retain ID MAMs ULTs SameDate;
        if first.ID then do;
            MAMs = 0;
            ULTs = 0;
            MamDate = .;
            UltDate = .;
            SameDate = 0;
        end;
        if MamUlt = "M" then do; MAMs = MAMs + 1; MamDate = ProcDate; end;
        if MamUlt = "U" then do; ULTs = ULTs + 1; UltDate = ProcDate; end;
        if MamDate = UltDate and MamDate ^= . then do; SameDate = SameDate+1; end;
        if last.ID;
        keep ID MAMs ULTs SameDate;
    run;
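Any suggestions? This handles the counts, but not SameDate (which is still zero for this example).
Answer 0 (score: 2)
You can use a DOW loop to do the aggregation in a data step. The data must be sorted by ID and PROCDATE. Count how many times M or U occurs within each date. You can then use these per-date counts to aggregate at the ID level and test whether both occurred on the same date. Simply keep the AGE variable and it will have the value from the last record for that ID.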
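    data counts ;
        /* One pass of the outer loop = one ID; PROC is the M/U flag (MamUlt in the question's data) */
        do until (last.id);
            m=0;   /* number of M records on the current date */
            u=0;   /* number of U records on the current date */
            do until (last.procdate);
                set imageclaims;
                by id procdate;
                m= sum(m,proc='M');
                u= sum(u,proc='U');
            end;
            /* Roll the per-date counts up to the ID level */
            MAMs=sum(mams,m);
            ULTs=sum(ults,u);
            SameDate=sum(samedate,m and u);   /* adds 1 when both occurred on this date */
        end;
        keep id age mams ults samedate ;
    run;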
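The step above relies on BY-group processing, so it assumes imageclaims is already in ID, PROCDATE order. If it is not, a sort along these lines would be needed first (a minimal sketch; the dataset and variable names are taken from the code above):

```sas
/* Put the data in ID, PROCDATE order so that BY id procdate; works in the DATA step */
proc sort data=imageclaims;
    by id procdate;
run;
```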
Answer 1 (score: 1)
I think this may be a SQL problem (not my specialty), but since you started with a DATA step solution, I took a stab at both. I also added more test data.
    data ImageClaims;
        input id age Proc $1. ProcDate;
        cards;
    223 35 M 19591
    223 35 M 19592
    223 35 U 19592
    223 35 U 19593
    223 35 M 19594
    223 35 U 19594
    224 35 M 19591
    224 35 M 19592
    224 35 M 19593
    224 35 M 19593
    224 35 M 19594
    224 35 U 19595
    225 35 M 19592
    225 35 U 19592
    225 35 U 19593
    225 35 M 19593
    225 35 M 19594
    225 35 U 19594
    ;
    run;
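For the DATA step approach, create counters for MAMs, ULTs, and MAMULTs (a mam and an ult on the same date). Note that because I use a sum statement for these counters (MAMs++1), they are implicitly retained.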
    data ImageTotals (keep=id Age MAMs ULTs MAMULTs);
        set ImageClaims;
        by ID ProcDate;
        retain HaveMam HaveUlt; *Count vars are implicitly retained by sum statement;
        if first.ID then do;
            MAMs=0; *count of mammograms;
            ULTs=0; *count of ultrasounds;
            MAMULTs=0; *count of mammograms and ultrasounds on same date;
        end;
        if first.ProcDate then do;
            HaveMam=0; *indicator for have a mammogram or not on that date;
            HaveUlt=0; *indicator for have an ultrasound or not on that date;
        end;
        if Proc='M' then do;
            HaveMam=1; *set mammogram indicator (for that date);
            MAMs++1; *increment counter;
        end;
        else if Proc='U' then do;
            HaveUlt=1; *set ultrasound indicator (for that date);
            ULTs++1; *increment counter;
        end;
        if last.ProcDate then do;
            MAMULTs++(HaveMam=1 and HaveUlt=1); *increment MamUlts counter if had both on same date;
        end;
        if last.id;
    run;
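As for why the step in the question leaves SameDate at zero: as far as I can tell, MamDate and UltDate are not listed in its RETAIN statement, so both are reset to missing at the top of every iteration and can never be equal and non-missing on the same row. A minimal sketch of the smallest change, assuming the question's variable names (MamUlt rather than Proc) and data sorted by ID and ProcDate, is to retain them as well:

```sas
data ImageTotals;
    set ImageClaims;   /* assumed to be sorted by ID and ProcDate */
    by ID;
    /* MamDate and UltDate are now retained, so they carry across rows */
    retain MAMs ULTs SameDate MamDate UltDate;
    if first.ID then do;
        MAMs = 0;
        ULTs = 0;
        SameDate = 0;
        MamDate = .;
        UltDate = .;
    end;
    if MamUlt = "M" then do; MAMs = MAMs + 1; MamDate = ProcDate; end;
    if MamUlt = "U" then do; ULTs = ULTs + 1; UltDate = ProcDate; end;
    /* Compare the most recent mammogram and ultrasound dates */
    if MamDate = UltDate and MamDate ^= . then SameDate = SameDate + 1;
    if last.ID;
    keep ID Age MAMs ULTs SameDate;
run;
```

This version compares only the most recent dates on every row, so it would overcount when a procedure is repeated on a date that has both (for example M, U, U on one date adds two). The per-date indicators in the step above avoid that.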
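For the SQL solution, I use a subquery that computes MAMs, ULTs, and MAMULTs by ID and ProcDate, and then the outer query sums them by ID. There may be a better SQL solution, but I think this works.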
    proc sql;
        create table ImageTotals as
        select id
              ,max(age) as age /*arbitrary use of max; age is constant within id*/
              ,sum(MAMs) as MAMs
              ,sum(ULTs) as ULTs
              ,sum(MAMULTs) as MAMULTs
        from (
            select id
                  ,procdate
                  ,max(age) as age
                  ,sum(Proc='M') as MAMs
                  ,sum(Proc='U') as ULTs
                  ,count(distinct(Proc))=2 as MAMULTs
            from ImageClaims
            group by id,ProcDate
        )
        group by id
        ;
    quit;
    proc print;
    run;
The Work.ImageTotals I get from both steps is:
    Obs  id   age  MAMs  ULTs  MAMULTs
    1    223  35   3     3     2
    2    224  35   5     1     0
    3    225  35   3     3     3
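Answer 2 (score: 0)
Once you have taken Q's suggestions on board, I think this can be solved with proc sql (count / group by), unless I am misunderstanding the complexity here... I would post some code, but will let you work it out first...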