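I'm having trouble extracting the unique tasks performed by workers from a set of time-stamped events. A unique combination is defined by ID and Mode. The following dataset mimics the scenario: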
ID Time Mode Event
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
I would like to get the following result:
ID Time Mode Event
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
87889 20121009 C Closed
87900 20120411 A Closed
I would first sort by time in descending order:
proc sort data=df; by ID descending time; run;
Then I can sort again to get the unique combinations of ID and Mode:
proc sort data=df dupout=nodup nodupkey;
by ID Mode; run;
In that last step, how can I make sure that the record kept for each combination is also the most recent event?
Thanks!
Answer 0: (score: 1)
You can use the first./last. BY-group concept:
data have;
input ID Time:yymmdd8. Mode $ Event $;
format time yymmdd10.;
datalines;
23456 20120101 A Open
23456 20120101 B Closed
87690 20120311 G Closed
98000 20120201 B Open
98000 20120301 A Open
98000 20120101 A Open
87889 20121009 C Closed
87889 20120101 C Open
87900 20120411 A Closed
87900 20120102 A Closed
;
proc sort data = have out=have1;
by id mode time;
run;
data want;
set have1;
by id mode time;
if last.mode then output; /* last.mode implies last.time, so one test is enough */
run;
Alternatively, a simple proc sql as shown below:
proc sql;
create table want1 as
select id, time, mode, event from have
group by id, mode
having time = max(time);
quit;
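For your own code to work, the first sort needs to be by ID and Mode, then descending time, so that the most recent record comes first within each ID/Mode pair and is the one NODUPKEY keeps:
proc sort data = df; by ID Mode descending time; run;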
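Putting the two steps together, here is a minimal sketch of the corrected two-sort approach. It assumes the input dataset is named df as in the question and that PROC SORT runs with its default EQUALS behavior, so the first observation in each BY group is the one that survives NODUPKEY. The output names latest and older are made up for illustration.

```sas
/* Step 1: within each ID/Mode pair, put the most recent Time first. */
proc sort data=df;
  by ID Mode descending Time;
run;

/* Step 2: keep only the first observation per ID/Mode pair.
   Assumes the default EQUALS option, which preserves the incoming
   order of observations within each BY group.
   "latest" and "older" are hypothetical dataset names. */
proc sort data=df out=latest dupout=older nodupkey;
  by ID Mode;
run;
```

With the sample data, latest should hold one row per ID/Mode pair carrying the newest Time, and older should hold the discarded earlier events.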