I have a dataset with the following columns:
ID,CATEGORY,DATE_TIME
For each ID / CATEGORY, I want to remove the rows whose DATE_TIME is within 5 minutes of another record. For example, I want to take:
AAA, CAT1, 2014-12-09 18:30:58
AAA, CAT1, 2014-12-09 18:15:58
AAA, CAT1, 2014-12-09 18:12:58
AAA, CAT1, 2014-12-09 18:11:58
AAA, CAT2, 2014-12-09 18:11:58
and get something like this:
AAA, CAT1, 2014-12-09 18:30:58
AAA, CAT1, 2014-12-09 18:11:58
AAA, CAT2, 2014-12-09 18:11:58
Any help is appreciated!
Answer 0: (score: 1)
Load the data (I added an event five minutes and one second after another event, at 18:16:59):
data allEvents;
    infile datalines dsd dlm=',';
    informat ID $3. CATEGORY $4. DATE_TIME YMDDTTM20.;
    format DATE_TIME DATETIME19.2;
    input ID $ CATEGORY $ DATE_TIME;
datalines;
AAA, CAT1, 2014-12-09 18:30:58
AAA, CAT1, 2014-12-09 18:16:59
AAA, CAT1, 2014-12-09 18:15:58
AAA, CAT1, 2014-12-09 18:12:58
AAA, CAT1, 2014-12-09 18:11:58
AAA, CAT2, 2014-12-09 18:11:58
;
run;
Sort it by ID, CATEGORY and DATE_TIME:
proc sort data=allEvents;
by ID CATEGORY DATE_TIME;
run;
Read it in a data step and filter:
data wantedEvents (drop=writtenStamp);
    set allEvents;
    by ID CATEGORY DATE_TIME;
    /* remember the last written DATE_TIME */
    retain writtenStamp;
    /* always keep the first event of each ID / CATEGORY group */
    if first.CATEGORY then do;
        output;
        writtenStamp = DATE_TIME;
    end;
    /* keep a later event only if it is more than 5 minutes after the last kept one */
    else if DATE_TIME GT writtenStamp + hms(0,5,0) then do;
        output;
        writtenStamp = DATE_TIME;
    end;
run;
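To check the filter rule outside SAS, here is a minimal Python sketch of the same logic: keep the first event of each ID / CATEGORY group, then keep a later event only if it falls more than 5 minutes after the last kept one. The names `filter_events` and `ts` are hypothetical helpers introduced for illustration; they are not part of the SAS answer.

```python
from datetime import datetime, timedelta
from itertools import groupby


def ts(text):
    """Parse a timestamp written like the sample data."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def filter_events(rows, gap=timedelta(minutes=5)):
    """Keep the first event per (ID, CATEGORY), then only events
    strictly more than `gap` after the last kept event."""
    kept = []
    # Tuples sort by ID, then CATEGORY, then DATE_TIME (like proc sort).
    ordered = sorted(rows)
    for _, group in groupby(ordered, key=lambda r: (r[0], r[1])):
        last_written = None  # plays the role of the retained writtenStamp
        for row in group:
            if last_written is None or row[2] > last_written + gap:
                kept.append(row)
                last_written = row[2]
    return kept


events = [
    ("AAA", "CAT1", ts("2014-12-09 18:30:58")),
    ("AAA", "CAT1", ts("2014-12-09 18:16:59")),
    ("AAA", "CAT1", ts("2014-12-09 18:15:58")),
    ("AAA", "CAT1", ts("2014-12-09 18:12:58")),
    ("AAA", "CAT1", ts("2014-12-09 18:11:58")),
    ("AAA", "CAT2", ts("2014-12-09 18:11:58")),
]

result = filter_events(events)
for row in result:
    print(row[0], row[1], row[2])
```

Note that the comparison is made against the last kept event, not the immediately preceding row, so 18:15:58 is dropped (4 minutes after 18:11:58) while 18:16:59 is kept (5 minutes and 1 second after it). The result comes back in ascending time order.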
Sort it back into the original order:
proc sort data=wantedEvents;
by ID CATEGORY descending DATE_TIME ;
run;