Delete rows within X minutes

Time: 2015-08-04 18:49:17

Tags: sas

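I have a dataset consisting of the following columns:

ID, CATEGORY, DATE_TIME

For each ID / CATEGORY, I want to delete the rows whose DATE_TIME falls within 5 minutes of any other record. For example, I want to take: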

AAA, CAT1, 2014-12-09 18:30:58
AAA, CAT1, 2014-12-09 18:15:58
AAA, CAT1, 2014-12-09 18:12:58
AAA, CAT1, 2014-12-09 18:11:58
AAA, CAT2, 2014-12-09 18:11:58

and get something like this:

AAA, CAT1, 2014-12-09 18:30:58
AAA, CAT1, 2014-12-09 18:11:58
AAA, CAT2, 2014-12-09 18:11:58

Any help is appreciated!

1 answer:

Answer 0: (score: 1)

Load the data (I added an event that falls 5 minutes and 1 second after another one):

data allEvents;
    infile datalines dsd dlm=',' ;
    informat ID $3. CATEGORY $4. DATE_TIME YMDDTTM20.;
    format DATE_TIME DATETIME19.2;
    input ID $ CATEGORY $ DATE_TIME ;
    datalines;
AAA, CAT1, 2014-12-09 18:30:58
AAA, CAT1, 2014-12-09 18:16:59
AAA, CAT1, 2014-12-09 18:15:58
AAA, CAT1, 2014-12-09 18:12:58
AAA, CAT1, 2014-12-09 18:11:58
AAA, CAT2, 2014-12-09 18:11:58
;
run;

Sort it by ID, CATEGORY and DATE_TIME:

proc sort data=allEvents;
    by ID CATEGORY DATE_TIME;
run;

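Read it in a data step and filter, keeping the first row of each ID / CATEGORY and then only rows more than 5 minutes after the last row written: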

data wantedEvents (drop=writtenStamp);
    set allEvents;
    by ID CATEGORY DATE_TIME;

    ** remember the last written DATE_TIME **;
    retain writtenStamp;

    if first.CATEGORY then do;
        output;
        writtenStamp = DATE_TIME;
    end;
    else if DATE_TIME GT writtenStamp + hms(0,5,0) then do;
        output;
        writtenStamp = DATE_TIME;
    end;
run;
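
Since the title asks about X minutes in general, the window could be pulled out into a macro variable. This is a minimal sketch, assuming allEvents is already sorted by ID, CATEGORY and DATE_TIME as above. The names windowMinutes and wantedEventsX are hypothetical and not part of the original answer:

```sas
/* Hypothetical macro variable: width of the window in minutes */
%let windowMinutes = 5;

data wantedEventsX (drop=writtenStamp);
    set allEvents;
    by ID CATEGORY DATE_TIME;

    /* keep the DATE_TIME of the last row written across iterations */
    retain writtenStamp;

    if first.CATEGORY then do;
        /* always keep the earliest row of each ID / CATEGORY group */
        output;
        writtenStamp = DATE_TIME;
    end;
    /* datetime values are counted in seconds, so minutes * 60 */
    else if DATE_TIME GT writtenStamp + &windowMinutes * 60 then do;
        output;
        writtenStamp = DATE_TIME;
    end;
run;
```

With windowMinutes set to 5, the expression &windowMinutes * 60 evaluates to 300 seconds, the same offset that hms(0,5,0) yields in the step above.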

Sort it back into the original order (newest DATE_TIME first within each ID / CATEGORY):

proc sort data=wantedEvents;
    by ID CATEGORY descending DATE_TIME ;
run;
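
To check the result, the output can be printed with an ISO 8601 style datetime format. This is only a sketch; the choice of E8601DT19. for display is an assumption of this example and not part of the original answer:

```sas
/* Print the filtered rows; the FORMAT statement only affects this listing */
proc print data=wantedEvents noobs;
    format DATE_TIME E8601DT19.;
run;
```

With the six sample rows loaded above, this should list four rows: CAT1 at 18:30:58, 18:16:59 and 18:11:58, and CAT2 at 18:11:58. The rows at 18:12:58 and 18:15:58 are dropped because they fall within 5 minutes of the 18:11:58 row that was already written.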