I have a table in SAS EG that looks like this:
DateTime User Prod Date Date2 User2 Prod2 DateTime2 SecDiff Calc
20MAR2014:20:17:00 54823 1430 20140320 . . . . . 1
13MAR2014:09:07:16 66019 8244 20140313 20140320 54823 1430 20Mar14:20:17:00 -644984 1
13MAR2014:09:07:44 66019 8244 20140313 20140313 66019 8244 13Mar14:09:07:16 28 0
13MAR2014:09:08:17 66019 8244 20140313 20140313 66019 8244 13Mar14:09:07:44 33 0
13MAR2014:09:08:43 66019 8244 20140313 20140313 66019 8244 13Mar14:09:08:17 26 0
13MAR2014:09:09:12 66019 8244 20140313 20140313 66019 8244 13Mar14:09:08:43 29 0
13MAR2014:09:10:34 66019 8244 20140313 20140313 66019 8244 13Mar14:09:09:12 82 0
13MAR2014:09:11:08 66019 8244 20140313 20140313 66019 8244 13Mar14:09:10:34 34 0
13MAR2014:09:11:34 66019 8244 20140313 20140313 66019 8244 13Mar14:09:11:08 26 0
14MAR2014:21:19:18 66019 8244 20140314 20140313 66019 8244 13Mar14:09:11:34 130064 1
14MAR2014:21:19:52 66019 8244 20140314 20140314 66019 8244 14Mar14:21:19:18 34 0
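Every column ending in 2 holds the previous row's value (the lag) of the corresponding non-2 column.
The data describes user activity on a particular product at a given DateTime stamp.
I want to create "Sessions" based on a few parameters, as illustrated in the last column, "Calc", so that every new session gets a new number. A row belongs to the same session as the previous row when Date = Date2, User = User2, Prod = Prod2 and SecDiff <= 3600. The end goal is to compute the number of seconds in each session (by summing SecDiff). For this example, the table should give me these results: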
DateTime User Prod Date Seconds Calc
20MAR2014:20:17:00 54823 1430 20140320 . 1
13MAR2014:09:07:16 66019 8244 20140313 258 2
14MAR2014:21:19:18 66019 8244 20140314 34 3
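Seconds holds the summed differences for each session. The extreme computed values (-644984 and 130064) are left out, because they only mark the start of a new session.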
Answer 0 (score: 0)
I'm not sure I fully understand this, but why do you need all those columns? Just use the first three:
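data a;
    /* The colon modifier makes list input stop at the blank after the timestamp;
       columns follow the question's order: DateTime, User, Prod */
    input act_time :datetime20. user product;
    format act_time datetime20.;
    cards;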
20MAR2014:20:17:00 54823 1430
13MAR2014:09:07:16 66019 8244
13MAR2014:09:07:44 66019 8244
13MAR2014:09:08:17 66019 8244
13MAR2014:09:08:43 66019 8244
13MAR2014:09:09:12 66019 8244
13MAR2014:09:10:34 66019 8244
13MAR2014:09:11:08 66019 8244
13MAR2014:09:11:34 66019 8244
14MAR2014:21:19:18 66019 8244
14MAR2014:21:19:52 66019 8244
;
run;
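proc sort data=a out=b;
    by product user act_time;
run;

data c(drop=last_time);
    retain last_time;
    set b;
    by product user;
    /* A new product/user pair always starts a new session */
    if first.user then session + 1;
    /* Otherwise start one on a new calendar day (Date = Date2 in the question)
       or after a gap of more than an hour (SecDiff <= 3600 stays in the session) */
    else if datepart(act_time) ne datepart(last_time)
         or act_time - last_time > 3600 then session + 1;
    last_time = act_time;
run;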
data d;
    retain first_time;
    set c;
    by session;
    /* Remember when the session started */
    if first.session then first_time = act_time;
    /* Write one row per session with its length in seconds */
    if last.session then do;
        duration = act_time - first_time;
        output;
    end;
run;
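
If you would rather get the table from the question directly (the first timestamp of each session plus its length in seconds), a PROC SUMMARY over data set c might be a shorter route. This is only a sketch: it assumes c was built as above and contains session, user, product and act_time, and the names sessions, start_time, end_time and seconds are ones I made up for illustration.

```sas
/* One output row per session: earliest and latest timestamp */
proc summary data=c nway;
    class session user product;
    var act_time;
    output out=sessions(drop=_type_ _freq_)
           min=start_time max=end_time;
run;

/* Session length in seconds (datetimes are stored as seconds) */
data sessions;
    set sessions;
    seconds = end_time - start_time;
    format start_time end_time datetime20.;
run;
```

Since a session's length is the sum of the gaps inside it, end_time - start_time should equal the summed SecDiff values from the question. Note that a session with a single row comes out as 0 seconds rather than missing.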