Joining two datasets by closest timestamp

Time: 2016-06-28 20:35:21

Tags: sas

I need to join two tables using the closest timestamp.

data a;
  input id name $5. timea time8.;
  format timea time5.;
  cards;
  1 John 9:17 
  1 John 10:25
  2 Chris 9:17 
  3 Emily 14:25
;run;

data b;
  input id name $5. timeb time8.;
  format timeb time5.;
  cards;
  1 John 9:00 
  1 John 10:00
  2 Chris 9:00 
  3 Emily 14:30
;run;

Table Want: 
id name timea timeb 
1  John  9:17 9:00
1  John  10:25 10:00
2  Chris 9:17 9:00
3  Emily 14:25 14:30

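My approach is to build a key = id || name in table b, sort by that key, and then create an interval around each timestamp in table b. With the code below, the first time for John does not show up in the output.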

data time(rename=prev_TimeB = TimeB);
  length start_time end_time 8;
  retain start_time 0 prev_TimeB;
  set B(keep=TimeB) end = last;
  by key;
  if not first.key then do;
    end_time = TimeB - ((TimeB - prev_TimeB) / 2);
    output;
    prev_timeB = TimeB;
    if last.key then do;
    end_time = '23:59:59.999't;
    output;
  end;
  format prev_timeB start_time end_time time12.3;
  drop TimeB;
run;

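Thanks for your time!

2 answers:

Answer 0: (score: 0)

Find the records where the difference equals the minimum absolute difference. This is easy to code in SAS because PROC SQL automatically remerges aggregate function values with the detail records.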

data a;
  input id name :$5. timea :time8.;
  format timea time5.;
cards;
1 John 9:17
1 John 10:25
2 Chris 9:17
3 Emily 14:25
4 Joe 11:21
;

data b;
  input id name :$5. timeb time8.;
  format timeb time5.;
cards;
1 John 9:00
1 John 10:00
2 Chris 9:00
3 Emily 14:30
;

proc sql ;
  create table C as
   select a.*
        , timeb
        , timea-timeb as seconds
        , abs(calculated seconds) as distance
   from a
   left join b
   on a.id = b.id and a.name = b.name
   group by a.id,a.name,a.timea
   having min(calculated distance) = calculated distance
  ;
quit;

Result

id    name     timea    timeb    seconds    distance
1    John      9:17     9:00      1020       1020
1    John     10:25    10:00      1500       1500
2    Chris     9:17     9:00      1020       1020
3    Emily    14:25    14:30      -300        300
4    Joe      11:21        .         .          .
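
One caveat with the HAVING filter: when two rows of b are equally far from the same timea (for example 9:00 and 9:34 around 9:17), both rows are kept. The following is a minimal sketch of one way to keep a single match, assuming the table C created above and assuming that id, name, and timea together identify one row of a. The output names C_sorted and C_unique and the tie-break rule (prefer the earlier timeb) are illustrative choices, not part of the original answer.

```sas
/* Order the candidates so the preferred match (earlier timeb) comes first. */
proc sort data=C out=C_sorted;
  by id name timea timeb;
run;

/* Keep only the first candidate for each id/name/timea combination. */
data C_unique;
  set C_sorted;
  by id name timea;
  if first.timea;
run;
```

With the sample data there are no ties, so C_unique should contain the same five rows as C.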

Answer 1: (score: -1)

If datasets A and B are already sorted, you can add a temporary variable pos = _n_ to both tables:

Data a;
  set a;
  pos=_n_;
run;
Data b;
  set b;
  pos=_n_;
run;

You will have the following tables:

Table a:
id    name     timea    pos
1    John      9:17      1
1    John     10:25      2
2    Chris     9:17      3
3    Emily    14:25      4

Table b:
id    name     timeb    pos
1    John      9:00      1
1    John     10:00      2
2    Chris     9:00      3
3    Emily    14:30      4

Then you can use a join in a PROC SQL statement:

proc sql;
  create table result as
  select *
  from a t1
  left join b t2
    on t1.pos=t2.pos;
quit;

If the datasets are not sorted, sort them into the correct order first.
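
A minimal sketch of that sorting step, assuming the intended order is by id, name, and then the time variable. It has to run before the pos = _n_ steps. Note that joining on pos only pairs each timea with its closest timeb when both tables hold the same number of rows per id and name, in corresponding order; that is an assumption about the data, and the join itself does not check it.

```sas
/* Sort both tables the same way before assigning pos = _n_. */
proc sort data=a;
  by id name timea;
run;

proc sort data=b;
  by id name timeb;
run;
```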