Determining the ID from start & end datetimes

Time: 2014-11-30 06:34:23

Tags: oracle sas proc-sql

The first table contains key-value pairs and a time, as follows:

Table:Time_Stamp

The second table contains the start and end datetimes for each ID.

Table:Time_Table

I want to find the ID for each row of Time_Stamp.

Expected Result

There is a fixed number of categories, but there are many IDs.

Could you help me work out how to write the SQL query? (Any SQL dialect is fine, since I can convert it. SAS-compatible PROC SQL would be better.)

2 Answers:

Answer 0: (score: 1)

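If you're doing this in SAS, you're best off using formats. Formats have the advantage of accepting start/end ranges, and they are very fast: roughly O(1) lookup time, if I recall correctly. This approach doesn't require sorting the larger dataset (and can even avoid sorting the smaller one, if that's a concern), which most SQL solutions probably will do unless they can hold the smaller dataset in memory (as a hash table).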

The first two data steps just create the data shown above; the format_two data step is the first one that does anything new.

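If there are more categories, this still works fine as long as they are alphabetic rather than numeric; the only change you'd need is in `if _n_ le 2`, where the 2 should equal the total number of categories. Note that this check relies on the first rows of time_table covering each category exactly once, as they do here.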

data time_Stamp;   *Making up the test dataset;
  category='A';
  do value=1 to 6;
    time = intnx('HOUR','01NOV2014:00:00:00'dt,value-1);
    output;
  end;
  category='B';
  do value = 7 to 12;
    time = intnx('HOUR','01NOV2014:00:00:00'dt,value-4);
    output;
  end;
run;

data time_table;    *Making up the ID dataset;
  informat start_time end_time datetime18.;
  input id category $ start_time end_time;
  datalines;
  1 A 01NOV2014:00:00:00 01NOV2014:03:00:00
  1 B 01NOV2014:00:03:00 01NOV2014:06:00:00
  2 A 01NOV2014:03:00:00 01NOV2014:06:00:00
  2 B 01NOV2014:06:00:00 01NOV2014:09:00:00
  ;
run;


*This restructures time_table into the needed structure for a format lookup dataset;
data format_two;
  set time_table;
  fmtname=cats('KEYFMT',category);   *This is how we handle A/B - different formats.  If it were numeric would need to end with 'F'.;
  start=start_time;
  end=end_time;
  label=cats(id);    *CATS converts the numeric ID to text, so LABEL is created as a character variable;
  eexcl='Y';         *This makes it exclusive of the end value, so 03:00 goes with the latter ID and not the former.;
  hlo=' ';
  output;
  if _n_ le 2 then do;  *This allows it to return missing if the ID is not found. ;
                        *le 2 is because we want one for each category - if more categories, needs to be higher;
    hlo='o';
    label=' ';
    call missing(of start end);
    output;
  end;
run;


*Have to sort to group formats together, but at least this is the small dataset;
*If even this is a time concern, this could be done differently (make 2 different datasets above);
proc sort data=format_two;
  by fmtname;
run;

*Import the format lookups;
proc format cntlin=format_two;
run;

*Apply using PUTN which allows specifying a format at runtime;
data table_one_ids;
  set time_stamp;
  id = putn(time,cats('KEYFMT',category));
run;
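
The `if _n_ le 2` check in format_two hard-codes the number of categories. One way to avoid that, sketched here on the assumption that sorting the small time_table dataset is acceptable, is to use BY-group processing and write the "other" row once per category. The dataset name time_table_sorted is made up for this illustration; everything else follows the steps above.

```sas
*Sort the small lookup table so each category forms one BY group;
proc sort data=time_table out=time_table_sorted;
  by category start_time;
run;

data format_two;
  set time_table_sorted;
  by category;
  length label $12 hlo $1;
  fmtname = cats('KEYFMT', category);
  start = start_time;
  end   = end_time;
  label = cats(id);
  eexcl = 'Y';
  hlo   = ' ';
  output;
  if last.category then do;   *One OTHER row per category, however many there are;
    hlo   = 'o';
    label = ' ';
    call missing(of start end);
    output;
  end;
run;
```

Because the rows are already grouped by category, they are also grouped by fmtname, so the separate sort by fmtname should not be needed before PROC FORMAT in this variant.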

Answer 1: (score: 0)

-- DATEADD is SQL Server syntax; subtracting one second makes End_time exclusive (assumes whole-second timestamps)
SELECT Time_stamp.Category, Time_stamp.Time, Time_stamp.Value, Time_Table.ID
FROM Time_stamp
INNER JOIN Time_Table
  ON Time_stamp.Category = Time_Table.Category
  AND Time_stamp.Time BETWEEN Time_Table.Start_time AND DATEADD(SS, -1, Time_Table.End_time)
ORDER BY Time_stamp.Category, Time_stamp.Time
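
DATEADD is not a SAS function, so the query above would need converting before it runs in PROC SQL. A minimal sketch of one possible conversion follows, assuming the dataset and column names from the test data in Answer 0 (time_stamp with category, value, time; time_table with id, category, start_time, end_time). It swaps BETWEEN for a half-open comparison, so no one-second adjustment is needed, and uses a left join so that rows without a matching range are kept with a missing ID. The output name time_stamp_ids is made up for this illustration.

```sas
proc sql;
  create table time_stamp_ids as
  select ts.category,
         ts.time format=datetime19.,
         ts.value,
         tt.id
  from time_stamp as ts
       left join time_table as tt
         on  ts.category = tt.category
         and ts.time >= tt.start_time   /* start is inclusive */
         and ts.time <  tt.end_time     /* end is exclusive   */
  order by ts.category, ts.time;
quit;
```

If unmatched rows should be dropped instead, as in the inner join above, `left join` can be changed to `inner join`.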