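The first table contains key-value pairs and times, as follows:
The second table contains the start and end dates for each ID.
For each row in time_stamp, I want to find the matching ID.
There is a fixed number of categories, but there are many IDs.
Can you help me figure out how to write this SQL query? (Any SQL dialect is fine, since I can convert it. SAS-compatible PROC SQL would be even better.)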
Answer 0 (score: 1)
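If you are doing this in SAS, you are probably better off using formats. Formats can map start/end ranges directly, and they are very fast: lookups are roughly O(1) time, if I remember correctly. This approach doesn't require sorting the larger dataset (and can even avoid sorting the smaller one, if that is a concern). Most SQL solutions will likely need that sort unless they can hold the smaller dataset in memory as a hash table.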
The first two data steps just create the data shown above; the format_two data step is the first one that does anything new.
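If there are more categories, this still works as long as they are alphabetic rather than numeric. The only thing you would need to change is the condition if _n_ le 2: the 2 should equal the total number of categories. (This relies on the first rows of time_table containing exactly one row per category, as they do here.)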
data time_Stamp; *Making up the test dataset;
category='A';
do value=1 to 6;
time = intnx('HOUR','01NOV2014:00:00:00'dt,value-1);
output;
end;
category='B';
do value = 7 to 12;
time = intnx('HOUR','01NOV2014:00:00:00'dt,value-4);
output;
end;
run;
data time_table; *Making up the ID dataset;
informat start_time end_time datetime18.;
input id category $ start_time end_time;
datalines;
1 A 01NOV2014:00:00:00 01NOV2014:03:00:00
1 B 01NOV2014:00:03:00 01NOV2014:06:00:00
2 A 01NOV2014:03:00:00 01NOV2014:06:00:00
2 B 01NOV2014:06:00:00 01NOV2014:09:00:00
;
run;
*This restructures time_table into the needed structure for a format lookup dataset;
data format_two;
set time_table;
fmtname=cats('KEYFMT',category); *One format per category (KEYFMTA, KEYFMTB) is how A/B are handled. Format names cannot end in a digit, so numeric categories would need a suffix such as F.;
start=start_time;
end=end_time;
length label $12; *Format labels are character, so store the ID as text;
label=cats(id);
eexcl='Y'; *This makes it exclusive of the end value, so 03:00 goes with the latter ID and not the former.;
hlo=' ';
output;
if _n_ le 2 then do; *This adds an OTHER row so the format returns missing if no ID is found.;
*le 2 gives one OTHER row per category, so with more categories this number must be higher;
hlo='o';
label=' ';
call missing(of start end);
output;
end;
run;
*Have to sort to group formats together, but at least this is the small dataset;
*If even this is a time concern, this could be done differently (make 2 different datasets above);
proc sort data=format_two;
by fmtname;
run;
*Import the format lookups;
proc format cntlin=format_two;
quit;
*Apply using PUTN which allows specifying a format at runtime;
data table_one_ids;
set time_stamp;
id = putn(time,cats('KEYFMT',category,'.'));
run;
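For readers without SAS, the same idea can be sketched in Python. It builds one sorted range table per category (the analogue of KEYFMTA/KEYFMTB), uses a binary search on start times, treats each end as exclusive (like EEXCL='Y'), and returns None when nothing matches (like the HLO='O' row). This is a minimal illustration under the assumption that ranges within one category do not overlap. The names TIME_TABLE, build_lookup, find_id and make_time_stamp are hypothetical, and this is not how SAS implements formats internally. Each lookup here is O(log n) via bisect, not the O(1) mentioned above.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical in-memory copy of time_table: (id, category, start, end).
# Same rows as the datalines above; each end is treated as exclusive (like EEXCL='Y').
TIME_TABLE = [
    (1, "A", datetime(2014, 11, 1, 0, 0), datetime(2014, 11, 1, 3, 0)),
    (1, "B", datetime(2014, 11, 1, 0, 3), datetime(2014, 11, 1, 6, 0)),
    (2, "A", datetime(2014, 11, 1, 3, 0), datetime(2014, 11, 1, 6, 0)),
    (2, "B", datetime(2014, 11, 1, 6, 0), datetime(2014, 11, 1, 9, 0)),
]


def build_lookup(rows):
    """Build one sorted range table per category (the analogue of KEYFMTA, KEYFMTB).

    Assumes ranges within a category do not overlap.
    """
    grouped = {}
    for id_, category, start, end in rows:
        grouped.setdefault(category, []).append((start, end, id_))
    lookup = {}
    for category, ranges in grouped.items():
        ranges.sort()  # only the small ID table is sorted, by start time
        lookup[category] = ([r[0] for r in ranges], ranges)
    return lookup


def find_id(lookup, category, ts):
    """Return the ID whose [start, end) range contains ts, or None (like HLO='O')."""
    starts, ranges = lookup.get(category, ([], []))
    i = bisect_right(starts, ts) - 1  # index of the last range starting at or before ts
    if i >= 0 and ts < ranges[i][1]:
        return ranges[i][2]
    return None


def make_time_stamp():
    """Recreate the time_stamp test data as (category, value, time) tuples."""
    base = datetime(2014, 11, 1)
    rows = [("A", v, base + timedelta(hours=v - 1)) for v in range(1, 7)]
    rows += [("B", v, base + timedelta(hours=v - 4)) for v in range(7, 13)]
    return rows


if __name__ == "__main__":
    lookup = build_lookup(TIME_TABLE)
    for category, value, time in make_time_stamp():
        print(category, value, time, find_id(lookup, category, time))
```

As in the SAS version, only the small ID table is sorted; each timestamp row is then looked up independently, so the large table never needs sorting.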
Answer 1 (score: 0)
SELECT Time_stamp.Category, Time_stamp.Time, Time_stamp.Value, Time_Table.ID
FROM Time_stamp INNER JOIN
Time_Table
ON Time_stamp.Category = Time_Table.Category
AND Time_stamp.Time >= Time_Table.Start_time
AND Time_stamp.Time < Time_Table.End_time
ORDER BY Time_stamp.Category, Time_stamp.Time
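To check the join logic without a database server, here is a minimal sketch that uses Python's built-in sqlite3 module with the same made-up data as the SAS answer. It writes the time condition as a half-open range (Start_time <= Time < End_time). For whole-second timestamps that matches BETWEEN Start_time AND DATEADD(SS,-1,End_time), but it does not depend on T-SQL's DATEADD, so the same condition also works in PROC SQL. Storing timestamps as ISO-8601 text, which SQLite compares correctly as strings, is an assumption of this sketch. Note that an INNER JOIN drops timestamps that match no ID, whereas the format approach returns a missing ID; a LEFT JOIN would keep those rows.

```python
import sqlite3
from datetime import datetime, timedelta

# Half-open interval join: Start_time <= Time < End_time.
QUERY = """
SELECT ts.Category, ts.Time, ts.Value, tt.ID
FROM Time_stamp AS ts
INNER JOIN Time_Table AS tt
    ON ts.Category = tt.Category
   AND ts.Time >= tt.Start_time
   AND ts.Time <  tt.End_time
ORDER BY ts.Category, ts.Time
"""


def build_db():
    """Create an in-memory database holding the made-up test data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Time_stamp (Category TEXT, Time TEXT, Value INTEGER)")
    con.execute(
        "CREATE TABLE Time_Table (ID INTEGER, Category TEXT, Start_time TEXT, End_time TEXT)"
    )
    base = datetime(2014, 11, 1)
    # ISO-8601 text ('YYYY-MM-DD HH:MM:SS') compares chronologically as a string.
    stamps = [("A", (base + timedelta(hours=v - 1)).isoformat(sep=" "), v) for v in range(1, 7)]
    stamps += [("B", (base + timedelta(hours=v - 4)).isoformat(sep=" "), v) for v in range(7, 13)]
    con.executemany("INSERT INTO Time_stamp VALUES (?, ?, ?)", stamps)
    con.executemany(
        "INSERT INTO Time_Table VALUES (?, ?, ?, ?)",
        [
            (1, "A", "2014-11-01 00:00:00", "2014-11-01 03:00:00"),
            (1, "B", "2014-11-01 00:03:00", "2014-11-01 06:00:00"),
            (2, "A", "2014-11-01 03:00:00", "2014-11-01 06:00:00"),
            (2, "B", "2014-11-01 06:00:00", "2014-11-01 09:00:00"),
        ],
    )
    return con


if __name__ == "__main__":
    for row in build_db().execute(QUERY):
        print(row)
```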