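How to get the second-max and third-max records of a date column in SAS

Time: 2019-01-29 07:46:32

Tags: sql sas sas-macro

I want to get the max date, the second max date, and the third max date from the table below.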

proc SQL;

create table Kri_obs_l(
  Kri_rk int,
  value_dt datetime,
  score int
);

insert into Kri_obs_l
values (10000, "2Jan2018"d, 12)
values (10001, "2Jan2018"d, 2)
values (10001, "29Jan2019"d, 18)
values (10002, "2Jan2018"d, 5)
values (10002, "26Jan2019"d, 18)
values (10002, "27Jan2019"d, 18)
values (10003, "2Jan2018"d, 18)
values (10003, "26Jan2019"d, 18)
values (10003, "27Jan2019"d, 18)
values (10004, "2Jan2018"d, 25)
values (10005, "2Jan2018"d, 32)
values (10005, "26Jan2019"d, 18)
values (10005, "27Jan2019"d, 18)
values (10006, "2Jan2018"d, 5)
values (10007, "2Jan2018"d, 2)
values (10008, "2Jan2018"d, 12)
values (10009, "2Jan2018"d, 18)
values (10009, "26Jan2019"d, 18)
values (10009, "27Jan2019"d, 18)
;
quit;

My code to get the max, second max, and third max:

proc sql;
create table KRI_score_max as
select Kri_rk, MAX(value_dt) as reported_dt format=date9.
from KRI_OBS_l
group by Kri_rk;

create table KRI_score_second_max as
select Kri_rk, MAX(value_dt) as reported_dt format=date9.
from KRI_OBS_l a
where value_dt = (select MAX(value_dt) from KRI_OBS_l b where value_dt< (select MAX(value_dt) from KRI_OBS_l))
group by Kri_rk;

create table KRI_score_third_max as
select Kri_rk, MAX(value_dt) as reported_dt format=date9.
from KRI_OBS_l
where value_dt < (select MAX(value_dt) from KRI_OBS_l where value_dt< (select MAX(value_dt) from KRI_OBS_l))
group by Kri_rk;

create table KRI_score_third_max as
select * 
from KRI_score_third_max
where KRI_rk in (select Kri_rk from KRI_score_second_max);

quit;

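Getting the max record works fine, but I am having trouble getting the second-max and third-max records. If I delete the 29Jan2019 record from the table, it works fine. Adding that record makes 27Jan2019 the second-largest date across the whole table, so the query picks that date as the second max for every group. I want the second-max and third-max record for each group of records by Kri_rk.

Since proc SQL has no limit clause, I cannot combine order by ... desc with limit.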

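5 answers:

Answer 0: (score: 0)

If you want the first through third max dates, you can use the row_number() window function. Native PROC SQL does not support window functions, so this only works where the query runs in a database that does (for example via SQL pass-through):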

select * from 
(
select * ,row_number() over(partition by Kri_rk order by value_dt desc) rn
from table_name
) t where t.rn<=3

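Answer 1: (score: 0)

You can exclude the results of the first created table in the second statement, and then exclude the first two tables in the third query.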

proc sql;
/* Latest date per Kri_rk. */
create table KRI_score_max as
select Kri_rk, max(value_dt) as first_max format=date9.
from KRI_OBS_l
group by Kri_rk;

/* Second latest: ignore each group's latest date and take the max of the rest.
   PROC SQL has no (a, b) NOT IN (...) tuple syntax, so a correlated NOT EXISTS is used.
   The UNION ALL branch keeps single-row groups by reusing their only date. */
create table KRI_score_second_max as
select a.Kri_rk, max(a.value_dt) as second_max format=date9.
from KRI_OBS_l a
where not exists (select 1 from KRI_score_max m
                  where m.Kri_rk = a.Kri_rk and m.first_max = a.value_dt)
group by a.Kri_rk
union all
select Kri_rk, max(value_dt) as second_max format=date9.
from KRI_OBS_l
group by Kri_rk
having count(*) < 2;

/* Third latest: ignore the two dates found above.
   PROC SQL uses EXCEPT rather than MINUS. */
create table KRI_score_third_max as
select a.Kri_rk, max(a.value_dt) as third_max format=date9.
from KRI_OBS_l a
where not exists (select 1 from KRI_score_max m
                  where m.Kri_rk = a.Kri_rk and m.first_max = a.value_dt)
  and not exists (select 1 from KRI_score_second_max s
                  where s.Kri_rk = a.Kri_rk and s.second_max = a.value_dt)
group by a.Kri_rk
union all
(select Kri_rk, max(value_dt) as third_max format=date9.
 from KRI_OBS_l
 group by Kri_rk
 having count(*) < 3
 except
 select Kri_rk, second_max from KRI_score_second_max);

quit;

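Answer 2: (score: 0)

Since you are using SAS, take advantage of the idgroup feature of proc summary. The code below stores the top 3 dates of each group in separate columns (although you have not said clearly how you want the data stored).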

proc summary data=Kri_obs_l nway;
class kri_rk;
output out=want (drop=_:)
    idgroup(max(value_dt) out[3] (value_dt) = reported_dt);
run;
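
If one row per date is preferred over three columns, the wide output can be stacked afterwards. The following is only a sketch: it assumes the idgroup output columns are named reported_dt_1 through reported_dt_3 (the suffixes SAS appends for out[3]), and the names want_long, nth, and reported_dt are made up for illustration.

```sas
/* Sketch: turn want (one row per kri_rk, three date columns)
   into one row per kri_rk and rank. Names here are hypothetical. */
data want_long;
  set want;
  array dts[3] reported_dt_1-reported_dt_3;
  do nth = 1 to 3;
    reported_dt = dts[nth];
    /* Groups with fewer than three rows leave trailing columns missing. */
    if not missing(reported_dt) then output;
  end;
  format reported_dt date9.;
  keep kri_rk nth reported_dt;
run;
```

Here nth = 1 is the latest date in the group, nth = 2 the second latest, and so on.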

Answer 3: (score: 0)

A simple proc rank should work:

proc rank data=Kri_obs_l out=inter descending ties=dense;
  by Kri_rk;
  var value_dt;
  ranks value_dt_1;
run;

data final(drop=value_dt_1);
  set inter;
  if value_dt_1 le 3;
run;

Assuming there are no duplicate dates within each id, the following approaches should also work:

proc sort data=Kri_obs_l out=Kri_obs_l_1;
  by Kri_rk descending value_dt;
run;

/* Number the rows within each Kri_rk, latest first, and keep the first three. */
data want;
  set Kri_obs_l_1;
  by Kri_rk descending value_dt;
  if first.Kri_rk then m = 1;
  else m + 1;
  if m le 3;
  drop m;
run;


/* Same idea in SQL: cnt is how many rows in the group are on or after this date. */
proc sql;
  create table want(drop=cnt) as
  select Kri_rk, value_dt, score,
         (select count(value_dt) from Kri_obs_l a
          where a.Kri_rk = b.Kri_rk
            and a.value_dt ge b.value_dt
          group by Kri_rk) as cnt
  from Kri_obs_l b
  where calculated cnt le 3
  ;
quit;
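
If the same date can occur more than once within a Kri_rk, the row counter above would count each repeat separately. One possible variation, shown only as a sketch, is to reduce the data to distinct dates first. The dataset names distinct_dt and top3_dt and the variable date_rank are invented for this example.

```sas
/* Sketch: keep one row per (Kri_rk, value_dt) so repeated dates count once. */
proc sort data=Kri_obs_l(keep=Kri_rk value_dt) out=distinct_dt nodupkey;
  by Kri_rk descending value_dt;
run;

/* Number the distinct dates within each Kri_rk, latest first. */
data top3_dt;
  set distinct_dt;
  by Kri_rk;
  if first.Kri_rk then date_rank = 0;
  date_rank + 1;              /* sum statement: value is retained across rows */
  if date_rank le 3;          /* keep the three latest distinct dates */
  format value_dt date9.;
run;
```

Filtering top3_dt on date_rank = 2 or date_rank = 3 then gives the second or third latest date per Kri_rk.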

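Answer 4: (score: 0)

There are many solutions. I prefer to reuse your code and adjust it within proc sql. I ran the code below and it produced the expected result.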

proc sql;
create table KRI_score_max as
select Kri_rk, MAX(value_dt) as reported_dt format=date9.
from KRI_OBS_l
group by Kri_rk;
quit;

proc sql;
create table KRI_score_second_max as
select a.Kri_rk, MAX(a.value_dt) as reported_dt format=date9.
from KRI_OBS_l a
where a.value_dt not in (Select reported_dt from Kri_score_max b where 
a.kri_rk=b.kri_rk)
group by a.Kri_rk;

quit;

proc sql;
create table KRI_score_third_max as
select a.Kri_rk, MAX(a.value_dt) as reported_dt format=date9.
from KRI_OBS_l a
where a.value_dt not in (Select reported_dt from Kri_score_max b where 
a.kri_rk=b.kri_rk) and 
a.value_dt not in (Select reported_dt from KRI_score_second_max b where 
a.kri_rk=b.kri_rk) 
group by a.Kri_rk;
quit;
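
To check the three results side by side, the tables can be joined on Kri_rk. This is a sketch that assumes the three tables above exist with a reported_dt column each; the output table name KRI_score_top3 and the column aliases are hypothetical.

```sas
/* Sketch: one row per Kri_rk with its latest, second latest, and third latest date.
   Left joins keep groups that have fewer than three distinct dates. */
proc sql;
  create table KRI_score_top3 as
  select m.Kri_rk,
         m.reported_dt as max_dt        format=date9.,
         s.reported_dt as second_max_dt format=date9.,
         t.reported_dt as third_max_dt  format=date9.
  from KRI_score_max m
  left join KRI_score_second_max s on m.Kri_rk = s.Kri_rk
  left join KRI_score_third_max  t on m.Kri_rk = t.Kri_rk
  order by m.Kri_rk;
quit;
```

Groups with only one or two distinct dates show missing values in the later columns.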