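I want to find the first and second earliest dates for each group. I used to do this in a SQL SELECT statement, for example with the NTH_VALUE function in Oracle. I'm not aware of a similar function in SAS PROC SQL. PROC RANK could be used, but I can't get the output I need from it.
Sample data: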
PERSON_ID DT
1 03Oct2019
1 14Oct2019
1 23Oct2019
2 08Oct2019
2 08Oct2019
2 10Oct2019
3 05Oct2019
Desired output:
PERSON_ID DT1 DT2
1 03Oct2019 14Oct2019
2 08Oct2019 10Oct2019
3 05Oct2019
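I started with something like the code below to assign ranks. However, I want the first two ranked dates pivoted into a single row per person, rather than the ranks themselves spread over multiple rows.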
proc rank data=have out=want(where=(dt_rank in (1,2))) ties=low ;
by person_id;
var dt;
ranks dt_rank;
run;
PERSON_ID DT DT_RANK
1 03Oct2019 1
1 14Oct2019 2
2 08Oct2019 1
2 10Oct2019 2
3 05Oct2019 1
Answer 0 (score: 0)
A solution in your programming style:
title data we have;
data have;
  infile datalines firstobs=2;   /* skip the ruler line */
  input
    @01 PERSON_ID 1.
    @11 DT        date9.;
  format DT date9.;
datalines;
----+----1----+----2----+----3
1         03Oct2019
1         14Oct2019
1         23Oct2019
2         08Oct2019
2         08Oct2019
2         10Oct2019
3         05Oct2019
;
Remove the duplicates:
proc sort nodup
  data=have
  out=sorted;
  by person_id dt;
run;
proc print;
run;
Put the first and second dates side by side (the KEEP= option drops the third and any later ones):
title transpose;
proc transpose prefix=DT
  data=sorted
  out=want (keep=PERSON_ID DT1 DT2);
  by PERSON_ID;
  var dt;
run;
proc datasets nolist;
  modify want;
  format dt: date9.;
quit;
proc print;
run;
Verify correctness:
title benchmark;
data Bench;
  infile datalines truncover firstobs=2;   /* truncover: DT2 may be absent */
  input
    @01 PERSON_ID 1.
    @11 DT1       date9.
    @21 DT2       date9.;
  format DT: date9.;
datalines;
----+----1----+----2----+----3
1         03Oct2019 14Oct2019
2         08Oct2019 10Oct2019
3         05Oct2019
;
proc print;
run;
title compare;
proc compare
base=bench
comp=want;
run;
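As a rough sketch of how the PROC RANK attempt from the question could be combined with the transpose step above: TIES=DENSE gives tied dates the same rank and the next distinct date the next integer, so the second distinct date always has rank 2. It assumes WORK.HAVE as created above, already ordered by PERSON_ID; the data set names ranked, top2 and want_rank are made up for illustration.

```sas
/* Assumption: WORK.HAVE (PERSON_ID, DT) exists and is sorted by PERSON_ID. */
proc rank data=have out=ranked ties=dense;
  by person_id;
  var dt;
  ranks dt_rank;
run;

/* Keep ranks 1 and 2 only, one row per person and rank. */
proc sort data=ranked(where=(dt_rank in (1,2))) out=top2 nodupkey;
  by person_id dt_rank;
run;

/* ID dt_rank with PREFIX=DT names the output columns DT1 and DT2. */
proc transpose data=top2 out=want_rank(drop=_name_) prefix=DT;
  by person_id;
  id dt_rank;
  var dt;
run;
```

The result could then be checked against Bench with the same PROC COMPARE step, using comp=want_rank.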
Answer 1 (score: 0)
title data we have;
data have;
  infile datalines firstobs=2;   /* skip the ruler line */
  input
    @01 PERSON_ID 1.
    @11 DT        date9.;
  format DT date9.;
datalines;
----+----1----+----2----+----3
1         03Oct2019
1         14Oct2019
1         23Oct2019
2         08Oct2019
2         08Oct2019
2         10Oct2019
3         05Oct2019
;
proc print;
run;
title want;
proc sql;
  /* Self-join: H2 holds, for each row, the strictly later dates of the same person. */
  create table want as
  select H1.person_id
       , min(H1.DT) as dt1 format=date9.
       , min(H2.DT) as dt2 format=date9.
  from have as H1 left join have as H2
    on H1.person_id eq H2.person_id and H1.DT lt H2.DT
  group by H1.person_id;
quit;
proc print;
run;
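Answer 2 (score: 0)
Using an Oracle ranking function (DENSE_RANK rather than RANK, so that tied dates share a rank and the next distinct date gets rank 2):
select person_id
     , min(case rnk when 1 then dt end) as dt_1
     , min(case rnk when 2 then dt end) as dt_2
from ( select person_id
            , dt
            , dense_rank() over (partition by person_id order by dt) as rnk
       from person_table
     )
group by person_id
order by person_id
You can combine this with a pivot to turn the rows into date columns.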
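Answer 3 (score: 0)
Group-wise processing of the data with a DOW loop, in which the SET and BY statements are nested inside the loop. dt1 and dt2 are implicitly reset to missing at the start of each DATA step iteration, which here means once per id group. The input must be sorted by id and date.
data have;
  input id date :date9.;
  format date date9.;
  datalines;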
1 03Oct2019
1 14Oct2019
1 23Oct2019
2 08Oct2019
2 08Oct2019
2 10Oct2019
3 05Oct2019
run;
data want (keep=id dt1 dt2);
  do until (last.id);
    set have;
    by id;
    attrib dt1 dt2 length=4 format=date9.;
    if missing(dt1) then dt1 = date;                      /* first date of the group */
    if missing(dt2) and date ne dt1 then dt2 = date;      /* first later, distinct date */
  end;
run;
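To make the implicit reset easier to see, here is a sketch of what should be an equivalent step written without the loop, where the reset and the output are spelled out with RETAIN, FIRST.id and LAST.id. It assumes the same WORK.HAVE (id, date), sorted by id and date; the output name want_retain is made up for illustration.

```sas
/* Assumption: WORK.HAVE (id, date) exists and is sorted by id and date. */
data want_retain (keep=id dt1 dt2);
  set have;
  by id;
  retain dt1 dt2;                               /* keep values across rows */
  format dt1 dt2 date9.;
  if first.id then call missing(dt1, dt2);      /* explicit reset per group */
  if missing(dt1) then dt1 = date;              /* earliest date */
  else if missing(dt2) and date ne dt1 then dt2 = date;  /* next distinct date */
  if last.id then output;                       /* one row per id */
run;
```

The loop version gets the reset and the single output row per group for free, because each pass of the DATA step consumes one whole id group.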